multivelovae.filter_genes_dispersion
- multivelovae.filter_genes_dispersion(data, flavor='seurat', min_disp=None, max_disp=None, min_mean=None, max_mean=None, n_bins=20, n_top_genes=None, retain_genes=None, log=True, subset=True, copy=False)
Extract highly variable genes. Adapted from scVelo and made compatible with negative means.
Expects non-logarithmized data. The normalized dispersion is obtained by scaling with the mean and standard deviation of the dispersions for genes falling into a given bin for mean expression of genes. This means that for each bin of mean expression, highly variable genes are selected.
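The per-bin scaling described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: the helper name `normalized_dispersion` is hypothetical, the bins are taken to be equal-width over the range of the per-gene means, and the inputs are per-gene means and dispersions that have already been computed (and log-transformed if desired).

```python
import statistics


def normalized_dispersion(means, dispersions, n_bins=20):
    """Z-score each gene's dispersion within its bin of mean expression.

    Hypothetical helper for illustration; assumes equal-width bins.
    """
    lo, hi = min(means), max(means)
    # Fall back to a width of 1.0 when all means are identical.
    width = (hi - lo) / n_bins or 1.0
    # Assign each gene to a bin; the largest mean lands in the last bin.
    bins = [min(int((m - lo) / width), n_bins - 1) for m in means]

    # Collect the dispersions of all genes sharing a bin.
    by_bin = {}
    for b, d in zip(bins, dispersions):
        by_bin.setdefault(b, []).append(d)

    result = []
    for b, d in zip(bins, dispersions):
        members = by_bin[b]
        if len(members) == 1:
            # A bin with a single gene: normalized dispersion is set to 1.
            result.append(1.0)
            continue
        mu = statistics.mean(members)
        sd = statistics.stdev(members)  # sample standard deviation
        # Guard against a zero spread within the bin (assumption: use 0.0).
        result.append((d - mu) / sd if sd > 0 else 0.0)
    return result
```

Because the lower bin edge is the smallest mean rather than zero, the sketch also accepts negative means; how the library itself handles them is not shown here.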
- Args:
- data: anndata.AnnData, np.ndarray, sp.sparse
The (annotated) data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to genes.
- flavor: {‘seurat’, ‘cell_ranger’, ‘svr’}, optional (default: ‘seurat’)
Choose the flavor for computing normalized dispersion. If choosing ‘seurat’, this expects non-logarithmized data - the logarithm of mean and dispersion is taken internally when log is at its default value True. For ‘cell_ranger’, this is usually called for logarithmized data - in this case you should set log to False. In their default workflows, Seurat passes the cutoffs whereas Cell Ranger passes n_top_genes.
- min_mean=0.0125, max_mean=3, min_disp=0.5, max_disp=None: float, optional
Cutoffs for the means and the normalized dispersions. They are ignored if n_top_genes is not None.
- n_bins: int (default: 20)
Number of bins for binning the mean gene expression. Normalization is done with respect to each bin. If just a single gene falls into a bin, the normalized dispersion is artificially set to 1. You’ll be informed about this if you set settings.verbosity = 4.
- n_top_genes: int or None (default: None)
Number of highly-variable genes to keep.
- retain_genes: list, optional (default: None)
List of gene names to be retained independent of thresholds.
- log: bool, optional (default: True)
Use the logarithm of the mean to variance ratio.
- subset: bool, optional (default: True)
If True, keep only the highly variable genes. Otherwise keep all genes and write a boolean array marking the highly variable ones.
- copy: bool, optional (default: False)
If an AnnData is passed, determines whether a copy is returned.
- Returns:
- None
If copy is False. The passed data object is modified directly.
- AnnData
If copy is True. A new AnnData object is returned.
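
How n_top_genes, the mean and dispersion cutoffs, and retain_genes interact can be sketched as a boolean mask over genes. This is a minimal illustration, not the library's code: the helper name `select_genes` is hypothetical, the cutoff defaults are the values listed in the argument description, treating a max_disp of None as "no upper bound" is an assumption, and ties at the n_top_genes cutoff are kept.

```python
import math


def select_genes(names, means, disp_norm, n_top_genes=None,
                 min_mean=0.0125, max_mean=3, min_disp=0.5, max_disp=None,
                 retain_genes=None):
    """Return a list of booleans, True for genes flagged as highly variable.

    Hypothetical helper for illustration; assumes n_top_genes >= 1 when given.
    """
    if max_disp is None:
        max_disp = math.inf  # assumption: None means no upper bound

    if n_top_genes is not None:
        # The cutoffs are ignored: keep genes at or above the n-th largest
        # normalized dispersion.
        ranked = sorted(disp_norm, reverse=True)
        cutoff = ranked[min(n_top_genes, len(ranked)) - 1]
        mask = [d >= cutoff for d in disp_norm]
    else:
        # Keep genes whose mean and normalized dispersion lie strictly
        # inside the cutoffs.
        mask = [
            min_mean < m < max_mean and min_disp < d < max_disp
            for m, d in zip(means, disp_norm)
        ]

    # Genes listed in retain_genes are kept regardless of thresholds.
    retained = set(retain_genes or [])
    return [keep or name in retained for keep, name in zip(mask, names)]
```

With subset=True the function would then keep only the flagged genes, while subset=False corresponds to storing a mask like this one alongside all genes.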