API References
- STAVAG.DVG_detection(adata, coords, sps=False, threshold=0.05, num_perm=1)[source]
Detect Directionally Variable Genes (DVGs) using regression on spatial coordinates.
- Parameters:
adata (AnnData) – AnnData object with the expression matrix in adata.X and gene names in adata.var.index.
coords (ndarray) – Spatial coordinates of cells with shape (n_cells, n_dim), for example two columns for x and y, or three columns for x, y, and z.
sps (bool, optional) – If True, compute STAVAG priority scores by comparing observed importances with random baselines. Defaults to False.
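threshold (float, optional) – Cutoff used when selecting DVGs. If num_perm == 1: importance-based cutoff. If num_perm > 1: p-value cutoff; genes with pval <= threshold are kept. Larger values keep more genes. Defaults to 0.05.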
num_perm (int, optional) – Number of permutations used to build an empirical null distribution of feature importances. If 1: keep the original single-permutation behavior. If > 1: must be >= 100; empirical p-values are computed for each gene. Defaults to 1.
- Returns:
Dictionary containing top important genes per coordinate axis (e.g., ‘x’, ‘y’, ‘z’), filtered with the threshold.
For num_perm == 1: the original single-permutation output structure (with SPS scores if sps=True).
For num_perm > 1: each DataFrame contains columns [‘Feature’, ‘Importance’, ‘null_mean’, ‘pval’] and is filtered by p-value (<= threshold).
- Return type:
Dict[str, DataFrame]
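The reference above says that empirical p-values are computed per gene when num_perm > 1, but it does not state the estimator. The sketch below shows one common choice, assumed here for illustration only: p = (1 + number of null importances >= observed) / (1 + num_perm). The helper name `empirical_pvalues` and the toy data are hypothetical and are not part of STAVAG; only the column names come from the Returns description.

```python
import numpy as np
import pandas as pd


def empirical_pvalues(features, observed, null_importances):
    """Hypothetical helper: right-tailed empirical permutation p-values.

    Assumes p = (1 + #{null >= observed}) / (1 + n_perm); STAVAG's exact
    formula is not documented in this reference.
    """
    observed = np.asarray(observed, dtype=float)
    null = np.asarray(null_importances, dtype=float)  # shape (n_perm, n_genes)
    n_perm = null.shape[0]
    # Count, per gene, how many permuted importances reach the observed one.
    exceed = (null >= observed).sum(axis=0)
    pval = (1.0 + exceed) / (1.0 + n_perm)
    return pd.DataFrame(
        {
            "Feature": list(features),
            "Importance": observed,
            "null_mean": null.mean(axis=0),
            "pval": pval,
        }
    )


# Toy null distribution: 100 permutations for 3 genes, importances in [0, 0.1).
rng = np.random.default_rng(0)
null = rng.uniform(0.0, 0.1, size=(100, 3))
table = empirical_pvalues(["g1", "g2", "g3"], [0.5, 0.05, 0.0], null)

# Keep genes whose p-value is at or below the threshold, as in the Returns text.
kept = table[table["pval"] <= 0.05]
```

Under this assumed estimator the smallest attainable p-value is 1 / (num_perm + 1), so a small number of permutations cannot resolve a 0.05 cutoff well. That may be one motivation for the num_perm >= 100 requirement, although the reference does not say so.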
- STAVAG.TVG_detection(adata, coords, sps=False, threshold=0.05, num_perm=1)[source]
Detect Temporally Variable Genes (TVGs) using regression on a 1D time coordinate.
- Parameters:
adata (AnnData) – An AnnData object containing the gene expression matrix in adata.X and gene names in adata.var.index.
coords (ndarray) – 1D temporal coordinate of cells with shape (n_cells, 1).
sps (bool, optional) – If True and num_perm == 1, compute STAVAG priority scores by comparing observed importances with a single random baseline (original behavior). Defaults to False.
threshold (float, optional) – If num_perm == 1: cutoff used by keep_variant_genes to select TVGs based on importance. If num_perm > 1: p-value cutoff; genes with pval <= threshold are kept. Defaults to 0.05.
num_perm (int, optional) – Number of permutations used to build an empirical null distribution of feature importances. If 1: keep the original single-permutation behavior. If > 1: must be >= 100; empirical permutation p-values are computed for each gene. Defaults to 1.
- Returns:
Dictionary containing important genes over the time axis 'T'.
For num_perm == 1: the original single-permutation output structure (with SPS scores if sps=True), already filtered by the given threshold.
For num_perm > 1: the DataFrame under key ‘T’ contains columns: [‘Feature’, ‘Importance’, ‘null_mean’, ‘pval’] and is filtered by p-value (<= threshold).
- Return type:
Dict[str, DataFrame]
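Because coords must have shape (n_cells, 1) rather than (n_cells,), a flat vector of time labels needs reshaping before it is passed in. The snippet below is a small sketch of that step; the `time_points` values are made up, and the commented call only repeats the signature listed above without being executed here.

```python
import numpy as np

# Hypothetical per-cell time labels (for example, developmental stage per cell).
time_points = np.array([0.0, 0.0, 1.0, 2.0, 2.0])

# Turn the flat vector of shape (n_cells,) into a column of shape (n_cells, 1).
coords = time_points.reshape(-1, 1)

# With an AnnData object `adata` whose rows match `time_points`, the call
# would then follow the documented signature (not run in this sketch):
# result = STAVAG.TVG_detection(adata, coords, num_perm=100)
# tvg_table = result["T"]
```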
- STAVAG.calculate_sps(coord_dict_raw, coord_dict_rand, n_dim, keys=None)[source]
Compute STAVAG priority scores (sps) for each axis.
The score is the right-tail proportion: the fraction of random importances that are greater than or equal to the observed importance.
- Parameters:
coord_dict_raw (Dict[str, DataFrame]) – Dict of DataFrames per axis. Each DataFrame must contain columns ‘Feature’ and ‘Importance’.
coord_dict_rand (Dict[str, ndarray]) – Dict of random importance arrays per axis.
n_dim (int) – Number of coordinate dimensions.
keys (Sequence[str] | None) – Optional explicit axis names to use.
- Returns:
The same dict as coord_dict_raw with a new column ‘sps’ added to each axis DataFrame.
- Return type:
Dict[str, DataFrame]
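The right-tail proportion described above can be sketched in a few lines. This is an illustration under stated assumptions, not STAVAG's implementation: the helper name `right_tail_scores` is hypothetical, it ignores the n_dim and keys arguments, and it returns copies instead of modifying the input DataFrames.

```python
import numpy as np
import pandas as pd


def right_tail_scores(raw, rand):
    """Hypothetical helper: add an 'sps' column per axis.

    For each gene, sps is the fraction of random importances on that axis
    that are greater than or equal to the gene's observed importance.
    """
    scored_by_axis = {}
    for axis, df in raw.items():
        null = np.asarray(rand[axis], dtype=float).ravel()
        scored = df.copy()  # leave the caller's DataFrame untouched
        scored["sps"] = [float(np.mean(null >= imp)) for imp in scored["Importance"]]
        scored_by_axis[axis] = scored
    return scored_by_axis


raw = {"x": pd.DataFrame({"Feature": ["g1", "g2"], "Importance": [0.9, 0.2]})}
rand = {"x": np.array([0.1, 0.2, 0.3, 0.4])}
scored = right_tail_scores(raw, rand)
# g1: no random value reaches 0.9, so sps is 0.0
# g2: three of four random values are >= 0.2, so sps is 0.75
```

By this definition a small sps means that few random importances reach the observed one.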
- STAVAG.gene_modules(adata, gene_list)[source]
Cluster genes into modules using correlation among selected genes.
- Parameters:
adata (AnnData) – AnnData that contains the expression matrix adata.X and gene names in adata.var.index.
gene_list (Sequence[str]) – Genes to include when building modules. Each gene should exist in adata.var.index.
- Returns:
Z: Linkage matrix from hierarchical clustering.
corr: Gene to gene correlation matrix as a pandas DataFrame. Index and columns are gene names in gene_list.
df: Expression matrix of the selected genes as a pandas DataFrame. Rows are cells and columns are genes.
- Return type:
Tuple[np.ndarray, pd.DataFrame, pd.DataFrame]
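The general technique, correlation between genes followed by hierarchical clustering, can be sketched with pandas and SciPy. The choices below (Pearson correlation, 1 - correlation as the distance, average linkage, cutting into two modules) are assumptions made for illustration; the reference does not state which settings gene_modules uses. The expression values are synthetic.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Synthetic expression matrix: rows are cells, columns are genes.
rng = np.random.default_rng(0)
base_a = rng.normal(size=50)
base_b = rng.normal(size=50)
df = pd.DataFrame(
    {
        "g1": base_a,
        "g2": base_a + 0.01 * rng.normal(size=50),  # near-copy of g1
        "g3": base_b,
        "g4": base_b + 0.01 * rng.normal(size=50),  # near-copy of g3
    }
)

# Gene-to-gene correlation matrix (index and columns are gene names).
corr = df.corr()

# Assumed distance: 1 - correlation, made exactly symmetric with a zero diagonal.
dist = 1.0 - corr.to_numpy()
dist = (dist + dist.T) / 2.0
np.fill_diagonal(dist, 0.0)

# Linkage matrix from the condensed distance vector (assumed average linkage).
Z = linkage(squareform(dist, checks=False), method="average")

# Cut the tree into two modules; labels follow the column order of df.
labels = fcluster(Z, t=2, criterion="maxclust")
modules = dict(zip(df.columns, labels))
```

The returned Z from gene_modules could be passed to the same SciPy tools (for example fcluster or dendrogram) to extract or plot modules, assuming it is a standard SciPy linkage matrix.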
- STAVAG.generate_coord_dict(n_dim)[source]
Build a placeholder dict for coordinate axes.
- Parameters:
n_dim (int) – Number of coordinate dimensions. Supported up to four.
- Returns:
A dict mapping axis names to None. For example {‘x’: None, ‘y’: None} for two dimensions.
- Raises:
ValueError – If n_dim is greater than four.
- Return type:
Dict[str, DataFrame | None]
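The documented behavior can be mirrored by a short stand-in, shown here as a sketch. The function name `make_coord_dict` and the axis names beyond 'x' and 'y' are assumptions: the reference only shows the two-dimensional example, mentions 'z' elsewhere, and does not name the fourth axis.

```python
from typing import Dict, Optional

# Hypothetical axis names; 'x' and 'y' follow the documented example,
# 'z' appears elsewhere in the reference, and the fourth name is a guess.
AXIS_NAMES = ("x", "y", "z", "t")


def make_coord_dict(n_dim: int) -> Dict[str, Optional[object]]:
    """Stand-in sketch: map the first n_dim axis names to None."""
    if n_dim > len(AXIS_NAMES):
        raise ValueError(f"n_dim must be at most {len(AXIS_NAMES)}, got {n_dim}")
    return {name: None for name in AXIS_NAMES[:n_dim]}


placeholder = make_coord_dict(2)
```

Each None is a slot that a per-axis DataFrame can later fill, which matches the documented return type Dict[str, DataFrame | None].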
- STAVAG.keep_variant_genes(coord_dict_raw, coord_dict_rand, n_dim, threshold=0.05, keys=None)[source]
Filter genes whose observed importance exceeds a random baseline.
For each axis this keeps rows where Importance is greater than a high percentile of the random importance distribution.
- Parameters:
coord_dict_raw (Dict[str, DataFrame]) – Dict of DataFrames per axis with importance values.
coord_dict_rand (Dict[str, ndarray]) – Dict of random importance arrays per axis.
n_dim (int) – Number of coordinate dimensions.
threshold (float) – Significance level. For example 0.05 targets the top tail of the random distribution.
keys (Sequence[str] | None) – Optional explicit axis names.
- Returns:
Filtered dict with the same structure as coord_dict_raw.
- Return type:
Dict[str, DataFrame]
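The filtering rule described above can be illustrated as follows. This sketch assumes the "high percentile" is the (1 - threshold) quantile of the random importances on each axis, which is consistent with the description of threshold but is not confirmed by the reference. The helper name `filter_by_null_quantile` and the data are hypothetical.

```python
import numpy as np
import pandas as pd


def filter_by_null_quantile(raw, rand, threshold=0.05):
    """Hypothetical helper: keep rows whose Importance exceeds the
    (1 - threshold) quantile of the random importances for that axis."""
    kept = {}
    for axis, df in raw.items():
        null = np.asarray(rand[axis], dtype=float).ravel()
        cutoff = np.quantile(null, 1.0 - threshold)
        kept[axis] = df[df["Importance"] > cutoff].reset_index(drop=True)
    return kept


raw = {
    "x": pd.DataFrame(
        {"Feature": ["g1", "g2", "g3"], "Importance": [0.99, 0.50, 0.10]}
    )
}
# Toy random baseline: 101 evenly spaced importances between 0 and 1.
rand = {"x": np.linspace(0.0, 1.0, 101)}

strict = filter_by_null_quantile(raw, rand, threshold=0.05)  # cutoff near 0.95
loose = filter_by_null_quantile(raw, rand, threshold=0.6)    # cutoff near 0.40
```

In this toy example the strict setting keeps only g1, while the looser one also keeps g2, which shows why larger threshold values retain more genes under this assumed rule.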
Key functions
- STAVAG.generate_coord_dict – Build a placeholder dict for coordinate axes.
- STAVAG.calculate_sps – Compute STAVAG priority scores (sps) for each axis.
- STAVAG.keep_variant_genes – Filter genes whose observed importance exceeds a random baseline.
- STAVAG.DVG_detection – Detect Directionally Variable Genes (DVGs) using regression on spatial coordinates.
- STAVAG.TVG_detection – Detect Temporally Variable Genes (TVGs) using regression on a 1D time coordinate.
- STAVAG.gene_modules – Cluster genes into modules using correlation among selected genes.