API References

STAVAG.DVG_detection(adata, coords, sps=False, threshold=0.05, num_perm=1)

Detect Directionally Variable Genes (DVGs) using regression on spatial coordinates.

Parameters:

adata (AnnData) – AnnData with expression matrix adata.X and gene names adata.var.index.
coords (ndarray) – Spatial coordinates of cells with shape (n_cells, n_dim). For example, two columns for x and y, or three columns for x, y, and z.
sps (bool, optional) – If True, compute STAVAG priority scores by comparing observed importances with random baselines. Defaults to False.
threshold (float, optional) – Cutoff used when selecting DVGs. If num_perm > 1, it is a p-value cutoff and genes with pval <= threshold are kept. Larger values keep more genes. Defaults to 0.05.
num_perm (int, optional) – Number of permutations used to build an empirical null distribution of feature importances. Defaults to 1.
- If 1: keep the original single-permutation behavior.
- If > 1: must be >= 100; empirical p-values are computed for each gene.

Returns:

Dictionary containing top important genes per coordinate axis (e.g., ‘x’, ‘y’, ‘z’), filtered with the threshold.

For num_perm == 1: the original single-permutation output structure (with SPS scores if sps=True).
For num_perm > 1: each DataFrame contains columns [‘Feature’, ‘Importance’, ‘null_mean’, ‘pval’] and is filtered by p-value (<= threshold).

Return type:

Dict[str, DataFrame]
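The num_perm > 1 output can be pictured with a small sketch. It is not STAVAG's implementation: the helper name empirical_pvalues, the add-one correction, and the toy numbers are assumptions chosen for illustration. Only the column names and the pval <= threshold filter come from the description above.

```python
import numpy as np
import pandas as pd


def empirical_pvalues(features, observed, null_importances):
    """Hypothetical helper: one-sided empirical p-value per gene.

    observed: shape (n_genes,), importances from the real coordinates.
    null_importances: shape (n_perm, n_genes), importances from permuted coordinates.
    """
    observed = np.asarray(observed, dtype=float)
    null_importances = np.asarray(null_importances, dtype=float)
    n_perm = null_importances.shape[0]
    # Count permutations whose importance is at least as large as the observed one.
    exceed = (null_importances >= observed[None, :]).sum(axis=0)
    # Add-one correction keeps p-values strictly positive (an assumption of this sketch).
    pval = (exceed + 1) / (n_perm + 1)
    return pd.DataFrame({
        "Feature": list(features),
        "Importance": observed,
        "null_mean": null_importances.mean(axis=0),
        "pval": pval,
    })


rng = np.random.default_rng(0)
genes = ["geneA", "geneB", "geneC"]           # toy gene names
observed = np.array([0.90, 0.05, 0.02])       # toy observed importances
null = rng.uniform(0.0, 0.1, size=(100, 3))   # 100 permutations, 3 genes

table = empirical_pvalues(genes, observed, null)
kept = table[table["pval"] <= 0.05]           # the documented p-value filter
```

In this toy data only geneA lies above every null draw, so it is the only row that passes the filter.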

STAVAG.TVG_detection(adata, coords, sps=False, threshold=0.05, num_perm=1)

Detect Temporally Variable Genes (TVGs) using regression on a 1D time coordinate.

Parameters:

adata (AnnData) – An AnnData object containing the gene expression matrix adata.X and gene names adata.var.index.
coords (ndarray) – 1D temporal coordinate of cells with shape (n_cells, 1).
sps (bool, optional) – If True and num_perm == 1, compute STAVAG priority scores by comparing observed importances with a single random baseline (original behavior). Defaults to False.
threshold (float, optional) –
- If num_perm == 1: cutoff used by keep_variant_genes to select TVGs based on importance.
- If num_perm > 1: p-value cutoff; genes with pval <= threshold are kept.
Defaults to 0.05.
num_perm (int, optional) – Number of permutations used to build an empirical null distribution of feature importances. Defaults to 1.
- If 1: keep the original single-permutation behavior.
- If > 1: must be >= 100; empirical permutation p-values are computed for each gene.

Returns:

Dictionary containing important genes over the time axis 'T'.

For num_perm == 1: the original single-permutation output structure (with SPS scores if sps=True), already filtered by the given threshold.
For num_perm > 1: the DataFrame under key ‘T’ contains columns: [‘Feature’, ‘Importance’, ‘null_mean’, ‘pval’] and is filtered by p-value (<= threshold).

Return type:

Dict[str, DataFrame]
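Because coords is expected as a column of shape (n_cells, 1) rather than a flat vector, a per-cell time vector usually needs reshaping first. The sketch below uses made-up time labels; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical per-cell time labels, e.g. a sampling stage for each of six cells.
time_per_cell = np.array([0.0, 0.0, 1.0, 1.0, 2.0, 2.0])

# reshape(-1, 1) turns a flat vector of length n_cells into a column of shape (n_cells, 1).
coords = time_per_cell.reshape(-1, 1)

n_cells, n_dim = coords.shape
```

The resulting array has one row per cell and a single column, matching the documented shape for the time coordinate.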

STAVAG.calculate_sps(coord_dict_raw, coord_dict_rand, n_dim, keys=None)

Compute STAVAG priority scores (sps) for each axis.

The score is a right-tail proportion: the fraction of random importances that are greater than or equal to the observed importance.

Parameters:

coord_dict_raw (Dict[str, DataFrame]) – Dict of DataFrames per axis. Each DataFrame must contain columns ‘Feature’ and ‘Importance’.
coord_dict_rand (Dict[str, ndarray]) – Dict of random importance arrays per axis.
n_dim (int) – Number of coordinate dimensions.
keys (Sequence[str] | None) – Optional explicit axis names to use.

Returns:

The same dict as coord_dict_raw with a new column ‘sps’ added to each axis DataFrame.

Return type:

Dict[str, DataFrame]
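The right-tail proportion described above can be sketched in a few lines of NumPy. The helper right_tail_proportion and the toy inputs are hypothetical; the sketch follows the written definition and may differ in detail from what calculate_sps does internally.

```python
import numpy as np
import pandas as pd


def right_tail_proportion(observed, random_importances):
    """Fraction of random importances that are >= each observed importance."""
    observed = np.asarray(observed, dtype=float)
    random_importances = np.asarray(random_importances, dtype=float)
    # Broadcasting compares every random value against every observed value.
    return (random_importances[None, :] >= observed[:, None]).mean(axis=1)


# Toy inputs shaped like the documented arguments: one DataFrame and one array per axis.
raw = {"x": pd.DataFrame({"Feature": ["geneA", "geneB"], "Importance": [0.30, 0.05]})}
rand = {"x": np.array([0.01, 0.02, 0.05, 0.10])}

scored = {}
for axis, df in raw.items():
    out = df.copy()  # leave the input frame untouched
    out["sps"] = right_tail_proportion(out["Importance"].to_numpy(), rand[axis])
    scored[axis] = out
```

Here geneA exceeds every random value and scores 0.0, while geneB is matched or exceeded by two of the four random values and scores 0.5. Under this definition a smaller score indicates a gene that stands out more from the baseline.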

STAVAG.gene_modules(adata, gene_list)

Cluster genes into modules using correlation among selected genes.

Parameters:

adata (AnnData) – AnnData that contains the expression matrix adata.X and gene names in adata.var.index.
gene_list (Sequence[str]) – Genes to include when building modules. Each gene should exist in adata.var.index.

Returns:

Z: Linkage matrix from hierarchical clustering.

corr: Gene-to-gene correlation matrix as a pandas DataFrame. Index and columns are the gene names in gene_list.

df: Expression matrix of the selected genes as a pandas DataFrame. Rows are cells and columns are genes.

Return type:

Tuple[np.ndarray, pd.DataFrame, pd.DataFrame]
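To show how the three returned objects relate, here is a minimal sketch of correlation-based hierarchical clustering with pandas and SciPy. The distance (1 minus Pearson correlation), the average linkage method, and the cut into two modules are assumptions for illustration; the reference above does not state which choices gene_modules makes.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
n_cells = 200
base_1 = rng.normal(size=n_cells)
base_2 = rng.normal(size=n_cells)

# Toy expression table: rows are cells, columns are genes.
df = pd.DataFrame({
    "geneA": base_1 + 0.1 * rng.normal(size=n_cells),
    "geneB": base_1 + 0.1 * rng.normal(size=n_cells),
    "geneC": base_2 + 0.1 * rng.normal(size=n_cells),
    "geneD": base_2 + 0.1 * rng.normal(size=n_cells),
})

corr = df.corr()                    # gene-by-gene Pearson correlation
dist = 1.0 - corr.to_numpy()        # assumption: 1 - correlation as the distance
np.fill_diagonal(dist, 0.0)         # remove floating-point noise on the diagonal
Z = linkage(squareform(dist, checks=False), method="average")

# Cut the tree into two modules and map each gene to its module label.
labels = fcluster(Z, t=2, criterion="maxclust")
modules = dict(zip(corr.index, labels))
```

With a linkage matrix like Z, scipy.cluster.hierarchy.fcluster or dendrogram can be used to assign or visualize modules.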

STAVAG.generate_coord_dict(n_dim)

Build a placeholder dict for coordinate axes.

Parameters:

n_dim (int) – Number of coordinate dimensions. Supported up to four.

Returns:

A dict mapping axis names to None. For example, {‘x’: None, ‘y’: None} for two dimensions.

Raises:

ValueError – If n_dim is greater than four.

Return type:

Dict[str, DataFrame | None]
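The documented behavior can be mimicked with a small stand-in. The function make_coord_dict is hypothetical, and the name of the fourth axis is a guess, since this reference only shows ‘x’, ‘y’, ‘z’, and ‘T’.

```python
from typing import Dict, Optional

# Assumed axis names; the fourth one is not given in the reference and is a guess.
AXIS_NAMES = ("x", "y", "z", "t")


def make_coord_dict(n_dim: int) -> Dict[str, Optional[object]]:
    """Hypothetical stand-in: map the first n_dim axis names to None."""
    if n_dim > len(AXIS_NAMES):
        raise ValueError(
            f"n_dim={n_dim} is not supported; the maximum is {len(AXIS_NAMES)}"
        )
    return {name: None for name in AXIS_NAMES[:n_dim]}


placeholders = make_coord_dict(2)
```

Each None value is a slot that a per-axis DataFrame can later fill, which is why the return type allows DataFrame | None.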

STAVAG.keep_variant_genes(coord_dict_raw, coord_dict_rand, n_dim, threshold=0.05, keys=None)

Filter genes whose observed importance exceeds a random baseline.

For each axis, this keeps rows where Importance is greater than a high percentile of the random importance distribution; threshold controls how far into the top tail that percentile lies.

Parameters:

coord_dict_raw (Dict[str, DataFrame]) – Dict of DataFrames per axis with importance values.
coord_dict_rand (Dict[str, ndarray]) – Dict of random importance arrays per axis.
n_dim (int) – Number of coordinate dimensions.
threshold (float) – Significance level. For example, 0.05 targets the top tail of the random distribution.
keys (Sequence[str] | None) – Optional explicit axis names.

Returns:

Filtered dict with the same structure as coord_dict_raw.

Return type:

Dict[str, DataFrame]
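A percentile filter of this kind might look like the sketch below. It assumes the cutoff is the (1 - threshold) quantile of the random importances, which is one natural reading of "significance level" but is not confirmed by this reference. The helper filter_by_random_baseline and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def filter_by_random_baseline(df, random_importances, threshold=0.05):
    """Hypothetical filter: keep rows above the (1 - threshold) quantile of the baseline."""
    baseline = np.asarray(random_importances, dtype=float)
    cutoff = np.quantile(baseline, 1.0 - threshold)  # assumed cutoff rule
    return df[df["Importance"] > cutoff].reset_index(drop=True)


raw = {"x": pd.DataFrame({"Feature": ["geneA", "geneB", "geneC"],
                          "Importance": [0.50, 0.08, 0.01]})}
rand = {"x": np.linspace(0.0, 0.10, 101)}  # toy baseline: 0.000, 0.001, ..., 0.100

# Strict cutoff: only importances above the 95th percentile of the baseline survive.
filtered = {axis: filter_by_random_baseline(df, rand[axis], threshold=0.05)
            for axis, df in raw.items()}

# A larger threshold lowers the cutoff, so more genes are kept.
looser = filter_by_random_baseline(raw["x"], rand["x"], threshold=0.5)
```

In this toy data the strict filter keeps geneA only, while threshold=0.5 also admits geneB.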

Key functions

`generate_coord_dict`	Build a placeholder dict for coordinate axes.
`calculate_sps`	Compute STAVAG priority scores (sps) for each axis.
`keep_variant_genes`	Filter genes whose observed importance exceeds a random baseline.
`DVG_detection`	Detect Directionally Variable Genes (DVGs) using regression on spatial coordinates.
`TVG_detection`	Detect Temporally Variable Genes (TVGs) using regression on a 1D time coordinate.
`gene_modules`	Cluster genes into modules using correlation among selected genes.