API References

STAVAG.DVG_detection(adata, coords, sps=False, threshold=0.05, num_perm=1)

Detect Directionally Variable Genes (DVGs) using regression on spatial coordinates.

Parameters:
  • adata (AnnData) – An AnnData object with the expression matrix adata.X and gene names in adata.var.index.

  • coords (ndarray) – Spatial coordinates of cells with shape (n_cells, n_dim). For example, two columns for x and y, or three columns for x, y, and z.

  • sps (bool, optional) – If True and num_perm == 1, compute STAVAG priority scores by comparing observed importances with random baselines. Defaults to False.

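  • threshold (float, optional) –

    • If num_perm == 1: cutoff used by keep_variant_genes to select DVGs based on importance; larger values keep more genes.

    • If num_perm > 1: p-value cutoff; genes with pval <= threshold are kept.

    Defaults to 0.05.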

  • num_perm (int, optional) –

    Number of permutations used to build an empirical null distribution of feature importances.

    • If 1: use a single random baseline (the original behavior).

    • If > 1: must be >= 100; empirical p-values are computed for each gene.

Returns:

Dictionary containing the most important genes per coordinate axis (e.g., ‘x’, ‘y’, ‘z’), filtered by threshold.

  • For num_perm == 1: the single-random-baseline results (including SPS scores if sps=True), filtered by importance.

  • For num_perm > 1: each DataFrame contains columns [‘Feature’, ‘Importance’, ‘null_mean’, ‘pval’] and is filtered by p-value (<= threshold).

Return type:

Dict[str, DataFrame]
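For num_perm > 1, the documented output columns (‘Feature’, ‘Importance’, ‘null_mean’, ‘pval’) suggest a standard per-axis permutation test: compute an observed importance for each gene, recompute it many times after shuffling the coordinate, then report the null mean and an empirical p-value. The sketch below illustrates that scheme under stated assumptions; it is not STAVAG's implementation. The regression model and its importance measure are not described here, so the absolute Pearson correlation between each gene and the axis stands in for the importance. The add-one p-value formula, the axis labels, and all function names (abs_corr_importance, permutation_table, dvg_permutation_sketch) are hypothetical.

```python
import numpy as np
import pandas as pd


def abs_corr_importance(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Stand-in importance (assumption): |Pearson r| of each gene column with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    with np.errstate(invalid="ignore", divide="ignore"):
        r = (Xc * yc[:, None]).sum(axis=0) / denom
    # Constant genes give 0/0; treat them as unimportant.
    return np.nan_to_num(np.abs(r))


def permutation_table(X, y, genes, num_perm=100, threshold=0.05, rng=None):
    """Empirical null for one axis; columns mirror the documented output."""
    if num_perm < 100:
        raise ValueError("num_perm must be >= 100 for the permutation test")
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    observed = abs_corr_importance(X, y)
    null = np.empty((num_perm, X.shape[1]))
    for i in range(num_perm):
        # Shuffling the coordinate breaks any gene/position association.
        null[i] = abs_corr_importance(X, rng.permutation(y))
    # Add-one correction (assumption): p-values are never exactly zero.
    pval = (1 + (null >= observed).sum(axis=0)) / (1 + num_perm)
    df = pd.DataFrame({
        "Feature": list(genes),
        "Importance": observed,
        "null_mean": null.mean(axis=0),
        "pval": pval,
    })
    return df[df["pval"] <= threshold].sort_values("pval").reset_index(drop=True)


def dvg_permutation_sketch(X, coords, genes, num_perm=100, threshold=0.05, seed=0):
    """Run the per-axis test on every column of coords (n_cells, n_dim)."""
    X = np.asarray(X, dtype=float)
    coords = np.asarray(coords, dtype=float)
    if coords.ndim != 2 or not 1 <= coords.shape[1] <= 3:
        raise ValueError("coords must have shape (n_cells, n_dim) with n_dim in 1..3")
    rng = np.random.default_rng(seed)
    axis_names = ("x", "y", "z")  # assumed axis labels
    return {
        axis_names[d]: permutation_table(X, coords[:, d], genes, num_perm, threshold, rng)
        for d in range(coords.shape[1])
    }
```

With a gene that tracks x almost perfectly and num_perm=200, that gene's p-value on axis ‘x’ bottoms out at 1/201, the smallest value the add-one formula allows.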

STAVAG.TVG_detection(adata, coords, sps=False, threshold=0.05, num_perm=1)

Detect Temporally Variable Genes (TVGs) using regression on a 1D time coordinate.

Parameters:
  • adata (AnnData) – An AnnData object containing gene expression matrix adata.X and gene names adata.var.index.

  • coords (ndarray) – 1D temporal coordinate of cells with shape (n_cells, 1).

  • sps (bool, optional) – If True and num_perm == 1, compute STAVAG priority scores by comparing observed importances with a single random baseline (original behavior). Defaults to False.

  • threshold (float, optional) –

    • If num_perm == 1: cutoff used by keep_variant_genes to select TVGs based on importance.

    • If num_perm > 1: p-value cutoff; genes with pval <= threshold are kept.

    Defaults to 0.05.

  • num_perm (int, optional) –

    Number of permutations used to build an empirical null distribution of feature importances.

    • If 1: use a single random baseline (the original behavior).

    • If > 1: must be >= 100; empirical permutation p-values are computed for each gene.

Returns:

Dictionary containing important genes over the time axis 'T'.

  • For num_perm == 1: the single-random-baseline results (including SPS scores if sps=True), already filtered by the given threshold.

  • For num_perm > 1: the DataFrame under key ‘T’ contains columns: [‘Feature’, ‘Importance’, ‘null_mean’, ‘pval’] and is filtered by p-value (<= threshold).

Return type:

Dict[str, DataFrame]
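TVG_detection expects coords as a column vector of shape (n_cells, 1), and num_perm switches the meaning of threshold between an importance cutoff and a p-value cutoff. The helpers below are hypothetical (as_time_coords and check_num_perm are not part of STAVAG); they sketch how a caller might shape a per-cell time vector and check which threshold mode applies. Whether STAVAG itself raises ValueError for an invalid num_perm is an assumption.

```python
import numpy as np


def as_time_coords(times) -> np.ndarray:
    """Shape a per-cell time vector into the (n_cells, 1) array TVG_detection expects."""
    t = np.asarray(times, dtype=float)
    if t.ndim == 2 and t.shape[1] == 1:
        return t  # already a column vector
    if t.ndim != 1:
        raise ValueError(f"expected a 1D time vector, got shape {t.shape}")
    return t.reshape(-1, 1)


def check_num_perm(num_perm: int) -> str:
    """Report which threshold mode applies, following the documented rules."""
    if num_perm == 1:
        return "importance"  # threshold is used by keep_variant_genes
    if num_perm >= 100:
        return "pvalue"  # genes with pval <= threshold are kept
    raise ValueError("num_perm must be 1 or >= 100")
```

A call might then look like STAVAG.TVG_detection(adata, as_time_coords(adata.obs["day"]), num_perm=100), where the "day" column is a hypothetical per-cell time annotation.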

STAVAG.calculate_sps(coord_dict_raw, coord_dict_rand, n_dim, keys=None)

Compute STAVAG priority scores (sps) for each axis.

For each gene, the score is the right-tail proportion of random importances that are greater than or equal to the observed importance.

Parameters:
  • coord_dict_raw (Dict[str, DataFrame]) – Dict of DataFrames per axis. Each DataFrame must contain columns ‘Feature’ and ‘Importance’.

  • coord_dict_rand (Dict[str, ndarray]) – Dict of random importance arrays per axis.

  • n_dim (int) – Number of coordinate dimensions.

  • keys (Sequence[str] | None) – Optional explicit axis names to use.

Returns:

The same dict as coord_dict_raw with a new column ‘sps’ added to each axis DataFrame.

Return type:

Dict[str, DataFrame]
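The right-tail proportion can be computed for all genes at once by sorting the random importances and counting, per gene, how many are at least as large as the observed value. The sketch below assumes the input layout described above; calculate_sps_sketch is a hypothetical name, it iterates over the dict's own keys instead of using n_dim and keys, and it returns new DataFrames rather than modifying the input dict.

```python
import numpy as np
import pandas as pd


def calculate_sps_sketch(coord_dict_raw, coord_dict_rand):
    """Return a new dict whose DataFrames gain an 'sps' column."""
    out = {}
    for axis, df in coord_dict_raw.items():
        rand = np.sort(np.asarray(coord_dict_rand[axis], dtype=float).ravel())
        obs = df["Importance"].to_numpy(dtype=float)
        # searchsorted(side="left") counts random values strictly below obs,
        # so the remainder counts values >= obs (the right tail).
        n_ge = rand.size - np.searchsorted(rand, obs, side="left")
        out[axis] = df.assign(sps=n_ge / rand.size)
    return out
```

For random importances [0.1, 0.2, 0.3, 0.4], an observed importance of 0.25 gets sps 0.5 (two random values are at least as large), 0.5 gets 0.0, and 0.05 gets 1.0.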

STAVAG.gene_modules(adata, gene_list)

Cluster genes into modules by hierarchical clustering on the correlations among the selected genes.

Parameters:
  • adata (AnnData) – AnnData that contains the expression matrix adata.X and gene names in adata.var.index.

  • gene_list (Sequence[str]) – Genes to include when building modules. Each gene should exist in adata.var.index.

Returns:

Z: Linkage matrix from hierarchical clustering.

corr: Gene-to-gene correlation matrix as a pandas DataFrame. Index and columns are the gene names in gene_list.

df: Expression matrix of the selected genes as a pandas DataFrame. Rows are cells and columns are genes.

Return type:

Tuple[np.ndarray, pd.DataFrame, pd.DataFrame]
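A common way to obtain the three documented outputs is to correlate the selected genes, turn correlation r into the distance 1 − r, and pass the condensed distances to SciPy's linkage. The sketch below assumes Pearson correlation, the 1 − r distance, and average linkage; the source does not confirm any of these choices. gene_modules_sketch is a hypothetical name that takes a dense cells × genes matrix and gene names instead of an AnnData object (for sparse data, pass adata.X.toarray()). Cutting the tree with fcluster is one way to turn Z into module labels.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def gene_modules_sketch(X, var_names, gene_list, method="average"):
    """Return (Z, corr, df) from a dense cells x genes matrix (assumed choices)."""
    expr = pd.DataFrame(np.asarray(X, dtype=float), columns=list(var_names))
    missing = [g for g in gene_list if g not in expr.columns]
    if missing:
        raise KeyError(f"genes not found: {missing}")
    df = expr[list(gene_list)]  # rows are cells, columns are genes
    corr = df.corr()  # Pearson gene-to-gene correlation
    # Distance 1 - r: strongly co-varying genes end up close together.
    # Constant genes give NaN correlations and would make linkage fail.
    dist = np.clip(1.0 - corr.to_numpy(), 0.0, None)
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method=method)
    return Z, corr, df


def modules_from_linkage(Z, n_modules):
    """Cut the tree into at most n_modules clusters; labels start at 1."""
    return fcluster(Z, t=n_modules, criterion="maxclust")
```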

STAVAG.generate_coord_dict(n_dim)

Build a placeholder dict for coordinate axes.

Parameters:

n_dim (int) – Number of coordinate dimensions. At most four are supported.

Returns:

A dict mapping axis names to None. For example {‘x’: None, ‘y’: None} for two dimensions.

Raises:

ValueError – If n_dim is greater than four.

Return type:

Dict[str, DataFrame | None]
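The behavior described above fits in a few lines. This is a sketch only: the source confirms ‘x’ and ‘y’ as the first two axis names and uses ‘x’, ‘y’, ‘z’ as example keys elsewhere, but the name of the fourth axis is not documented, so the ‘t’ below is a guess. generate_coord_dict_sketch is a hypothetical name.

```python
from typing import Dict, Optional

import pandas as pd

# Assumed axis order; 't' for the fourth dimension is a guess, not documented.
AXIS_NAMES = ("x", "y", "z", "t")


def generate_coord_dict_sketch(n_dim: int) -> Dict[str, Optional[pd.DataFrame]]:
    """Map the first n_dim axis names to None, rejecting more than four axes."""
    if n_dim > len(AXIS_NAMES):
        raise ValueError(f"n_dim must be at most {len(AXIS_NAMES)}, got {n_dim}")
    return {name: None for name in AXIS_NAMES[:n_dim]}
```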

STAVAG.keep_variant_genes(coord_dict_raw, coord_dict_rand, n_dim, threshold=0.05, keys=None)

Filter genes whose observed importance exceeds a random baseline.

For each axis, this keeps rows whose Importance is greater than a high percentile of the random importance distribution.

Parameters:
  • coord_dict_raw (Dict[str, DataFrame]) – Dict of DataFrames per axis with importance values.

  • coord_dict_rand (Dict[str, ndarray]) – Dict of random importance arrays per axis.

  • n_dim (int) – Number of coordinate dimensions.

  • threshold (float) – Significance level. For example, 0.05 targets the upper 5% tail of the random distribution.

  • keys (Sequence[str] | None) – Optional explicit axis names.

Returns:

Filtered dict with the same structure as coord_dict_raw.

Return type:

Dict[str, DataFrame]
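The exact percentile is not spelled out above. The sketch below assumes the cutoff is the (1 − threshold) quantile of each axis's random importances (computed with NumPy's default linear interpolation) and keeps genes strictly above it, which matches "larger threshold keeps more genes". keep_variant_genes_sketch is a hypothetical name that iterates over the dict's own keys instead of using n_dim and keys.

```python
import numpy as np
import pandas as pd


def keep_variant_genes_sketch(coord_dict_raw, coord_dict_rand, threshold=0.05):
    """Keep genes whose Importance exceeds the (1 - threshold) random quantile."""
    kept = {}
    for axis, df in coord_dict_raw.items():
        rand = np.asarray(coord_dict_rand[axis], dtype=float).ravel()
        # e.g. the 95th percentile of the random baseline for threshold=0.05
        cutoff = np.quantile(rand, 1.0 - threshold)
        kept[axis] = df[df["Importance"] > cutoff].reset_index(drop=True)
    return kept
```

With random importances 0.01, 0.02, ..., 1.00, threshold=0.05 gives a cutoff of about 0.95, while threshold=0.5 lowers it to about 0.505, so more genes pass.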

Key functions

  • generate_coord_dict – Build a placeholder dict for coordinate axes.

  • calculate_sps – Compute STAVAG priority scores (sps) for each axis.

  • keep_variant_genes – Filter genes whose observed importance exceeds a random baseline.

  • DVG_detection – Detect Directionally Variable Genes (DVGs) using regression on spatial coordinates.

  • TVG_detection – Detect Temporally Variable Genes (TVGs) using regression on a 1D time coordinate.

  • gene_modules – Cluster genes into modules using correlation among selected genes.