schist.inference

Functions

fit_model(→ Optional[anndata.AnnData])

Cluster cells using the nested Stochastic Block Model [Peixoto14], performing Bayesian inference on node groups.

fit_model_multi(→ [Union[List[anndata.AnnData]], ...)

Cluster cells into subgroups using multiple modalities.

nested_model(→ Optional[anndata.AnnData])

This function is deprecated and will be removed soon.

flat_model(→ Optional[anndata.AnnData])

This function is deprecated and will be removed soon.

nested_model_multi(→ Optional[List[anndata.AnnData]])

This function is deprecated and will be removed soon.

Package Contents

schist.inference.fit_model(adata: anndata.AnnData, deg_corr: bool = True, tolerance: float = 0.0001, n_sweep: int = 10, beta: float = np.inf, n_init: int = 100, model: Literal['nsbm', 'sbm', 'ppbm'] = 'nsbm', max_iter: int = 1000, collect_marginals: bool = True, refine_model: bool = False, refine_iter: int = 100, n_jobs: int = -1, key_added: str | None = None, adjacency: scipy.sparse.spmatrix | None = None, neighbors_key: str | None = 'neighbors', directed: bool = False, use_weights: bool = False, save_model: str | None = None, copy: bool = False, random_seed: int | None = None, dispatch_backend: str | None = 'loky') anndata.AnnData | None

Cluster cells using the nested Stochastic Block Model [Peixoto14], performing Bayesian inference on node groups.

This requires having run neighbors() or bbknn() first.
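A minimal usage sketch, assuming schist is imported as scs (the alias used elsewhere on this page), that scanpy is installed, and that adata is an already preprocessed AnnData object. The helper name cluster_cells and the argument values are illustrative assumptions, not recommendations; the imports are deferred into the function so the block can be read without either library installed.

```python
# Keyword arguments taken from the fit_model signature documented on this page;
# the specific values are illustrative only.
FIT_KWARGS = {
    "model": "nsbm",     # one of 'nsbm', 'sbm', 'ppbm'
    "n_init": 100,       # documented default
    "random_seed": 0,    # fixed seed for graph-tool
    "copy": False,       # modify adata in place
}


def cluster_cells(adata):
    """Hypothetical helper: build the kNN graph, then fit the block model."""
    import scanpy as sc   # deferred imports: both packages are assumed installed
    import schist as scs

    sc.pp.neighbors(adata)                         # fit_model needs a neighbors graph
    scs.inference.fit_model(adata, **FIT_KWARGS)   # annotates adata in place
    return adata
```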

Parameters

adata

The annotated data matrix.

deg_corr

Whether to use degree correction in the minimization step. Degree correction suits many real-world networks, although it does not seem to be needed for the KNN graphs used in scanpy.

tolerance

Tolerance for fast model convergence.

n_sweep

Number of iterations to be performed in the fast model MCMC greedy approach.

beta

Inverse temperature for the MCMC greedy approach.

n_init

Number of concurrent minimizations to be performed. The final model will be a consensus over these models.

model

The SBM model to use. nsbm implements Nested Stochastic Block Model. sbm is the Stochastic Block Model. ppbm is the Planted Partition Block Model which only has an assortativity prior.

max_iter

Maximum number of iterations during minimization. Set to infinity to stop minimization only on tolerance.

collect_marginals

Collect the marginal distribution of cells, that is, the probability of each cell belonging to each cluster.

refine_model

Whether to perform a further MCMC step to refine the model.

refine_iter

Number of refinement iterations.

n_jobs

Number of parallel computations used during model initialization

key_added

adata.obs key under which to add the cluster labels.

adjacency

Sparse adjacency matrix of the graph. Defaults to adata.uns['neighbors']['connectivities'] for scanpy<=1.4.6, or adata.obsp[neighbors_key][connectivity_key] for scanpy>1.4.6.

neighbors_key

The key passed to sc.pp.neighbors.

directed

Whether to treat the graph as directed or undirected.

use_weights

If True, edge weights from the graph are used in the computation (placing more emphasis on stronger edges). Note that this increases computation time.

save_model

If provided, this will be the filename for the PartitionModeState to be saved. The PartitionModeState contains all the models minimized during inference.

copy

Whether to copy adata or modify it in place.

random_seed

Random number to be used as seed for graph-tool.
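How beta, n_sweep, tolerance and max_iter might interact can be sketched in plain Python. This is a toy illustration, not the schist or graph-tool implementation: the function names, the toy proposal and the stopping rule (stop when the absolute entropy change of one round falls below tolerance) are all assumptions. It uses the standard Metropolis acceptance rule, under which beta = inf accepts only moves that do not increase the entropy.

```python
import math
import random


def accept(delta_s: float, beta: float, rng: random.Random) -> bool:
    """Metropolis rule: always accept non-worsening moves; accept a
    worsening move with probability exp(-beta * delta_s)."""
    if delta_s <= 0:
        return True
    if math.isinf(beta):
        return False  # greedy limit: never move uphill
    return rng.random() < math.exp(-beta * delta_s)


def minimize(entropy, propose, state, *, beta=math.inf, n_sweep=10,
             tolerance=1e-4, max_iter=1000, seed=0):
    """Toy loop: each round runs n_sweep proposals, then stops if the
    entropy changed by less than tolerance (or after max_iter rounds)."""
    rng = random.Random(seed)
    for _ in range(max_iter):
        before = entropy(state)
        for _ in range(n_sweep):
            candidate = propose(state, rng)
            if accept(entropy(candidate) - entropy(state), beta, rng):
                state = candidate
        if abs(entropy(state) - before) < tolerance:
            break
    return state


# Toy problem: minimize x**2 over the integers with +/-1 steps.
result = minimize(lambda x: x * x,
                  lambda x, rng: x + rng.choice((-1, 1)),
                  state=25)
```

With a finite beta the same loop occasionally accepts entropy-increasing moves, which trades speed for the ability to escape local minima.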

Returns

adata.obs[key_added]

Array of dim (number of cells) that stores the subgroup id (‘0’, ‘1’, …) for each cell.

adata.uns['schist'][model]['stats']

A dict with entropy and modularity values.

adata.uns['schist'][model]['params']

A dict with the values for the parameters used.

adata.obsm['CM_nsbm_level_{n}'] or adata.obsm['CM_model']

An np.ndarray with, for each cell, the probability of belonging to each group.

adata.uns['schist'][model]['state']

The block model state, to be used when a graph-tool state needs to be initialized.
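To illustrate what the cell marginals represent, the sketch below estimates, for each cell, the fraction of fitted partitions that assign it to each group. It is a simplified stand-in, not schist's code: it assumes group labels are already aligned across partitions (in practice labels must be matched first), and the function name is hypothetical.

```python
from collections import Counter


def cell_marginals(partitions):
    """partitions: list of label lists, one per fitted model, each with
    one label per cell. Returns (matrix, groups), where matrix[i][j] is
    the fraction of models placing cell i in groups[j]."""
    groups = sorted({g for labels in partitions for g in labels})
    n_models = len(partitions)
    matrix = []
    for labels in zip(*partitions):      # labels of one cell across all models
        counts = Counter(labels)
        matrix.append([counts[g] / n_models for g in groups])
    return matrix, groups


# Three toy partitions of four cells.
parts = [
    ["0", "0", "1", "1"],
    ["0", "0", "1", "0"],
    ["0", "1", "1", "1"],
]
marginals, groups = cell_marginals(parts)
```

Each row sums to 1, so a row close to one-hot marks a cell whose assignment is stable across models.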

schist.inference.fit_model_multi(mdata: List[anndata.AnnData] | mudata.MuData, deg_corr: bool = True, tolerance: float = 0.0001, n_sweep: int = 10, beta: float = np.inf, n_init: int = 100, model: Literal['nsbm', 'sbm'] = 'nsbm', max_iter: int = 1000, collect_marginals: bool = True, refine_model: bool = False, refine_iter: int = 100, n_jobs: int = -1, overlap: bool = False, key_added: str | None = None, adjacency: List[scipy.sparse.spmatrix] | None = None, neighbors_key: List[str] | None = ['neighbors'], directed: bool = False, use_weights: bool = False, save_model: str | None = None, copy: bool = False, dispatch_backend: str | None = 'loky', random_seed: int | None = None) [List[anndata.AnnData], mudata.MuData, None]

Cluster cells into subgroups using multiple modalities.

Cluster cells using the nested Stochastic Block Model [Peixoto14], performing Bayesian inference on node groups. This function takes multiple experiments, possibly across different modalities, and performs joint clustering.

This requires having run neighbors() or bbknn() first. It also requires cells to have the same names if they come from paired experiments.
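Because paired experiments must share cell names, it can help to compare them before fitting. The helper below is a hypothetical pre-flight check, not part of schist; it works on plain sequences of names, which for AnnData objects would come from adata.obs_names.

```python
def shared_cell_names(*name_lists):
    """Return the names present in every modality (in the order of the
    first list) and whether all modalities contain the same set of names."""
    sets = [set(names) for names in name_lists]
    common = set.intersection(*sets)
    identical = all(s == sets[0] for s in sets)
    ordered = [name for name in name_lists[0] if name in common]
    return ordered, identical


rna_names = ["cell_1", "cell_2", "cell_3"]
atac_names = ["cell_3", "cell_1", "cell_4"]
common, identical = shared_cell_names(rna_names, atac_names)
```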

Parameters

mdata

A list of processed AnnData objects. Neighbors must have already been calculated. If a MuData object is passed, a model is fitted on the layered graph. To fit a model on a shared graph representation, e.g. a WNN graph or a graph built on MOFA latent factors, you can still use the standard scs.inference.fit_model() function.

deg_corr

Whether to use degree correction in the minimization step. Degree correction suits many real-world networks, although it does not seem to be needed for the KNN graphs used in scanpy.

tolerance

Tolerance for fast model convergence.

n_sweep

Number of iterations to be performed in the fast model MCMC greedy approach.

beta

Inverse temperature for the MCMC greedy approach.

n_init

Number of initial minimizations to be performed. The one with the smallest entropy is chosen.

refine_model

Whether to perform a further MCMC step to refine the model.

refine_iter

Number of refinement iterations.

max_iter

Maximum number of iterations during minimization. Set to infinity to stop minimization only on tolerance.

overlap

Whether the different layers are dependent (overlap=True) or not (overlap=False)

n_jobs

Number of parallel computations used during model initialization

key_added

adata.obs key under which to add the cluster labels.

adjacency

Sparse adjacency matrix of the graph. Defaults to adata.uns['neighbors']['connectivities'] for scanpy<=1.4.6, or adata.obsp[neighbors_key][connectivity_key] for scanpy>1.4.6.

neighbors_key

The key passed to sc.pp.neighbors. If all AnnData objects share the same key, only one has to be specified; otherwise the full list of keys must be provided.

directed

Whether to treat the graph as directed or undirected.

use_weights

If True, edge weights from the graph are used in the computation (placing more emphasis on stronger edges). Note that this increases computation time.

save_model

If provided, this will be the filename for the PartitionModeState to be saved.

copy

Whether to copy adata or modify it in place.

random_seed

Random number to be used as seed for graph-tool.
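The neighbors_key rule for multiple datasets (one key reused for every AnnData, or one key per object) can be sketched as follows. This is an illustrative helper under that reading of the parameter description, not schist's own code; the function name and the error message are hypothetical.

```python
def resolve_neighbors_keys(neighbors_key, n_objects):
    """Expand neighbors_key so that there is exactly one key per dataset."""
    keys = list(neighbors_key)
    if len(keys) == 1:
        return keys * n_objects          # a single key is shared by all datasets
    if len(keys) != n_objects:
        raise ValueError(
            f"expected 1 or {n_objects} neighbors keys, got {len(keys)}"
        )
    return keys


shared = resolve_neighbors_keys(["neighbors"], 3)
per_modality = resolve_neighbors_keys(["rna_neighbors", "atac_neighbors"], 2)
```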

Returns

adata.obs[key_added]

Array of dim (number of cells) that stores the subgroup id (‘0’, ‘1’, …) for each cell.

adata.uns['schist']['multi_level_params']

A dict with the values for the parameters resolution, random_state, and n_iterations.

adata.uns['schist']['multi_level_stats']

A dict with the values returned by mcmc_sweep.

adata.obsm['CA_multi_nsbm_level_{n}']

An np.ndarray with, for each cell, the probability of belonging to each group.

adata.uns['schist']['multi_level_state']

The NestedBlockModel state object.

schist.inference.nested_model(adata: anndata.AnnData, deg_corr: bool = True, tolerance: float = 1e-06, n_sweep: int = 10, beta: float = np.inf, n_init: int = 100, collect_marginals: bool = True, n_jobs: int = -1, refine_model: bool = False, refine_iter: int = 100, max_iter: int = 100000, *, restrict_to: Tuple[str, Sequence[str]] | None = None, random_seed: int | None = None, key_added: str = 'nsbm', adjacency: scipy.sparse.spmatrix | None = None, neighbors_key: str | None = 'neighbors', directed: bool = False, use_weights: bool = False, save_model: str | None = None, copy: bool = False, dispatch_backend: str | None = 'threads') anndata.AnnData | None

This function is deprecated and will be removed soon. It now wraps the scs.inference.fit_model() function.

schist.inference.flat_model(adata: anndata.AnnData, n_sweep: int = 10, beta: float = np.inf, tolerance: float = 1e-06, collect_marginals: bool = True, deg_corr: bool = True, n_init: int = 100, n_jobs: int = -1, refine_model: bool = False, refine_iter: int = 100, max_iter: int = 100000, *, restrict_to: Tuple[str, Sequence[str]] | None = None, random_seed: int | None = None, key_added: str = 'sbm', adjacency: scipy.sparse.spmatrix | None = None, neighbors_key: str | None = 'neighbors', directed: bool = False, use_weights: bool = False, save_model: str | None = None, copy: bool = False, dispatch_backend: str | None = 'threads') anndata.AnnData | None

This function is deprecated and will be removed soon. It now wraps the scs.inference.fit_model() function.

schist.inference.nested_model_multi(adatas: List[anndata.AnnData], deg_corr: bool = True, tolerance: float = 1e-06, n_sweep: int = 10, beta: float = np.inf, n_init: int = 100, collect_marginals: bool = True, n_jobs: int = -1, refine_model: bool = False, refine_iter: int = 100, overlap: bool = False, max_iter: int = 100000, *, random_seed: int | None = None, key_added: str = 'multi_nsbm', adjacency: List[scipy.sparse.spmatrix] | None = None, neighbors_key: List[str] | None = ['neighbors'], directed: bool = False, use_weights: bool = False, save_model: str | None = None, copy: bool = False, dispatch_backend: str | None = 'threads') List[anndata.AnnData] | None

This function is deprecated and will be removed soon. It now wraps the scs.inference.fit_model() function.