`schist.inference`

Package Contents

Functions

`nested_model`(→ Optional[anndata.AnnData])	Cluster cells into subgroups [Peixoto14].
`flat_model`(→ Optional[anndata.AnnData])	Cluster cells into subgroups [Peixoto14].
`planted_model`(→ Optional[anndata.AnnData])	Cluster cells into subgroups [Peixoto14].
`leiden`(→ Optional[anndata.AnnData])	Cluster cells into subgroups [Traag18].
`nested_model_multi`(→ Optional[List[anndata.AnnData]])	Cluster cells into subgroups using multiple modalities.

schist.inference.nested_model(adata: anndata.AnnData, deg_corr: bool = True, tolerance: float = 1e-06, n_sweep: int = 10, beta: float = np.inf, n_init: int = 100, collect_marginals: bool = True, n_jobs: int = -1, refine_model: bool = False, refine_iter: int = 100, max_iter: int = 100000, *, restrict_to: Tuple[str, Sequence[str]] | None = None, random_seed: int | None = None, key_added: str = 'nsbm', adjacency: scipy.sparse.spmatrix | None = None, neighbors_key: str | None = 'neighbors', directed: bool = False, use_weights: bool = False, save_model: str | None = None, copy: bool = False, dispatch_backend: str | None = 'threads') → anndata.AnnData | None

Cluster cells into subgroups [Peixoto14].

Cluster cells using the nested Stochastic Block Model [Peixoto14], a hierarchical version of Stochastic Block Model [Holland83], performing Bayesian inference on node groups. NSBM should circumvent classical limitations of SBM in detecting small groups in large graphs replacing the noninformative priors used by a hierarchy of priors and hyperpriors.

This requires having ran neighbors() or bbknn() first.

Parameters

adata: The annotated data matrix.
deg_corr: Whether to use degree correction in the minimization step. In many real world networks this is the case, although this doesn’t seem the case for KNN graphs used in scanpy.
tolerance: Tolerance for fast model convergence.
n_sweep: Number of iterations to be performed in the fast model MCMC greedy approach
beta: Inverse temperature for MCMC greedy approach
n_init: Number of initial minimizations to be performed. The one with smaller entropy is chosen
refine_model: Wether to perform a further mcmc step to refine the model
refine_iter: Number of refinement iterations.
max_iter: Maximum number of iterations during minimization, set to infinite to stop minimization only on tolerance
n_jobs: Number of parallel computations used during model initialization
key_added: adata.obs key under which to add the cluster labels.
adjacency: Sparse adjacency matrix of the graph, defaults to adata.uns[‘neighbors’][‘connectivities’] in case of scanpy<=1.4.6 or adata.obsp[neighbors_key][connectivity_key] for scanpy>1.4.6
neighbors_key: The key passed to sc.pp.neighbors
directed: Whether to treat the graph as directed or undirected.
use_weights: If True, edge weights from the graph are used in the computation (placing more emphasis on stronger edges). Note that this increases computation times
save_model: If provided, this will be the filename for the PartitionModeState to be saved
copy: Whether to copy adata or modify it inplace.
random_seed: Random number to be used as seed for graph-tool

Returns

adata.obs[key_added]: Array of dim (number of cells) that stores the subgroup id (‘0’, ‘1’, …) for each cell.
adata.uns[‘schist’][‘params’]: A dict with the values for the parameters resolution, random_state, and n_iterations.
adata.uns[‘schist’][‘stats’]: A dict with the values returned by mcmc_sweep
adata.obsm[‘CA_nsbm_level_{n}’]: A np.ndarray with cell probability of belonging to a specific group
adata.uns[‘schist’][‘state’]: The NestedBlockModel state object

schist.inference.flat_model(adata: anndata.AnnData, n_sweep: int = 10, beta: float = np.inf, tolerance: float = 1e-06, collect_marginals: bool = True, deg_corr: bool = True, n_init: int = 100, n_jobs: int = -1, refine_model: bool = False, refine_iter: int = 100, max_iter: int = 100000, *, restrict_to: Tuple[str, Sequence[str]] | None = None, random_seed: int | None = None, key_added: str = 'sbm', adjacency: scipy.sparse.spmatrix | None = None, neighbors_key: str | None = 'neighbors', directed: bool = False, use_weights: bool = False, save_model: str | None = None, copy: bool = False, dispatch_backend: str | None = 'threads') → anndata.AnnData | None

Cluster cells into subgroups [Peixoto14].

Cluster cells using the Stochastic Block Model [Peixoto14], performing Bayesian inference on node groups.

This requires having ran neighbors() or bbknn() first.

Parameters

adata: The annotated data matrix.
n_sweep: Number of MCMC sweeps to get the initial guess
beta: Inverse temperature for the initial MCMC sweep
tolerance: Difference in description length to stop MCMC sweep iterations
collect_marginals: Whether or not collect node probability of belonging to a specific partition.
deg_corr: Whether to use degree correction in the minimization step. In many real world networks this is the case, although this doesn’t seem the case for KNN graphs used in scanpy.
n_init: Number of initial minimizations to be performed. This influences also the precision for marginals
refine_model: Wether to perform a further mcmc step to refine the model
refine_iter: Number of refinement iterations.
max_iter: Maximum number of iterations during minimization, set to infinite to stop minimization only on tolerance
key_added: adata.obs key under which to add the cluster labels.
adjacency: Sparse adjacency matrix of the graph, defaults to adata.uns[‘neighbors’][‘connectivities’] in case of scanpy<=1.4.6 or adata.obsp[neighbors_key][connectivity_key] for scanpy>1.4.6
neighbors_key: The key passed to sc.pp.neighbors
directed: Whether to treat the graph as directed or undirected.
use_weights: If True, edge weights from the graph are used in the computation (placing more emphasis on stronger edges). Note that this increases computation times
save_model: If provided, this will be the filename for the PartitionModeState to be saved
copy: Whether to copy adata or modify it inplace.
random_seed: Random number to be used as seed for graph-tool
n_jobs: Number of parallel computations used during model initialization

Returns

adata.obs[key_added]: Array of dim (number of cells) that stores the subgroup id (‘0’, ‘1’, …) for each cell.
adata.uns[‘schist’][‘params’]: A dict with the values for the parameters resolution, random_state, and n_iterations.
adata.uns[‘schist’][‘stats’]: A dict with the values returned by mcmc_sweep
adata.obsm[‘CM_sbm’]: A np.ndarray with cell probability of belonging to a specific group
adata.uns[‘schist’][‘state’]: The BlockModel state object

schist.inference.planted_model(adata: anndata.AnnData, n_sweep: int = 10, beta: float = np.inf, tolerance=1e-06, collect_marginals: bool = True, deg_corr: bool = True, n_init: int = 100, n_jobs: int = -1, refine_model: bool = False, refine_iter: int = 100, max_iter: int = 100000, *, restrict_to: Tuple[str, Sequence[str]] | None = None, random_seed: int | None = None, key_added: str = 'ppbm', adjacency: scipy.sparse.spmatrix | None = None, neighbors_key: str | None = 'neighbors', directed: bool = False, use_weights: bool = False, copy: bool = False, save_model: str | None = None, dispatch_backend: str | None = 'threads') → anndata.AnnData | None

Cluster cells into subgroups [Peixoto14].

Cluster cells using the Planted Partition Block Model [Peixoto14], performing Bayesian inference on node groups. This function, in particular, uses the Planted Block Model, which is particularly suitable in case of assortative graphs and it returns the optimal number of communities

This requires having ran neighbors() or bbknn() first.

Parameters

adata: The annotated data matrix.
n_sweep: Number of MCMC sweeps to get the initial guess
beta: Inverse temperature for the initial MCMC sweep
tolerance: Difference in description length to stop MCMC sweep iterations
collect_marginals: Whether or not collect node probability of belonging to a specific partition.
deg_corr: Whether to use degree correction in the minimization step. In many real world networks this is the case, although this doesn’t seem the case for KNN graphs used in scanpy.
n_init: Number of initial minimizations to be performed. This influences also the precision for marginals
refine_model: Wether to perform a further mcmc step to refine the model
refine_iter: Number of refinement iterations.
max_iter: Maximum number of iterations during minimization, set to infinite to stop minimization only on tolerance
key_added: adata.obs key under which to add the cluster labels.
adjacency: Sparse adjacency matrix of the graph, defaults to adata.uns[‘neighbors’][‘connectivities’] in case of scanpy<=1.4.6 or adata.obsp[neighbors_key][connectivity_key] for scanpy>1.4.6
neighbors_key: The key passed to sc.pp.neighbors
directed: Whether to treat the graph as directed or undirected.
use_weights: If True, edge weights from the graph are used in the computation (placing more emphasis on stronger edges). Note that this increases computation times
copy: Whether to copy adata or modify it inplace.
save_model: If provided, this will be the filename for the PartitionModeState to be saved
random_seed: Random number to be used as seed for graph-tool
n_jobs: Number of parallel computations used during model initialization

Returns

adata.obs[key_added]: Array of dim (number of cells) that stores the subgroup id (‘0’, ‘1’, …) for each cell.
adata.uns[‘schist’][‘params’]: A dict with the values for the parameters resolution, random_state, and n_iterations.
adata.uns[‘schist’][‘stats’]: A dict with the values returned by mcmc_sweep
adata.obsm[‘CM_ppbm’]: A np.ndarray with cell probability of belonging to a specific group
adata.uns[‘schist’][‘state’]: The BlockModel state object

schist.inference.leiden(adata: anndata.AnnData, resolution: float = 1, n_init: int = 100, *, restrict_to: Tuple[str, Sequence[str]] | None = None, random_state: scanpy._utils.AnyRandom = 0, key_added: str = 'leiden', adjacency: scipy.sparse.spmatrix | None = None, directed: bool = True, use_weights: bool = True, n_iterations: int = -1, partition_type: Type[leidenalg.VertexPartition.MutableVertexPartition] | None = None, neighbors_key: str | None = None, obsp: str | None = None, collect_marginals: bool = True, n_jobs: int = -1, copy: bool = False, save_model: str | None = None, dispatch_backend: str | None = 'threads', **partition_kwargs) → anndata.AnnData | None

Cluster cells into subgroups [Traag18].

Cluster cells using the Leiden algorithm [Traag18], an improved version of the Louvain algorithm [Blondel08]. It has been proposed for single-cell analysis by [Levine15].

This requires having ran neighbors() or bbknn() first.

Parameters

adata: The annotated data matrix.
resolution: A parameter value controlling the coarseness of the clustering. Higher values lead to more clusters. Set to None if overriding partition_type to one that doesn’t accept a resolution_parameter.
n_init: The number of random initializations to take for consensus
random_state: Change the initialization of the optimization.
restrict_to: Restrict the clustering to the categories within the key for sample annotation, tuple needs to contain (obs_key, list_of_categories).
key_added: adata.obs key under which to add the cluster labels.
adjacency: Sparse adjacency matrix of the graph, defaults to neighbors connectivities.
directed: Whether to treat the graph as directed or undirected.
use_weights: If True, edge weights from the graph are used in the computation (placing more emphasis on stronger edges).
n_iterations: How many iterations of the Leiden clustering algorithm to perform. Positive values above 2 define the total number of iterations to perform, -1 has the algorithm run until it reaches its optimal clustering.
partition_type: Type of partition to use. Defaults to RBConfigurationVertexPartition. For the available options, consult the documentation for find_partition().
neighbors_key: Use neighbors connectivities as adjacency. If not specified, leiden looks .obsp[‘connectivities’] for connectivities (default storage place for pp.neighbors). If specified, leiden looks .obsp[.uns[neighbors_key][‘connectivities_key’]] for connectivities.
obsp: Use .obsp[obsp] as adjacency. You can’t specify both obsp and neighbors_key at the same time.
collect_marginals: Wheter to retrieve the marginal probability to belong to a group
n_jobs: Number of parallel jobs to calculate partitions
copy: Whether to copy adata or modify it inplace.
save_model: If provided, this will be the filename for the PartitionModeState to be saved
**partition_kwargs: Any further arguments to pass to ~leidenalg.find_partition (which in turn passes arguments to the partition_type).

Returns

adata.obs[key_added]: Array of dim (number of cells) that stores the subgroup id (‘0’, ‘1’, …) for each cell.
adata.uns[‘leiden’][‘params’]: A dict with the values for the parameters resolution, random_state, and n_iterations.

schist.inference.nested_model_multi(adatas: List[anndata.AnnData], deg_corr: bool = True, tolerance: float = 1e-06, n_sweep: int = 10, beta: float = np.inf, n_init: int = 100, collect_marginals: bool = True, n_jobs: int = -1, refine_model: bool = False, refine_iter: int = 100, overlap: bool = False, max_iter: int = 100000, *, random_seed: int | None = None, key_added: str = 'multi_nsbm', adjacency: List[scipy.sparse.spmatrix] | None = None, neighbors_key: List[str] | None = ['neighbors'], directed: bool = False, use_weights: bool = False, save_model: str | None = None, copy: bool = False, dispatch_backend: str | None = 'threads') → List[anndata.AnnData] | None

Cluster cells into subgroups using multiple modalities.

Cluster cells using the nested Stochastic Block Model [Peixoto14], performing Bayesian inference on node groups. This function takes multiple experiments, possibly across different modalities, and perform joint clustering.

This requires having ran neighbors() or bbknn() first. It also requires cells having the same names if coming from paired experiments

Parameters

adatas: A list of processed AnnData. Neighbors must have been already calculated.
deg_corr: Whether to use degree correction in the minimization step. In many real world networks this is the case, although this doesn’t seem the case for KNN graphs used in scanpy.
tolerance: Tolerance for fast model convergence.
n_sweep: Number of iterations to be performed in the fast model MCMC greedy approach
beta: Inverse temperature for MCMC greedy approach
n_init: Number of initial minimizations to be performed. The one with smaller entropy is chosen
refine_model: Wether to perform a further mcmc step to refine the model
refine_iter: Number of refinement iterations.
max_iter: Maximum number of iterations during minimization, set to infinite to stop minimization only on tolerance
overlap: Whether the different layers are dependent (overlap=True) or not (overlap=False)
n_jobs: Number of parallel computations used during model initialization
key_added: adata.obs key under which to add the cluster labels.
adjacency: Sparse adjacency matrix of the graph, defaults to adata.uns[‘neighbors’][‘connectivities’] in case of scanpy<=1.4.6 or adata.obsp[neighbors_key][connectivity_key] for scanpy>1.4.6
neighbors_key: The key passed to sc.pp.neighbors. If all AnnData share the same key, one only has to be specified, otherwise the full tuple of all keys must be provided
directed: Whether to treat the graph as directed or undirected.
use_weights: If True, edge weights from the graph are used in the computation (placing more emphasis on stronger edges). Note that this increases computation times
save_model: If provided, this will be the filename for the PartitionModeState to be saved
copy: Whether to copy adata or modify it inplace.
random_seed: Random number to be used as seed for graph-tool

Returns

adata.obs[key_added]: Array of dim (number of cells) that stores the subgroup id (‘0’, ‘1’, …) for each cell.
adata.uns[‘schist’][‘multi_level_params’]: A dict with the values for the parameters resolution, random_state, and n_iterations.
adata.uns[‘schist’][‘multi_level_stats’]: A dict with the values returned by mcmc_sweep
adata.obsm[‘CA_multi_nsbm_level_{n}’]: A np.ndarray with cell probability of belonging to a specific group
adata.uns[‘schist’][‘multi_level_state’]: The NestedBlockModel state object

schist.inference

Package Contents

Functions

Parameters

Returns

Parameters

Returns

Parameters

Returns

Parameters

Returns

Parameters

Returns

`schist.inference`