schist.inference
================

.. py:module:: schist.inference


Functions
---------

.. autoapisummary::

   schist.inference.fit_model
   schist.inference.fit_model_multi
   schist.inference.nested_model
   schist.inference.flat_model
   schist.inference.nested_model_multi


Package Contents
----------------

.. py:function:: fit_model(adata: anndata.AnnData, deg_corr: bool = True, tolerance: float = 0.0001, n_sweep: int = 10, beta: float = np.inf, n_init: int = 100, model: Literal['nsbm', 'sbm', 'ppbm'] = 'nsbm', max_iter: int = 1000, collect_marginals: bool = True, refine_model: bool = False, refine_iter: int = 100, n_jobs: int = -1, key_added: str | None = None, adjacency: Optional[scipy.sparse.spmatrix] = None, neighbors_key: Optional[str] = 'neighbors', directed: bool = False, use_weights: bool = False, save_model: Union[str, None] = None, copy: bool = False, random_seed: Optional[int] = None, dispatch_backend: Optional[str] = 'loky') -> Optional[anndata.AnnData]

   Cluster cells using the nested Stochastic Block Model [Peixoto14]_,
   performing Bayesian inference on node groups.

   This requires having run :func:`~scanpy.pp.neighbors` or
   :func:`~scanpy.external.pp.bbknn` first.

   Parameters
   ----------
   adata
       The annotated data matrix.
   deg_corr
       Whether to use degree correction in the minimization step. Degree
       correction is appropriate for many real-world networks, although this
       does not seem to be the case for the KNN graphs used in scanpy.
   tolerance
       Tolerance for fast model convergence.
   n_sweep
       Number of iterations to be performed in the fast model MCMC greedy
       approach.
   beta
       Inverse temperature for the MCMC greedy approach.
   n_init
       Number of concurrent minimizations to be performed. The final model
       will be a consensus over these models.
   model
       The SBM model to use. `nsbm` implements the Nested Stochastic Block
       Model. `sbm` is the Stochastic Block Model. `ppbm` is the Planted
       Partition Block Model, which only has an assortativity prior.
   max_iter
       Maximum number of iterations during minimization. Set to infinity to
       stop minimization only on tolerance.
   collect_marginals
       Whether to collect the marginal distribution of cells, that is, the
       probability of each cell belonging to each cluster.
   refine_model
       Whether to perform a further MCMC step to refine the model.
   refine_iter
       Number of refinement iterations.
   n_jobs
       Number of parallel computations used during model initialization.
   key_added
       `adata.obs` key under which to add the cluster labels.
   adjacency
       Sparse adjacency matrix of the graph. Defaults to
       `adata.uns['neighbors']['connectivities']` for scanpy<=1.4.6 or
       `adata.obsp[neighbors_key][connectivity_key]` for scanpy>1.4.6.
   neighbors_key
       The key passed to `sc.pp.neighbors`.
   directed
       Whether to treat the graph as directed or undirected.
   use_weights
       If `True`, edge weights from the graph are used in the computation
       (placing more emphasis on stronger edges). Note that this increases
       computation times.
   save_model
       If provided, this will be the filename for the PartitionModeState to
       be saved. The PartitionModeState contains all the models minimized
       during inference.
   copy
       Whether to copy `adata` or modify it in place.
   random_seed
       Random number to be used as seed for graph-tool.

   Returns
   -------
   `adata.obs[key_added]`
       Array of dim (number of cells) that stores the subgroup id
       (`'0'`, `'1'`, ...) for each cell.
   `adata.uns['schist'][model]['stats']`
       A dict with entropy and modularity values.
   `adata.uns['schist'][model]['params']`
       A dict with the values for the parameters used.
   `adata.obsm['CM_nsbm_level_{n}']` or `adata.obsm['CM_model']`
       A `np.ndarray` with the probability of each cell belonging to a
       specific group.
   `adata.uns['schist'][model]['state']`
       The block model, to be used if a graph-tool state needs to be
       initialized.


.. py:function:: fit_model_multi(mdata: Union[List[anndata.AnnData], mudata.MuData], deg_corr: bool = True, tolerance: float = 0.0001, n_sweep: int = 10, beta: float = np.inf, n_init: int = 100, model: Literal['nsbm', 'sbm'] = 'nsbm', max_iter: int = 1000, collect_marginals: bool = True, refine_model: bool = False, refine_iter: int = 100, n_jobs: int = -1, overlap: bool = False, key_added: str | None = None, adjacency: Optional[List[scipy.sparse.spmatrix]] = None, neighbors_key: Optional[List[str]] = ['neighbors'], directed: bool = False, use_weights: bool = False, save_model: Union[str, None] = None, copy: bool = False, dispatch_backend: Optional[str] = 'loky', random_seed: Optional[int] = None) -> [Union[List[anndata.AnnData]], mudata.MuData, None]

   Cluster cells into subgroups using multiple modalities.

   Cluster cells using the nested Stochastic Block Model [Peixoto14]_,
   performing Bayesian inference on node groups. This function takes
   multiple experiments, possibly across different modalities, and performs
   joint clustering.

   This requires having run :func:`~scanpy.pp.neighbors` or
   :func:`~scanpy.external.pp.bbknn` first. Cells coming from paired
   experiments must also have the same names.

   Parameters
   ----------
   mdata
       A list of processed AnnData. Neighbors must have been already
       calculated. If a MuData object is passed, a model on the layered
       graph will be fitted. If you want to fit a model on the shared graph
       representation, e.g. a WNN graph or a graph built on MOFA latent
       factors, you can still use the standard ``scs.inference.fit_model()``
       function.
   deg_corr
       Whether to use degree correction in the minimization step. Degree
       correction is appropriate for many real-world networks, although this
       does not seem to be the case for the KNN graphs used in scanpy.
   tolerance
       Tolerance for fast model convergence.
   n_sweep
       Number of iterations to be performed in the fast model MCMC greedy
       approach.
   beta
       Inverse temperature for the MCMC greedy approach.
   n_init
       Number of initial minimizations to be performed. The one with the
       smallest entropy is chosen.
   refine_model
       Whether to perform a further MCMC step to refine the model.
   refine_iter
       Number of refinement iterations.
   max_iter
       Maximum number of iterations during minimization. Set to infinity to
       stop minimization only on tolerance.
   overlap
       Whether the different layers are dependent (overlap=True) or not
       (overlap=False).
   n_jobs
       Number of parallel computations used during model initialization.
   key_added
       `adata.obs` key under which to add the cluster labels.
   adjacency
       Sparse adjacency matrix of the graph. Defaults to
       `adata.uns['neighbors']['connectivities']` for scanpy<=1.4.6 or
       `adata.obsp[neighbors_key][connectivity_key]` for scanpy>1.4.6.
   neighbors_key
       The key passed to `sc.pp.neighbors`. If all AnnData objects share the
       same key, only one needs to be specified; otherwise the keys for all
       of them must be provided.
   directed
       Whether to treat the graph as directed or undirected.
   use_weights
       If `True`, edge weights from the graph are used in the computation
       (placing more emphasis on stronger edges). Note that this increases
       computation times.
   save_model
       If provided, this will be the filename for the PartitionModeState to
       be saved.
   copy
       Whether to copy `adata` or modify it in place.
   random_seed
       Random number to be used as seed for graph-tool.

   Returns
   -------
   `adata.obs[key_added]`
       Array of dim (number of cells) that stores the subgroup id
       (`'0'`, `'1'`, ...) for each cell.
   `adata.uns['schist']['multi_level_params']`
       A dict with the values for the parameters `resolution`,
       `random_state`, and `n_iterations`.
   `adata.uns['schist']['multi_level_stats']`
       A dict with the values returned by mcmc_sweep.
   `adata.obsm['CA_multi_nsbm_level_{n}']`
       A `np.ndarray` with the probability of each cell belonging to a
       specific group.
   `adata.uns['schist']['multi_level_state']`
       The NestedBlockModel state object.


.. py:function:: nested_model(adata: anndata.AnnData, deg_corr: bool = True, tolerance: float = 1e-06, n_sweep: int = 10, beta: float = np.inf, n_init: int = 100, collect_marginals: bool = True, n_jobs: int = -1, refine_model: bool = False, refine_iter: int = 100, max_iter: int = 100000, *, restrict_to: Optional[Tuple[str, Sequence[str]]] = None, random_seed: Optional[int] = None, key_added: str = 'nsbm', adjacency: Optional[scipy.sparse.spmatrix] = None, neighbors_key: Optional[str] = 'neighbors', directed: bool = False, use_weights: bool = False, save_model: Union[str, None] = None, copy: bool = False, dispatch_backend: Optional[str] = 'threads') -> Optional[anndata.AnnData]

   This function is deprecated and will soon be removed.
   It now wraps the ``scs.inference.fit_model()`` function.


.. py:function:: flat_model(adata: anndata.AnnData, n_sweep: int = 10, beta: float = np.inf, tolerance: float = 1e-06, collect_marginals: bool = True, deg_corr: bool = True, n_init: int = 100, n_jobs: int = -1, refine_model: bool = False, refine_iter: int = 100, max_iter: int = 100000, *, restrict_to: Optional[Tuple[str, Sequence[str]]] = None, random_seed: Optional[int] = None, key_added: str = 'sbm', adjacency: Optional[scipy.sparse.spmatrix] = None, neighbors_key: Optional[str] = 'neighbors', directed: bool = False, use_weights: bool = False, save_model: Union[str, None] = None, copy: bool = False, dispatch_backend: Optional[str] = 'threads') -> Optional[anndata.AnnData]

   This function is deprecated and will soon be removed.
   It now wraps the ``scs.inference.fit_model()`` function.


.. py:function:: nested_model_multi(adatas: List[anndata.AnnData], deg_corr: bool = True, tolerance: float = 1e-06, n_sweep: int = 10, beta: float = np.inf, n_init: int = 100, collect_marginals: bool = True, n_jobs: int = -1, refine_model: bool = False, refine_iter: int = 100, overlap: bool = False, max_iter: int = 100000, *, random_seed: Optional[int] = None, key_added: str = 'multi_nsbm', adjacency: Optional[List[scipy.sparse.spmatrix]] = None, neighbors_key: Optional[List[str]] = ['neighbors'], directed: bool = False, use_weights: bool = False, save_model: Union[str, None] = None, copy: bool = False, dispatch_backend: Optional[str] = 'threads') -> Optional[List[anndata.AnnData]]

   This function is deprecated and will soon be removed.
   It now wraps the ``scs.inference.fit_model()`` function.
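
The ``neighbors_key`` rule documented for ``fit_model_multi`` (a single key shared by every dataset, or one key per dataset) can be illustrated with a small sketch. The helper ``expand_neighbors_keys`` is hypothetical and is not part of schist. How schist resolves the keys internally is not shown in this reference, and raising ``ValueError`` on a length mismatch is an assumption made for the example.

```python
from typing import List, Sequence, Union


def expand_neighbors_keys(
    neighbors_key: Union[str, Sequence[str]], n_datasets: int
) -> List[str]:
    """Hypothetical helper (not schist API): return one neighbors key per dataset."""
    # Accept a bare string as shorthand for a one-element list.
    keys = [neighbors_key] if isinstance(neighbors_key, str) else list(neighbors_key)
    if len(keys) == 1:
        # A single key is assumed to be shared by every dataset.
        return keys * n_datasets
    if len(keys) == n_datasets:
        # One key per dataset, in the same order as the datasets.
        return keys
    # Assumption: any other length is treated as a user error.
    raise ValueError(
        f"expected 1 or {n_datasets} neighbors keys, got {len(keys)}"
    )


# Three datasets that all used the default key.
shared = expand_neighbors_keys(["neighbors"], 3)
# Two datasets with modality-specific keys (the key names are made up).
per_dataset = expand_neighbors_keys(["rna_neighbors", "atac_neighbors"], 2)
print(shared)
print(per_dataset)
```

Under this reading, the default ``neighbors_key=['neighbors']`` works unchanged for any number of datasets, as long as each was processed with the default key.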