schist.inference
================

.. py:module:: schist.inference


Functions
---------

.. autoapisummary::

   schist.inference.fit_model
   schist.inference.fit_model_multi
   schist.inference.nested_model
   schist.inference.flat_model
   schist.inference.nested_model_multi


Package Contents
----------------

.. py:function:: fit_model(adata: anndata.AnnData, deg_corr: bool = True, tolerance: float = 0.0001, n_sweep: int = 10, beta: float = np.inf, n_init: int = 100, model: Literal['nsbm', 'sbm', 'ppbm'] = 'nsbm', max_iter: int = 1000, collect_marginals: bool = True, refine_model: bool = False, refine_iter: int = 100, n_jobs: int = -1, key_added: str | None = None, adjacency: Optional[scipy.sparse.spmatrix] = None, neighbors_key: Optional[str] = 'neighbors', directed: bool = False, use_weights: bool = False, save_model: Union[str, None] = None, copy: bool = False, random_seed: Optional[int] = None, dispatch_backend: Optional[str] = 'loky') -> Optional[anndata.AnnData]

   Cluster cells using a Stochastic Block Model [Peixoto14]_ (the nested
   variant by default), performing Bayesian inference on node groups.

   This requires having run :func:`~scanpy.pp.neighbors` or
   :func:`~scanpy.external.pp.bbknn` first.
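
   The graph that gets clustered is read from the neighbors results unless
   `adjacency` is passed explicitly. The sketch below illustrates that lookup
   under the assumption that the slots follow the layout written by
   `sc.pp.neighbors` (recent scanpy versions store only the name of the
   `.obsp` slot under `connectivities_key`). The helper `resolve_adjacency` is
   hypothetical, plain dictionaries stand in for `adata.uns` and `adata.obsp`,
   and it is not schist's own code.

   .. code-block:: python

      from typing import Any, Dict, Optional


      def resolve_adjacency(
          uns: Dict[str, Any],
          obsp: Dict[str, Any],
          neighbors_key: str = "neighbors",
          adjacency: Optional[Any] = None,
      ) -> Any:
          """Hypothetical helper: pick the graph to cluster."""
          # An explicitly passed matrix always takes precedence.
          if adjacency is not None:
              return adjacency
          if neighbors_key not in uns:
              raise ValueError(
                  f"No '{neighbors_key}' entry in .uns; run sc.pp.neighbors first."
              )
          entry = uns[neighbors_key]
          # scanpy > 1.4.6: .uns only records which .obsp slot holds the matrix.
          if "connectivities_key" in entry:
              return obsp[entry["connectivities_key"]]
          # scanpy <= 1.4.6: the matrix itself is stored in .uns.
          return entry["connectivities"]


      # Plain dicts standing in for adata.uns and adata.obsp.
      new_uns = {"neighbors": {"connectivities_key": "connectivities"}}
      new_obsp = {"connectivities": "matrix-in-obsp"}
      old_uns = {"neighbors": {"connectivities": "matrix-in-uns"}}

      print(resolve_adjacency(new_uns, new_obsp))  # matrix-in-obsp
      print(resolve_adjacency(old_uns, {}))        # matrix-in-uns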

   Parameters
   ----------
   adata
       The annotated data matrix.
   deg_corr
       Whether to use degree correction in the minimization step. Degree
       correction is appropriate for many real-world networks, although this
       does not seem to hold for the kNN graphs used in scanpy.
   tolerance
       Tolerance for fast model convergence.
   n_sweep
       Number of iterations performed in the fast-model MCMC greedy approach.
   beta
       Inverse temperature for the MCMC greedy approach.
   n_init
       Number of concurrent minimizations to be performed. The final model will be
       a consensus over these models.
   model
       The SBM variant to use: `nsbm` is the Nested Stochastic Block Model,
       `sbm` the Stochastic Block Model, and `ppbm` the Planted Partition Block
       Model, which only has an assortativity prior.
   max_iter
       Maximum number of iterations during minimization. Set it to infinity to
       stop minimization only on `tolerance`.
   collect_marginals
       Whether to collect the marginal distribution of cells, that is, the
       probability of each cell belonging to each cluster.
   refine_model
       Whether to perform a further MCMC step to refine the model.
   refine_iter
       Number of refinement iterations.
   n_jobs
       Number of parallel computations used during model initialization.
   key_added
       `adata.obs` key under which to add the cluster labels.
   adjacency
       Sparse adjacency matrix of the graph. Defaults to
       `adata.uns[neighbors_key]['connectivities']` for scanpy<=1.4.6 or
       `adata.obsp[adata.uns[neighbors_key]['connectivities_key']]` for
       scanpy>1.4.6.
   neighbors_key
       The key under which `sc.pp.neighbors` stored its results in `adata.uns`.
   directed
       Whether to treat the graph as directed or undirected.
   use_weights
       If `True`, edge weights from the graph are used in the computation
       (placing more emphasis on stronger edges). Note that this increases
       computation time.
   save_model
       If provided, the filename under which the PartitionModeState is saved.
       The PartitionModeState contains all the models minimized during
       inference.
   copy
       Whether to copy `adata` or modify it in place.
   random_seed
       Integer used to seed graph-tool's random number generator.
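
   To illustrate what `beta` controls, the sketch below applies a generic
   Metropolis acceptance rule to a single proposed move that changes the model
   entropy by `delta_entropy`. It is a simplified illustration, not
   graph-tool's implementation: graph-tool's sweeps also correct for proposal
   probabilities, which are omitted here. The function `accept_move` is
   hypothetical. It shows why the default `beta = np.inf` gives a greedy
   search: uphill moves are never accepted.

   .. code-block:: python

      import math
      import random


      def accept_move(delta_entropy: float, beta: float, rng: random.Random) -> bool:
          """Hypothetical Metropolis rule for one move changing the entropy."""
          if delta_entropy <= 0:
              # Moves that do not increase the entropy are always accepted.
              return True
          if math.isinf(beta):
              # beta = inf (zero temperature): uphill moves are always rejected,
              # so the search is purely greedy.
              return False
          # Finite beta: uphill moves are accepted with probability exp(-beta * dS).
          return rng.random() < math.exp(-beta * delta_entropy)


      rng = random.Random(0)
      print(accept_move(-1.5, math.inf, rng))  # True
      print(accept_move(0.2, math.inf, rng))   # False
      # With beta = 1, roughly exp(-0.2), about 82%, of such uphill moves pass.
      rate = sum(accept_move(0.2, 1.0, rng) for _ in range(20_000)) / 20_000

   Lower values of `beta` make the search more exploratory at the cost of
   slower convergence.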

   Returns
   -------
   `adata.obs[key_added]`
       Array of length n_cells storing the subgroup id (`'0'`, `'1'`, ...)
       of each cell.
   `adata.uns['schist'][model]['stats']`
       A dict with entropy and modularity values.
   `adata.uns['schist'][model]['params']`
       A dict with the values of the parameters used.
   `adata.obsm['CM_nsbm_level_{n}']` or `adata.obsm['CM_model']`
       A `np.ndarray` with the probability of each cell belonging to each
       group.
   `adata.uns['schist'][model]['state']`
       The block model state, to be used when a graph-tool state needs to be
       initialized.
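
   A matrix of cell-to-group probabilities can be reduced to hard labels and a
   per-cell confidence score. The sketch below assumes such a matrix has one
   row per cell and one column per group, with rows summing to one. It builds
   one from a few hypothetical sampled partitions whose group ids are assumed
   to be already aligned, which is a simplification: how schist actually
   collects its marginals is not shown here.

   .. code-block:: python

      import numpy as np

      # Hypothetical partitions sampled during inference: one label per cell,
      # assuming group ids are already aligned across samples.
      samples = np.array([
          [0, 0, 1, 1, 2],
          [0, 0, 1, 2, 2],
          [0, 1, 1, 1, 2],
          [0, 0, 1, 1, 2],
      ])
      n_samples, n_cells = samples.shape
      n_groups = samples.max() + 1

      # Fraction of samples that place each cell in each group.
      marginals = np.zeros((n_cells, n_groups))
      for labels in samples:
          marginals[np.arange(n_cells), labels] += 1
      marginals /= n_samples

      # Most probable group per cell, as strings like those in adata.obs.
      hard_labels = marginals.argmax(axis=1).astype(str)
      confidence = marginals.max(axis=1)

      print(hard_labels.tolist())  # ['0', '0', '1', '1', '2']
      print(confidence.tolist())   # [1.0, 0.75, 1.0, 0.75, 1.0]

   Cells with low confidence sit between groups and may deserve a closer look.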


.. py:function:: fit_model_multi(mdata: Union[List[anndata.AnnData], mudata.MuData], deg_corr: bool = True, tolerance: float = 0.0001, n_sweep: int = 10, beta: float = np.inf, n_init: int = 100, model: Literal['nsbm', 'sbm'] = 'nsbm', max_iter: int = 1000, collect_marginals: bool = True, refine_model: bool = False, refine_iter: int = 100, n_jobs: int = -1, overlap: bool = False, key_added: str | None = None, adjacency: Optional[List[scipy.sparse.spmatrix]] = None, neighbors_key: Optional[List[str]] = ['neighbors'], directed: bool = False, use_weights: bool = False, save_model: Union[str, None] = None, copy: bool = False, dispatch_backend: Optional[str] = 'loky', random_seed: Optional[int] = None) -> Optional[Union[List[anndata.AnnData], mudata.MuData]]

   Cluster cells into subgroups using multiple modalities.

   Cluster cells using the nested Stochastic Block Model [Peixoto14]_,
   performing Bayesian inference on node groups. This function takes multiple
   experiments, possibly across different modalities, and performs joint
   clustering.

   This requires having run :func:`~scanpy.pp.neighbors` or
   :func:`~scanpy.external.pp.bbknn` first. It also requires cells to have the
   same names if they come from paired experiments.
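
   Since paired experiments are matched by cell name, it can help to verify the
   names before fitting. The helper `check_paired_names` below is hypothetical
   (schist's own checks may differ) and plain lists stand in for
   `adata.obs_names`. It only compares the sets of names, so it does not assume
   the cells are in the same order in every modality.

   .. code-block:: python

      from typing import List, Sequence


      def check_paired_names(obs_names: List[Sequence[str]]) -> None:
          """Hypothetical pre-flight check: every modality holds the same cells."""
          reference = set(obs_names[0])
          for i, names in enumerate(obs_names[1:], start=1):
              other = set(names)
              if other != reference:
                  # Cells present in one modality but not in the other.
                  mismatched = sorted(reference ^ other)
                  raise ValueError(
                      f"Modality {i} does not match modality 0: {mismatched}"
                  )


      # Plain lists standing in for the obs_names of paired RNA and ATAC data.
      rna = ["AAAC-1", "AAAG-1", "AAAT-1"]
      atac = ["AAAG-1", "AAAC-1", "AAAT-1"]
      check_paired_names([rna, atac])  # passes: same cells, different order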

   Parameters
   ----------
   mdata
       A list of processed AnnData objects, each with neighbors already
       computed. If a MuData object is passed, a model is fitted on the layered
       graph. To fit a model on a shared graph representation, e.g. a WNN graph
       or a graph built on MOFA latent factors, use the standard
       ``scs.inference.fit_model()`` function instead.
   deg_corr
       Whether to use degree correction in the minimization step. Degree
       correction is appropriate for many real-world networks, although this
       does not seem to hold for the kNN graphs used in scanpy.
   tolerance
       Tolerance for fast model convergence.
   n_sweep
       Number of iterations performed in the fast-model MCMC greedy approach.
   beta
       Inverse temperature for the MCMC greedy approach.
   n_init
       Number of initial minimizations to be performed. The one with the
       smallest entropy is chosen.
   refine_model
       Whether to perform a further MCMC step to refine the model.
   refine_iter
       Number of refinement iterations.
   max_iter
       Maximum number of iterations during minimization. Set it to infinity to
       stop minimization only on `tolerance`.
   overlap
       Whether the different layers are dependent (`overlap=True`) or
       independent (`overlap=False`).
   n_jobs
       Number of parallel computations used during model initialization.
   key_added
       `adata.obs` key under which to add the cluster labels.
   adjacency
       List of sparse adjacency matrices, one per modality. Each defaults to
       `adata.uns[neighbors_key]['connectivities']` for scanpy<=1.4.6 or
       `adata.obsp[adata.uns[neighbors_key]['connectivities_key']]` for
       scanpy>1.4.6.
   neighbors_key
       The key under which `sc.pp.neighbors` stored its results. If all AnnData
       objects share the same key, only one needs to be specified; otherwise
       the full list of keys, one per AnnData, must be provided.
   directed
       Whether to treat the graph as directed or undirected.
   use_weights
       If `True`, edge weights from the graph are used in the computation
       (placing more emphasis on stronger edges). Note that this increases
       computation time.
   save_model
       If provided, the filename under which the PartitionModeState is saved.
   copy
       Whether to copy `adata` or modify it in place.
   random_seed
       Integer used to seed graph-tool's random number generator.
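
   The `neighbors_key` rule (one shared key, or one key per AnnData) amounts to
   broadcasting a single key to every modality. The helper
   `expand_neighbors_keys` below is a hypothetical sketch of that rule, not
   schist's code; it also accepts a bare string for convenience.

   .. code-block:: python

      from typing import List, Sequence, Union


      def expand_neighbors_keys(
          neighbors_key: Union[str, Sequence[str]], n_modalities: int
      ) -> List[str]:
          """Hypothetical helper: return one neighbors key per modality."""
          if isinstance(neighbors_key, str):
              neighbors_key = [neighbors_key]
          keys = list(neighbors_key)
          if len(keys) == 1:
              # A single shared key is reused for every modality.
              return keys * n_modalities
          if len(keys) != n_modalities:
              raise ValueError(
                  f"Expected 1 or {n_modalities} neighbors keys, got {len(keys)}"
              )
          return keys


      print(expand_neighbors_keys(["neighbors"], 3))
      # ['neighbors', 'neighbors', 'neighbors']
      print(expand_neighbors_keys(["nn_rna", "nn_atac"], 2))
      # ['nn_rna', 'nn_atac']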

   Returns
   -------
   `adata.obs[key_added]`
       Array of length n_cells storing the subgroup id (`'0'`, `'1'`, ...)
       of each cell.
   `adata.uns['schist']['multi_level_params']`
       A dict with the values for the parameters `resolution`, `random_state`,
       and `n_iterations`.
   `adata.uns['schist']['multi_level_stats']`
       A dict with the values returned by `mcmc_sweep`.
   `adata.obsm['CA_multi_nsbm_level_{n}']`
       A `np.ndarray` with the probability of each cell belonging to each
       group.
   `adata.uns['schist']['multi_level_state']`
       The nested block model state object.
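
   When a MuData object is passed, the model is fitted on a layered graph.
   graph-tool's layered block models take a single graph in which every edge
   carries a discrete label naming the layer it belongs to. The sketch below
   shows one way to build such an edge list from per-modality kNN matrices over
   the same cells; it assumes that representation, uses dense NumPy arrays for
   brevity (real kNN graphs are sparse), and the helper `stack_layers` is
   hypothetical rather than schist's own code.

   .. code-block:: python

      import numpy as np


      def stack_layers(adjacencies):
          """Hypothetical helper: merge per-modality graphs over the same cells
          into one edge list, tagging each edge with its layer index."""
          sources, targets, layers = [], [], []
          for layer, adj in enumerate(adjacencies):
              # Keep the strict upper triangle so each undirected edge appears once.
              src, tgt = np.nonzero(np.triu(adj, k=1))
              sources.append(src)
              targets.append(tgt)
              layers.append(np.full(src.shape, layer))
          return np.concatenate(sources), np.concatenate(targets), np.concatenate(layers)


      # Two symmetric 3-cell graphs standing in for RNA and ATAC neighbors.
      rna = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
      atac = np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]])
      src, tgt, layer = stack_layers([rna, atac])
      print(src.tolist(), tgt.tolist(), layer.tolist())
      # [0, 1, 0] [1, 2, 2] [0, 0, 1]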


.. py:function:: nested_model(adata: anndata.AnnData, deg_corr: bool = True, tolerance: float = 1e-06, n_sweep: int = 10, beta: float = np.inf, n_init: int = 100, collect_marginals: bool = True, n_jobs: int = -1, refine_model: bool = False, refine_iter: int = 100, max_iter: int = 100000, *, restrict_to: Optional[Tuple[str, Sequence[str]]] = None, random_seed: Optional[int] = None, key_added: str = 'nsbm', adjacency: Optional[scipy.sparse.spmatrix] = None, neighbors_key: Optional[str] = 'neighbors', directed: bool = False, use_weights: bool = False, save_model: Union[str, None] = None, copy: bool = False, dispatch_backend: Optional[str] = 'threads') -> Optional[anndata.AnnData]

   This function is deprecated and will be removed soon.
   It now wraps the ``scs.inference.fit_model()`` function.


.. py:function:: flat_model(adata: anndata.AnnData, n_sweep: int = 10, beta: float = np.inf, tolerance: float = 1e-06, collect_marginals: bool = True, deg_corr: bool = True, n_init: int = 100, n_jobs: int = -1, refine_model: bool = False, refine_iter: int = 100, max_iter: int = 100000, *, restrict_to: Optional[Tuple[str, Sequence[str]]] = None, random_seed: Optional[int] = None, key_added: str = 'sbm', adjacency: Optional[scipy.sparse.spmatrix] = None, neighbors_key: Optional[str] = 'neighbors', directed: bool = False, use_weights: bool = False, save_model: Union[str, None] = None, copy: bool = False, dispatch_backend: Optional[str] = 'threads') -> Optional[anndata.AnnData]

   This function is deprecated and will be removed soon.
   It now wraps the ``scs.inference.fit_model()`` function.


.. py:function:: nested_model_multi(adatas: List[anndata.AnnData], deg_corr: bool = True, tolerance: float = 1e-06, n_sweep: int = 10, beta: float = np.inf, n_init: int = 100, collect_marginals: bool = True, n_jobs: int = -1, refine_model: bool = False, refine_iter: int = 100, overlap: bool = False, max_iter: int = 100000, *, random_seed: Optional[int] = None, key_added: str = 'multi_nsbm', adjacency: Optional[List[scipy.sparse.spmatrix]] = None, neighbors_key: Optional[List[str]] = ['neighbors'], directed: bool = False, use_weights: bool = False, save_model: Union[str, None] = None, copy: bool = False, dispatch_backend: Optional[str] = 'threads') -> Optional[List[anndata.AnnData]]

   This function is deprecated and will be removed soon.
   It now wraps the ``scs.inference.fit_model_multi()`` function.