.. _spatial_rna: =========================================== Analysis of spatially resolved RNA-seq data =========================================== Similarly to what shown `analyzing ATAC data <../Spatial_ATAC/spatial_atac.html>`_, we retrieved fastq from the paper "**Spatial epigenome–transcriptome co-profiling of mammalian tissues**" (Zhang et al, 2023, DOI:10.1038/s41586-023-05795-1), this time RNA sequences (SRA run ID: SRR22561636) and processed using ``kb_python``. The resulting count matrix file is imported and processed as standard scRNA-seq data .. code:: python import numpy as np import pandas as pd import matplotlib import matplotlib.pyplot as plt from matplotlib.pyplot import * import anndata as ad import scanpy as sc import scipy.sparse as ssp import scipy.stats as sst import spatialdata_io import spatialdata_plot import spatialdata as sd import squidpy as sq import schist as scs import warnings warnings.filterwarnings('ignore') #import magic from tqdm import tqdm def set_res(high=True): dpi=80 if high: dpi=150 sc.set_figure_params(dpi=dpi, fontsize=6) rcParams['axes.grid'] = False set_res(False) First we import the ``.h5ad`` file computed by ``kb_python``. After that we perform some QC and filtering, as proposed by sc-best-practices. .. code:: python adata = sc.read_h5ad("SRR22561636/counts_unfiltered/adata.h5ad") adata.var["mt"] = adata.var_names.str.startswith("mt-") adata.var["ribo"] = adata.var_names.str.startswith(("Rps", "Rpl")) sc.pp.calculate_qc_metrics( adata, qc_vars=["mt", "ribo"], inplace=True, percent_top=[20], log1p=True ) def is_outlier(adata, metric: str, nmads: int): M = adata.obs[metric] outlier = (M < np.median(M) - nmads * sst.median_abs_deviation(M)) | ( np.median(M) + nmads * sst.median_abs_deviation(M) < M ) return outlier adata.obs["outlier"] = ( is_outlier(adata, "log1p_total_counts", 5) | is_outlier(adata, "log1p_n_genes_by_counts", 5) | is_outlier(adata, "pct_counts_in_top_20_genes", 5) ) adata.obs["mt_outlier"] = is_outlier(adata, "pct_counts_mt", 3) | ( adata.obs["pct_counts_mt"] > 8 ) In the end we retain most of the pixels .. code:: python print(f"Total number of cells: {adata.n_obs}") adata = adata[(~adata.obs.outlier) & (~adata.obs.mt_outlier)].copy() print(f"Number of cells after filtering of low quality cells: {adata.n_obs}") .. code:: parsed-literal Total number of cells: 2500 Number of cells after filtering of low quality cells: 2157 We select highly variable genes and then perform normalization (PFlog1pPF). Lastly, after PCA is computed, we save the ``anndata`` to build later the ``spatialdata`` object. .. code:: python sc.pp.highly_variable_genes(adata, flavor='seurat_v3_paper') pf = adata.X.sum(axis=1).A.ravel() l1pf = np.log1p(ssp.diags(pf.mean()/pf)@adata.X) pf = l1pf.sum(axis=1).A.ravel() adata.X = ssp.diags(pf.mean()/pf)@l1pf sc.tl.pca(adata, use_highly_variable=True) adata.write("analysis_rna/rna.h5ad") Here the ``anndata`` is imported using the DBiT-seq plugin for ``spatialdata``. Note that we had to rotate the original image from the paper 90 degrees CCW, as the current version of the plugin orders the barcodes differently from what is displayed in the original paper. .. code:: python spdata = spatialdata_io.readers.dbit.dbit(path='analysis_rna', anndata_path='analysis_rna/rna.h5ad', barcode_position='barcodes.txt', image_path='ME13_50um_spatial/tissue_hires_image.png', dataset_id='ME13_50um_spatial') We can visualize a QC value (number of genes) in its context .. code:: python set_res(True) spdata.pl.render_images().pl.render_shapes(color='n_genes_by_counts', fill_alpha=1).pl.show(title='n_genes_by_counts') plt.xlabel('A barcodes') plt.ylabel('B barcodes') .. image:: output_14_1.png We create the kNN graph using the PCA embedding, then we apply ``schist`` to find the hierarchical cell structure .. code:: python n_neighbors = 15 sc.pp.neighbors(spdata.table, metric='cosine', key_added='pca_neighbors', n_neighbors=n_neighbors, n_pcs=20) set_res(False) sc.tl.umap(spdata.table, neighbors_key='pca_neighbors')#, min_dist=0.1) sc.pl.umap(spdata.table, color=['n_genes_by_counts']) .. image:: output_17_0.png .. code:: python basis='pca' sc.settings.verbosity=2 scs.inference.fit_model(spdata.table, neighbors_key=f'{basis}_neighbors') sc.settings.verbosity=0 .. code:: parsed-literal minimizing the nested Stochastic Block Model minimization step done (0:01:14) consensus step done (0:01:25) done (0:01:25) finished (0:01:25) .. code:: python sc.pl.umap(spdata.table, color=['nsbm_level_1', 'nsbm_level_2']) .. image:: output_19_0.png Finally we can check how clusters distribute in their spatial context. At level 1 we can distinguish certain clusters that belong to the neual tissues. Here the results appear to be in line with the publication .. code:: python set_res(True) spdata.pl.render_images().pl.render_shapes(color='nsbm_level_1', fill_alpha=.7).pl.show(title='Level 1', colorbar=True) xticks([]) yticks([]) plt.xlabel('A barcodes') plt.ylabel('B barcodes') .. image:: output_21_1.png Similarly to what has been done for ATAC, we perform the analysis of a multimodal data, where one modality is represented by the spatial graph .. code:: python sq.gr.spatial_neighbors(spdata.table, n_neighs=8, coord_type='grid') _tmp = spdata.table.copy() sc.settings.verbosity=2 scs.inference.fit_model_multi([spdata.table, _tmp], key_added='spt', neighbors_key=['pca_neighbors', 'spatial_neighbors']) sc.settings.verbosity=0 .. code:: parsed-literal minimizing the nested Stochastic Block Model getting adjacency for data 0 (0:00:00) getting adjacency for data 1 (0:00:00) minimization step done (0:04:11) consensus step done (0:04:21) done (0:04:21) finished (0:04:22) Again, we visualize the structured data, with a better resolution of anatomical structures .. code:: python set_res(True) spdata.pl.render_images().pl.render_shapes(color='spt_level_1', fill_alpha=.7).pl.show(title='SPT_level_1', colorbar=True) xticks([]) yticks([]) plt.xlabel('A barcodes') plt.ylabel('B barcodes') .. image:: output_26_1.png Again, we save data for later use in `integrated analysis <../Spatial_Multi/spatial_multi.html>`_. .. code:: python spdata.write('analysis/SRR22561636.zarr')