.. _spatial_atac:

============================================
Analysis of spatially resolved ATAC-seq data
============================================

Fastq from the paper "**Spatial epigenome–transcriptome co-profiling of mammalian tissues**" (Zhang et al, 2023, DOI:10.1038/s41586-023-05795-1) were downloaded (SRA run ID: SRR22565186) and processed using ``chromap``. The resulting fragment file is imported using ``snapatac2``. Here's just a plot showing the general performance of the spatial ATAC-seq data.
First of all, we import all required libraries.

.. code:: python

	import numpy as np
	import pandas as pd
	import matplotlib
	import matplotlib.pyplot as plt
	from matplotlib.pyplot import *
	
	import anndata as ad
	import scanpy as sc
	import snapatac2 as snap
	
	import scipy.sparse as ssp
	import scipy.stats as sst
	
	import spatialdata_io
	import spatialdata_plot
	import spatialdata as sd
	
	import squidpy as sq
	import schist as scs
	import warnings
	warnings.filterwarnings('ignore')
	#import magic
	from tqdm import tqdm
	
	def set_res(high=True):
		dpi=80
		if high:
			dpi=150
		sc.set_figure_params(dpi=dpi, fontsize=6)
		rcParams['axes.grid'] = False
	
	set_res(False)


Parsing of fragment is performed using standard parameters. THe whitelist here is the list of all 2500 barcodes allowed by the DBiT-seq circuit.

.. code:: python

	adata = snap.pp.import_data('SRR22565186.bed.gz', 
	                             sorted_by_barcode=False,
								 whitelist='whitelist.txt', backend=None,
								 chunk_size=20000,
								 min_num_fragments=0, 
								 chrom_sizes=snap.genome.mm10)
	snap.pl.frag_size_distr(adata, interactive=False)
	snap.metrics.frip(adata, {"peaks_frac": snap.datasets.cre_HEA()})
	snap.metrics.tsse(adata, snap.genome.mm10)
	snap.pl.tsse(adata, interactive=False, )


.. image:: output_3_1.png
    

We bin genome at 1000 bp, apply some filters and apply ``snapatac``'s fast dimensionality reduction scheme after selecting variable features. We save the `.h5ad` file that is used to build a ``spatialdata`` object


.. code:: python
	
	snap.pp.add_tile_matrix(adata, counting_strategy='paired-insertion', bin_size=1000)
	adata.obs['log_fragment'] = np.log10(adata.obs['n_fragment'])
	cells = adata.obs.query('n_fragment >= 5000 and tsse >= 3 and n_fragment <=100000').index
	adata = adata[cells].copy()
	#select the top 50k regions
	snap.pp.select_features(adata, 
                            filter_upper_quantile=0.,
                            n_features=50000,
                            max_iter=1, blacklist='mm10-blacklist.v2.bed.gz')
	snap.tl.spectral(adata)
	adata.write("analysis/atac_raw.h5ad")

Here the ``anndata`` is imported using the DBiT-seq plugin for ``spatialdata``. Note that we had to rotate the original image from the paper 90 degrees CCW, as the current version of the plugin orders the barcodes differently from what is displayed in the original paper.


.. code:: python

	spdata = spatialdata_io.readers.dbit.dbit(path='analysis',
	                                        anndata_path='analysis/atac_raw.h5ad', 
	                                        barcode_position='barcodes.txt',
	                                        image_path='ME13_50um_spatial/tissue_hires_image.png',
	                                        dataset_id='ME13_50um_spatial'
                                     )


The TSS enrichment is visualized in its spatial context. The eye and the forebrain regions have the highest TSSE values.


.. code:: python
	set_res(True)
	spdata.pl.render_images().pl.render_shapes(color='tsse', fill_alpha=1).pl.show(title='TSSE')
	xticks([])
	yticks([])
	plt.xlabel('A barcodes')
	plt.ylabel('B barcodes')

    
.. image:: output_15_1.png
    

The coverage is unevenly distributed. We here notice an issue common in DBiT-seq, that is some channels _stripe_ the tissue because of differences in the flows. 


.. code:: python
	spdata.pl.render_images().pl.render_shapes(color='log_fragment', fill_alpha=0.4, cmap='RdYlBu_r').pl.show('global', title='coverage', colorbar=True)
	xticks([])
	yticks([])
	plt.xlabel('A barcodes')
	plt.ylabel('B barcodes')

    
.. image:: output_17_1.png

We here create the kNN graph, according to the spectral reduction performed above. A UMAP plot is also presented (not that we are going to use it, anyway).

.. code::python 

	n_neighbors = 15
	sc.pp.neighbors(spdata.table, use_rep='X_spectral', 
					metric='cosine',
					key_added='spectral_neighbors',
					n_neighbors=n_neighbors)#, n_pcs=10)
	set_res(False)
	sc.tl.umap(spdata.table, neighbors_key='spectral_neighbors')#, min_dist=0.1)
	sc.pl.umap(spdata.table, color=['log_fragment', 'tsse'])


.. image:: output_20_1.png

Here a simple hierarchical model is performed using ``schist`` on the kNN graph above. The clusters at level 1 and 2 are shown in UMAP.


.. code:: python

	basis='spectral'
	sc.pp.neighbors(spdata.table, key_added=f'{basis}_neighbors', use_rep=f'X_{basis}')
	sc.settings.verbosity=2
	scs.inference.fit_model(spdata.table, neighbors_key=f'{basis}_neighbors')
	sc.settings.verbosity=0

.. code:: parsed-literal

    minimizing the nested Stochastic Block Model
            minimization step done (0:01:28)
            consensus step done (0:01:46)
        done (0:01:46)
        finished (0:01:46)


.. code:: python

	sc.pl.umap(spdata.table, color=['nsbm_level_1', 'nsbm_level_2'])

.. image:: output_23_0.png


Finally we can check how clusters distribute in their spatial context. At level 1 we can distinguish certain clusters that belong to the neual tissues.

.. code:: python

	set_res(True)
	spdata.pl.render_images().pl.render_shapes(color='nsbm_level_1', fill_alpha=.7).pl.show(title='Level 1', colorbar=True)
	xticks([])
	yticks([])
	plt.xlabel('A barcodes')
	plt.ylabel('B barcodes')

.. image:: output_25_1.png

The neural clusters are also the ones with highest self-enrichment in their neighborhood.


.. code:: python

	sq.gr.spatial_neighbors(spdata.table, n_neighs=8, coord_type='grid')
	sq.gr.nhood_enrichment(spdata.table, cluster_key="nsbm_level_1")
	set_res(False)
	sq.pl.nhood_enrichment(spdata.table, cluster_key="nsbm_level_1")

.. image:: output_27_1.png

Can we perform better than this? One idea would be to integrate the spatial information into the model. Spatial data, especially for grid-like data as in DBiT-seq, do not have a community structure, yet they can be used to enforce local communities. ``schist`` does not yet support spatial graphs with a specific function, however we can create an object with a spatial graph (copying the original) and apply the multimodal nested solution, specifying the two graphs needed.


.. code:: python

	_tmp = spdata.table.copy()

	sc.settings.verbosity=2
	scs.inference.fit_model_multi([spdata.table, _tmp], 
	                              key_added='spt', 
	                              neighbors_key=['spectral_neighbors', 'spatial_neighbors'])
	sc.settings.verbosity=0

.. code:: parsed-literal

    minimizing the nested Stochastic Block Model
    getting adjacency for data 0 (0:00:00)
    getting adjacency for data 1 (0:00:00)
            minimization step done (0:05:39)
            consensus step done (0:05:50)
        done (0:05:50)
        finished (0:05:50)


The process takes more time than above, but the results are clearly matching the anatomycal structures. We now see clusters 9 and 13 marking two structures that were not visibile in the original publication, if not with RNA-seq


.. code:: python
	
	set_res(True)
	spdata.pl.render_images().pl.render_shapes(color='spt_level_1', fill_alpha=.7).pl.show(title='SPT level 1', colorbar=True)
	xticks([])
	yticks([])
	plt.xlabel('A barcodes')
	plt.ylabel('B barcodes')

.. image:: output_34_1.png

We save the results for later analysis

.. code:: python

	spdata.write('analysis/SRR22565186.zarr')