Spatial TranscriptOmics DataBase (STOmicsDB) is a comprehensive database serving as a one-stop service in the spatial transcriptomics field, including data archiving, sharing, visualization, and analysis.
The resource center provides the following functions:
The spatial transcriptomic datasets were curated from the NCBI GEO, EMBL-EBI ArrayExpress, STOmicsDB archiving system, and other sources.
1. Data pre-processing and analysis
In brief, we used Scanpy (version 1.8.1) to analyze the curated datasets with default parameters. First, we normalized and log-transformed the downloaded gene expression data, then ran principal component analysis (PCA) on the top 2000 highly variable genes to reduce the dimensionality of the data. Next, we computed the neighborhood graph from the PCA results, performed Uniform Manifold Approximation and Projection (UMAP), and clustered the spots with the Leiden algorithm. For each dataset, we identified cluster-specific marker genes with the Wilcoxon rank-sum test in Scanpy. If a dataset contains spatial coordinate information, we also identified spatially variable genes with SpatialDE (version 1.1.3). All data and the corresponding analysis results can be downloaded from the Data tab and the Analysis results tab on the top panel.
2. Data visualization
STOmicsDB uses Cirrocumulus (https://cirrocumulus.readthedocs.io/en/latest/) for dataset visualization. Cirrocumulus is an interactive visualization tool for large-scale single-cell and spatial transcriptomic data. The data visualization page consists of five parts: a section selector, a top toolbar, a sidebar, a main canvas, and a gallery.
The section selector at the top allows users to select a specific section in the dataset.
The top toolbar displays the number of cells in the dataset on the left. The right part of the top toolbar has five buttons: CLUSTERING, HEATMAP, DOT PLOTS, VIOLIN, and a moon symbol. The CLUSTERING button shows the default canvas interface. The HEATMAP, DOT PLOTS, and VIOLIN options allow users to explore differential gene expression across cell clusters using various plots. The moon symbol button toggles dark mode.
The left sidebar allows users to select genes/traits to be visualized, choose different clustering tags, and perform differential gene expression analysis.
The main canvas shows an interactive 2D or 3D graphic, which can be panned and zoomed, and in which specific regions can be selected using the mouse.
The gallery, located at the bottom, shows thumbnails of selected genes/gene sets. Users can click a thumbnail to display it on the main canvas.
3. Dataset analysis results
The Analysis results tab on the navigation panel of each dataset page shows general statistics, cluster/spatial marker information, differential expression analysis based on marker genes, cell-cell interactions, spatially specific modules (Hotspot results), and spatial marker genes.
4. Download
All datasets are stored in a unified AnnData format (.h5ad) after curation. There are two .h5ad file suffixes: *.h5ad and *_processed.h5ad.
*.h5ad: AnnData format of original data, without any additional modifications by STOmicsDB.
*_processed.h5ad: AnnData format of the data after processing by STOmicsDB, containing the fields listed below.
AnnData.layers:
layers['raw_count']: raw matrices.
AnnData.obs: one-dimensional observation annotations associated with each spot.
obs['clusters']: clustering results.
obs['cell_type']: cell type annotation results.
AnnData.obsm: multi-dimensional observation annotations associated with each spot.
obsm['X_umap']: UMAP result.
obsm['X_pca']: PCA dimensionality reduction result.
obsm['spatial']: spatial spot information, if the spatial coordinates are available.
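As a quick check after downloading a processed file, the fields listed above can be looked up programmatically. The sketch below is illustrative only: `load_h5ad`, `summarize_fields`, and `DOCUMENTED_FIELDS` are hypothetical helper names, and the layer container is accessed through AnnData's `layers` attribute. The only library call used is `anndata.read_h5ad`.

```python
# Fields that this page documents for *_processed.h5ad files.
DOCUMENTED_FIELDS = {
    "layers": ["raw_count"],
    "obs": ["clusters", "cell_type"],
    "obsm": ["X_umap", "X_pca", "spatial"],
}


def load_h5ad(path):
    """Read an .h5ad file into an AnnData object.

    anndata is imported lazily so that summarize_fields() can be reused
    without the library installed.
    """
    import anndata as ad

    return ad.read_h5ad(path)


def summarize_fields(adata):
    """Report which documented fields are present in `adata`.

    Returns a dict such as {"obs['clusters']": True, "obsm['spatial']": False}.
    """
    report = {}
    for attr, keys in DOCUMENTED_FIELDS.items():
        container = getattr(adata, attr)
        for key in keys:
            # `in` checks layer names, obs column names, and obsm keys.
            report[f"{attr}['{key}']"] = key in container
    return report
```

A typical use would be `summarize_fields(load_h5ad(path))`, where `path` points to a downloaded `*_processed.h5ad` file. A missing `obsm['spatial']` entry is expected for datasets without spatial coordinates.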
STOmicsDB offers a customized database service for spatial transcriptomics. Researchers can collaborate with STOmicsDB to create spatial transcriptomics databases or deploy their own specialized databases on STOmicsDB.
Examples of collaborative spatial transcriptomics databases:
Compared to other techniques, spatial transcriptomic technologies possess critical features, notably the ability to provide spatial information. Moreover, each spatial transcriptomic technology has unique characteristics of its own. To facilitate the reuse and re-analysis of spatial transcriptomic data, our objective is to establish a data submission standard for each technology.
In the current version, STOmicsDB supports data submission for two technologies: Stereo-seq and 10x Visium. For more details, please visit https://db.cngb.org/stomics/submission.
To facilitate the use of spatial transcriptomic data, STOmicsDB provides several online analysis tools:
If you have any questions or suggestions, please don't hesitate to contact us at CNGBdb@cngb.org.
Xu, Z., Wang, W., Yang, T., Li, L., Ma, X., Chen, J., Wang, J., Huang, Y., Gould, J., Lu, H., et al. (2023). STOmicsDB: a comprehensive database for spatial transcriptomics data sharing, analysis and visualization. Nucleic Acids Research. 10.1093/nar/gkad933.