Spatial TranscriptOmics DataBase (STOmicsDB) is a comprehensive portal that integrates spatiotemporal omics literature, tools, and data. STOmicsDB consists of the following sections.
The resource center provides the following functions:
The spatial multi-omics publications and tools were retrieved from NCBI PubMed and PubMed Central using spatial multi-omics-related search terms, and the relevant literature and tools were then selected and classified with a supervised machine learning method. The spatial transcriptomic datasets were curated from NCBI GEO, EMBL-EBI ArrayExpress, 10x Genomics (https://support.10xgenomics.com/spatial-gene-expression/datasets/), and the Spatial Research website (https://www.spatialresearch.org/resources-published-datasets/).
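The literature retrieval step can be illustrated with the NCBI E-utilities ESearch endpoint, which queries both PubMed (`db=pubmed`) and PubMed Central (`db=pmc`). The sketch below only builds query URLs. The search terms and the helper name `build_esearch_url` are hypothetical placeholders, since the exact terms used for STOmicsDB are not listed here, and the supervised classification step is not reproduced.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical example terms; not the actual STOmicsDB query.
TERMS = ["spatial transcriptomics", "spatial multi-omics", "Stereo-seq"]


def build_esearch_url(db, terms, retmax=100):
    """Return an ESearch URL that ORs the quoted search terms together."""
    query = " OR ".join(f'"{t}"' for t in terms)
    params = {"db": db, "term": query, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# One query per database: PubMed ("pubmed") and PubMed Central ("pmc").
for db in ("pubmed", "pmc"):
    print(build_esearch_url(db, TERMS))
```

Fetching such a URL returns a JSON object whose `esearchresult.idlist` field holds the matching record IDs; NCBI asks clients to rate-limit their requests.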
Data pre-processing and analysis
In brief, we analyzed the curated datasets with Scanpy (version 1.8.1) using default parameters. First, we normalized and log-transformed the downloaded gene expression data and performed principal component analysis (PCA) on the top 2,000 highly variable genes to reduce dimensionality. Next, we computed a neighborhood graph from the PCA results, which we used for Uniform Manifold Approximation and Projection (UMAP) embedding and for clustering spots with the Leiden algorithm. For each dataset, we identified cluster-specific marker genes with the Wilcoxon rank-sum test in Scanpy. If a dataset contains spatial coordinates, we further identified spatially variable genes with SpatialDE (version 1.1.3). All data and the corresponding analysis results can be downloaded from the Data and Analysis results tabs on the top panel.
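To show what the preprocessing and marker steps compute, here is a simplified NumPy sketch written from scratch; it is not Scanpy's implementation. It scales each spot to the median total count (the `sc.pp.normalize_total` default), applies `log1p`, selects highly variable genes by plain variance of the log data (a simplification of Scanpy's default dispersion-based method), runs PCA by SVD, and scores one cluster against all other spots with the Wilcoxon rank-sum z-score (no tie correction, as in Scanpy's default). The neighborhood graph, UMAP, and Leiden steps are omitted, so cluster labels are supplied directly. The helper names and the synthetic data are hypothetical.

```python
import numpy as np


def preprocess(counts, n_top=2000, n_pcs=50):
    """Normalize, log-transform, select variable genes, and run PCA.

    counts: spots x genes array of raw counts (dense, for simplicity).
    """
    totals = counts.sum(axis=1, keepdims=True)
    target = np.median(totals)  # normalize_total default: median total count
    safe_totals = np.where(totals == 0, 1, totals)  # avoid division by zero
    logx = np.log1p(counts / safe_totals * target)

    # Simplification: rank genes by the variance of the log data instead of
    # Scanpy's default dispersion-based highly variable gene selection.
    n_top = min(n_top, logx.shape[1])
    hvg = np.argsort(logx.var(axis=0))[::-1][:n_top]

    # PCA on the centered HVG matrix via singular value decomposition.
    x = logx[:, hvg] - logx[:, hvg].mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    k = min(n_pcs, s.size)
    return logx, u[:, :k] * s[:k]


def average_ranks(v):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = np.argsort(v, kind="mergesort")
    ranks = np.empty(v.size)
    ranks[order] = np.arange(1, v.size + 1)
    _, inverse = np.unique(v, return_inverse=True)
    inverse = inverse.ravel()
    return (np.bincount(inverse, weights=ranks) / np.bincount(inverse))[inverse]


def wilcoxon_z(logx, labels, cluster):
    """Rank-sum z-score of each gene for one cluster versus all other spots."""
    in_group = labels == cluster
    n1, n2 = int(in_group.sum()), int((~in_group).sum())
    mean = n1 * (n1 + n2 + 1) / 2
    sd = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = np.empty(logx.shape[1])
    for g in range(logx.shape[1]):
        ranks = average_ranks(logx[:, g])
        z[g] = (ranks[in_group].sum() - mean) / sd
    return z


# Synthetic example: 60 spots, 300 genes, three clusters of 20 spots.
rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(60, 300)).astype(float)
labels = np.repeat([0, 1, 2], 20)
counts[labels == 1, 5] += 50  # plant a marker gene for cluster 1
logx, pcs = preprocess(counts, n_top=100, n_pcs=10)
z = wilcoxon_z(logx, labels, cluster=1)
print(pcs.shape, int(np.argmax(z)))  # gene 5 should rank first
```

Positive z-scores mark genes ranked higher in the cluster than elsewhere; sorting by z gives the cluster's candidate markers.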
Data visualization
STOmicsDB uses Cirrocumulus (https://cirrocumulus.readthedocs.io/en/latest/) for dataset visualization. Cirrocumulus is an interactive visualization tool for large-scale single-cell and spatial transcriptomic data. The data visualization page consists of five parts: a section selector, a top toolbar, a sidebar, a main canvas, and a gallery.
For more details, please visit: https://cirrocumulus.readthedocs.io/en/latest/documentation.html
Dataset analysis results
The Analysis results tab on the top panel shows general statistics and information on cluster marker genes and spatially variable genes.
STOmicsDB offers a customized database service. We welcome researchers to build spatial transcriptomics databases with us or to deploy their specialized databases on STOmicsDB.
So far, we have built three such databases in collaboration with other researchers: ARTISTA (axolotl brain regeneration, https://db.cngb.org/stomics/artista/), MOSTA (mouse organogenesis, https://db.cngb.org/stomics/mosta/), and MLRSTA (mouse liver regeneration, https://db.cngb.org/stomics/mlrsta/).
Compared with other techniques, spatial transcriptomic techniques have distinctive features, most notably the spatial information they capture. In addition, each spatial transcriptomic technique has its own characteristics. To facilitate the reuse and re-analysis of spatial transcriptomic data, we aim to develop a data submission standard for each technique.
The current version of STOmicsDB supports data submission for two techniques: Stereo-seq and 10x Visium. The submission template can be downloaded from https://ftp.cngb.org/pub/stomics.
The data model is as follows:
To facilitate the use of spatial transcriptomic data, we set up an online tool based on SingleR for integrative analysis of spatial transcriptomic and single-cell RNA sequencing (scRNA-seq) data. The tool lets users annotate the cell types of a spatial transcriptomic dataset on STOmicsDB by uploading their own scRNA-seq gene expression matrix together with the corresponding cell type labels. In addition to the default SingleR outputs, the tool generates a spatial feature plot showing the spatial localization of each annotated cell type.
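The core SingleR-style scoring idea can be sketched as follows, assuming spots and reference cells share the same gene order. For each spot, the Spearman correlation with every reference cell is computed, each cell type is scored by the 0.8 quantile of its correlations (SingleR's default quantile), and the top-scoring label is assigned. This simplified NumPy sketch is not the code behind the STOmicsDB tool: it omits SingleR's marker-gene selection and fine-tuning steps, breaks rank ties arbitrarily, and uses hypothetical helper names and synthetic profiles.

```python
import numpy as np


def row_ranks(m):
    """Rank the values within each row (ties broken arbitrarily)."""
    return np.argsort(np.argsort(m, axis=1), axis=1).astype(float)


def spearman_matrix(a, b):
    """Spearman correlation between every row of a and every row of b."""
    ra, rb = row_ranks(a), row_ranks(b)
    ra -= ra.mean(axis=1, keepdims=True)
    rb -= rb.mean(axis=1, keepdims=True)
    ra /= np.linalg.norm(ra, axis=1, keepdims=True)
    rb /= np.linalg.norm(rb, axis=1, keepdims=True)
    return ra @ rb.T


def annotate_spots(test, ref, ref_labels, quantile=0.8):
    """Assign each spot the reference label with the highest score.

    test: spots x genes; ref: reference cells x genes (same gene order).
    """
    corr = spearman_matrix(test, ref)  # spots x reference cells
    cell_types = np.unique(ref_labels)
    # Score each cell type by a high quantile of the spot's correlations
    # with that type's reference cells.
    scores = np.column_stack(
        [np.quantile(corr[:, ref_labels == t], quantile, axis=1) for t in cell_types]
    )
    return cell_types[np.argmax(scores, axis=1)], scores


# Synthetic example with three hypothetical cell types A, B, C.
rng = np.random.default_rng(1)
profiles = rng.gamma(2.0, 1.0, size=(3, 30))
ref = np.repeat(profiles, 10, axis=0) + rng.normal(0, 0.05, size=(30, 30))
ref_labels = np.repeat(np.array(["A", "B", "C"]), 10)
test = profiles[[2, 0, 1]] + rng.normal(0, 0.05, size=(3, 30))
pred, scores = annotate_spots(test, ref, ref_labels)
print(pred.tolist())  # expected: ['C', 'A', 'B']
```

Using a quantile rather than the maximum makes each cell-type score less sensitive to a single outlying reference cell.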
If you need any help, please contact [email protected].
Xu, Z., Wang, W., Yang, T., Chen, J., Huang, Y., Gould, J., Du, W., Yang, F., Li, L., Lai, T. et al. (2022) STOmicsDB: a database of Spatial Transcriptomic data. bioRxiv, 2022.03.11.481421.