Help

STOmicsDB


Spatial TranscriptOmics DataBase (STOmicsDB) is a comprehensive portal that integrates spatiotemporal omics literature, tools, and data. STOmicsDB consists of the following sections.

Resource center


The resource center provides the following functions:

  • Search spatial-omics-related literature efficiently. For example, users can quickly find literature of interest by filtering on multiple dimensions such as research field, spatial resolution, time series, and species.
  • Obtain the latest spatial-omics-related tools.
  • Explore and visualize published spatial transcriptomic data, obtain the corresponding spatial genes, and download data for further analysis.

The spatial multi-omics publications and tools were retrieved from NCBI PubMed and PubMed Central using spatial multi-omics related search terms; the retrieved literature and tools were then filtered and classified with a supervised machine learning method. The spatial transcriptomic datasets were curated from NCBI GEO, EMBL-EBI ArrayExpress, 10x Genomics (https://support.10xgenomics.com/spatial-gene-expression/datasets/), and Spatial Research (https://www.spatialresearch.org/resources-published-datasets/).

Datasets

  1. Data pre-processing and analysis
    In brief, we analyzed the curated datasets with Scanpy (version 1.8.1) using default parameters. We first normalized and log-transformed the downloaded gene expression data, and then ran principal component analysis (PCA) on the top 2000 highly variable genes to reduce the dimensionality of the data. Next, we computed the neighborhood graph from the PCA results, performed Uniform Manifold Approximation and Projection (UMAP), and clustered the spots with the Leiden algorithm. For each dataset, we identified cluster-specific marker genes with the Wilcoxon rank-sum test in Scanpy. If a dataset contains spatial coordinates, we also identified spatially variable genes with SpatialDE (version 1.1.3). All data and the corresponding analysis results can be downloaded from the Data tab and the Analysis results tab on the top panel.

  2. Data visualization
    STOmicsDB uses Cirrocumulus (https://cirrocumulus.readthedocs.io/en/latest/) for dataset visualization. Cirrocumulus is an interactive visualization tool for large-scale single-cell and spatial transcriptomic data. The data visualization page consists of five parts: a section selector, a top toolbar, a sidebar, a main canvas, and a gallery.

    • The section selector at the top allows users to select a specific section in the dataset.
    • The top toolbar displays, on the left, the total number of cells in the dataset and the number of currently selected cells. On the right are three buttons: EMBEDDINGS, DISTRIBUTIONS, and a moon symbol. The EMBEDDINGS button shows the default canvas interface. The DISTRIBUTIONS button allows users to explore differential gene expression across cell clusters with a dot plot, a heat map, or a violin plot. The moon symbol button toggles dark mode.
    • The left sidebar allows users to select genes/traits to be visualized, select clustering tags, and run differential expression analysis.
    • The main canvas shows an interactive 2D or 3D graphic, which can be panned and zoomed, and in which specific regions can be selected with the mouse.
    • The gallery, located at the bottom, shows thumbnails of the selected genes/gene sets. Users can click a thumbnail to display it on the main canvas.

    For more details, please visit: https://cirrocumulus.readthedocs.io/en/latest/documentation.html

  3. Dataset analysis results
    The Analysis results tab on the top panel shows general statistics and cluster and spatial marker information.
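
For readers who want to produce comparable results locally, the steps described under "Data pre-processing and analysis" can be sketched with Scanpy roughly as follows. This is a minimal sketch, not the exact STOmicsDB pipeline: it assumes an AnnData object holding raw counts, uses Scanpy defaults except where the text names a value (2000 highly variable genes, Wilcoxon rank-sum test), and omits the SpatialDE step. The function name run_pipeline, the constants, and the file name in the usage comment are hypothetical. Leiden clustering additionally requires the leidenalg package.

```python
# Hedged sketch of the Scanpy steps described in the text (not the official pipeline).
# Values taken from the text; everything else is left at Scanpy's defaults.
N_TOP_GENES = 2000        # "top 2000 highly variable genes"
MARKER_TEST = "wilcoxon"  # "Wilcoxon rank-sum test"
CLUSTER_KEY = "leiden"    # Scanpy's default column name for Leiden labels


def run_pipeline(adata):
    """Normalize, reduce, cluster, and rank marker genes on an AnnData object in place."""
    import scanpy as sc  # imported lazily so this module loads without Scanpy installed

    sc.pp.normalize_total(adata)   # normalize counts per spot/cell
    sc.pp.log1p(adata)             # log-transform
    sc.pp.highly_variable_genes(adata, n_top_genes=N_TOP_GENES)
    sc.tl.pca(adata)               # uses the flagged highly variable genes by default
    sc.pp.neighbors(adata)         # neighborhood graph from the PCA representation
    sc.tl.umap(adata)              # UMAP embedding
    sc.tl.leiden(adata, key_added=CLUSTER_KEY)  # Leiden clustering
    sc.tl.rank_genes_groups(adata, groupby=CLUSTER_KEY, method=MARKER_TEST)
    return adata


# Usage (hypothetical file name, assumption for illustration only):
#   import scanpy as sc
#   adata = run_pipeline(sc.read_h5ad("my_dataset.h5ad"))
#   markers = sc.get.rank_genes_groups_df(adata, group=None)
```

The cluster labels end up in adata.obs["leiden"] and the marker statistics in adata.uns["rank_genes_groups"], which is the kind of information the Analysis results tab summarizes.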

Specialized databases


STOmicsDB offers a custom database service. We welcome researchers to build spatial transcriptomics databases with us, or to deploy their own specialized databases on STOmicsDB.
So far, we have built three such databases together with other researchers: ARTISTA (axolotl brain regeneration, https://db.cngb.org/stomics/artista/), MOSTA (mouse organogenesis, https://db.cngb.org/stomics/mosta/), and MLRSTA (mouse liver regeneration, https://db.cngb.org/stomics/mlrsta/).

Submission


Compared with other techniques, spatial transcriptomic techniques have some distinctive features, most notably their spatial information. In addition, each spatial transcriptomic technique has its own characteristics. To facilitate the reuse and re-analysis of spatial transcriptomic data, we aim to develop a data submission standard for each technique.
In the current version, STOmicsDB supports data submission for two techniques: Stereo-seq and 10x Visium. The submission template can be downloaded here (https://ftp.cngb.org/pub/stomics).
The data model is as follows:

Analysis


To facilitate the use of spatial transcriptomic data, we set up an online tool based on SingleR that supports joint analysis of spatial transcriptomic data and single-cell RNA sequencing data. The tool allows users to annotate the cell types of a specific spatial transcriptomic dataset on STOmicsDB by uploading their single-cell RNA sequencing gene expression matrix and the corresponding cell types. In addition to the default outputs of SingleR, the tool generates a spatial feature plot showing the spatial localization of each annotated cell type.
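
To give a feel for reference-based annotation, the sketch below assigns each spot the reference cell type whose expression profile correlates best with it. It is a simplified illustration only and not SingleR itself: SingleR is an R/Bioconductor package that also scores labels with Spearman correlation, but it adds marker-gene selection, per-label score aggregation, and fine-tuning, all of which are omitted here. The function names, gene values, and cell-type labels are made up for the example.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def annotate(spots, reference):
    """Label each spot with the best-correlated reference cell type."""
    return {
        spot_id: max(reference, key=lambda ct: spearman(profile, reference[ct]))
        for spot_id, profile in spots.items()
    }


# Toy data (made up): expression of the same four genes, in the same order.
reference = {
    "hepatocyte": [9.0, 8.0, 1.0, 0.0],   # mean profile per cell type from scRNA-seq
    "endothelial": [0.0, 1.0, 7.0, 9.0],
}
spots = {
    "spot_1": [5.0, 4.0, 0.0, 1.0],
    "spot_2": [1.0, 0.0, 3.0, 6.0],
}
labels = annotate(spots, reference)
print(labels)  # {'spot_1': 'hepatocyte', 'spot_2': 'endothelial'}
```

Plotting these labels at each spot's spatial coordinates gives the kind of spatial feature plot mentioned above.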

Contact


If you need any help, please contact [email protected].

How to cite


Xu, Z., Wang, W., Yang, T., Chen, J., Huang, Y., Gould, J., Du, W., Yang, F., Li, L., Lai, T. et al. (2022) STOmicsDB: a database of Spatial Transcriptomic data. bioRxiv, 2022.03.11.481421.