Help

STOmicsDB


Spatial TranscriptOmics DataBase (STOmicsDB) is a comprehensive database serving as a one-stop service in the spatial transcriptomics field, including data archiving, sharing, visualization, and analysis.

Resource center


The resource center provides the following functions:

  • Publications: Efficient search of spatial-omics-related literature. The citation for each publication is sourced from the Crossref website and is updated monthly.
  • Projects: An overall description of archived spatial transcriptomic projects.
  • Samples: Detailed sample information for each curated dataset, such as species information, tissue details, and corresponding diseases.

STOmicsDB offers two search methods in the resource center: a quick search and an advanced search. The quick search is accessible through a search box on the homepage, where users can search all resources or pick a specific one (publications, datasets, samples, or projects) from the drop-down list. The advanced search is accessed by clicking the ‘Resources’ button in the top navigation bar. On the ‘Resources’ page, the left sidebar lists filter conditions based on the attributes of each section. For instance, on the dataset page, users can refine their selection by criteria such as species, technology, organization, and time.

Datasets


The spatial transcriptomic datasets were curated from the NCBI GEO, EMBL-EBI ArrayExpress, STOmicsDB archiving system, and other sources.

1. Data pre-processing and analysis
In brief, we used Scanpy (version 1.8.1) with default parameters to analyze the curated datasets. First, we normalized and log-transformed the downloaded gene expression data. We then ran principal component analysis (PCA) on the top 2000 highly variable genes to reduce the dimensionality of the data. Next, we computed the neighborhood graph from the PCA results, ran Uniform Manifold Approximation and Projection (UMAP), and clustered the spots with the Leiden algorithm. For each dataset, we identified cluster-specific marker genes with the Wilcoxon rank-sum test in Scanpy. If the data contain spatial coordinates, we also identified spatially variable genes with SpatialDE (version 1.1.3). All data and the corresponding analysis results can be downloaded from the Data tab and the Analysis results tab on the top panel.

2. Data visualization

STOmicsDB uses Cirrocumulus (https://cirrocumulus.readthedocs.io/en/latest/) for dataset visualization. Cirrocumulus is an interactive visualization tool for large-scale single-cell and spatial transcriptomic data. The data visualization page consists of five parts: a section selector, a top toolbar, a sidebar, a main canvas, and a gallery.

  • The section selector at the top allows users to select a specific section in the dataset.

  • The top toolbar displays the number of cells in the dataset on the left. The right part of the top toolbar has five buttons: CLUSTERING, HEATMAP, DOT PLOTS, VIOLIN, and a moon symbol. The CLUSTERING button shows the default canvas interface. The HEATMAP, DOT PLOTS, and VIOLIN options allow users to explore differential gene expression across cell clusters using various plots. The moon symbol button toggles dark mode.

  • The left sidebar allows users to select genes/traits to be visualized, choose different clustering tags, and perform differential gene expression analysis.

  • The main canvas shows an interactive 2D or 3D graphic, which can be panned and zoomed, and in which specific regions can be selected using the mouse.

  • The gallery, located at the bottom, shows thumbnails of selected genes/gene sets. Users can click a thumbnail to display it on the main canvas.

3. Dataset analysis results

The Analysis results tab in the navigation panel of each dataset page shows general statistics, cluster and spatial marker information, differential expression analysis based on marker genes, cell-cell interactions, spatially specific modules (Hotspot results), and spatial marker genes.

4. Download

After curation, all datasets are stored in a unified AnnData format (.h5ad). There are two .h5ad file suffixes: *.h5ad and *_processed.h5ad.

*.h5ad: AnnData format of original data, without any additional modifications by STOmicsDB.

*_processed.h5ad: AnnData format of the data after processing by STOmicsDB, containing the following fields:

  • AnnData.layers:

    • layers['raw_count']: raw count matrix.
  • AnnData.obs: one-dimensional observations annotation associated with each spot.

    • obs['clusters']: clustering results.

    • obs['cell_type']: cell type annotation results.

  • AnnData.obsm: multi-dimensional observations annotation associated with each spot.

    • obsm['X_umap']: UMAP result.

    • obsm['X_pca']: PCA dimensionality reduction result.

    • obsm['spatial']: Spatial spot information, if the spatial coordinates are available.

Collections (customized databases)


STOmicsDB offers a customized database service for spatial transcriptomics. Researchers can collaborate with STOmicsDB to create spatial transcriptomics databases or deploy their own specialized databases on STOmicsDB.

Examples of collaborative spatial transcriptomics databases:

Submission


Compared to other techniques, spatial transcriptomic technologies have a critical distinguishing feature: they provide spatial information. Moreover, each spatial transcriptomic technology has unique characteristics of its own. To facilitate the reuse and re-analysis of spatial transcriptomic data, our objective is to establish a data submission standard for each technology.

In the current version, STOmicsDB supports data submission for two technologies: Stereo-seq and 10x Visium. For more details, please visit https://db.cngb.org/stomics/submission.

Analysis


To facilitate the use of spatial transcriptomic data, STOmicsDB provides several online analysis tools:

  • SingleR: Provides an interactive analysis between the user's scRNA-seq data and the spatial transcriptomic data curated by STOmicsDB. With the help of the curated datasets, users can annotate cell types and obtain spatial information for their own data.
  • Gene search: Browse the gene expression pattern across all curated datasets and sections.
  • Compare: Provides an interactive interface to compare the gene expression pattern or cluster information between two sections or datasets.
  • Stereomap: Rapid data retrieval and visualization of Stereo-seq datasets.

Contact


If you have any questions or suggestions, please don't hesitate to contact us at CNGBdb@cngb.org.

How to cite


Xu, Z., Wang, W., Yang, T., Li, L., Ma, X., Chen, J., Wang, J., Huang, Y., Gould, J., Lu, H., et al. (2023). STOmicsDB: a comprehensive database for spatial transcriptomics data sharing, analysis and visualization. Nucleic Acids Research. 10.1093/nar/gkad933.