STOmicsDB Visualization Creation Guide

1. Introduction to Visualization Creation

STOmicsDB provides spatial omics data and metadata, including research areas, sample tissues, species, spatial resolutions, and publication types. The data includes gene expression matrices, spatial positions, tissue images, and analysis results like clustering and cell annotation.

The Dataset Creation system ensures data accuracy through quality control and performs analyses such as cell type annotation, spatial region identification, and cell-cell interaction. Researchers can explore results and visualize data to gain insights into spatial transcriptomics.

2. Database Access

3. Steps for Visualization Creation

3.1 Accessing the Visualization Creation Interface

  1. Visit https://db.cngb.org/stomics/.
  2. Click Submit | Create under the Submission module to access the creation interface.
  3. Click Create to start the process.
  4. Log in using WeChat, CARSI, ORCID, GitHub, BGI, etc.

3.2 Starting a New Visualization

  1. Click New Creation to begin.
  2. A unique creation number (e.g., stc0000xxx) will be generated, and you will enter the creation details interface.

3.3 Completing Creation Details

Step 1: Upload Data

  • Download and fill in the visualization creation template (see FAQ for details).
  • Upload the completed template. Fix any errors reported in the validation messages; for unresolved issues, email the support team.
  • Choose a data upload method (e.g., FTP or Aspera). After uploading, wait about 10 minutes for the system to detect the files.
  • Click Check Files and Next to verify data integrity.

Step 2: Provide Additional Information

  • Fill in details like submitter name, project name, etc.

3.4 Managing Visualizations

  • Creation List Interface: https://db.cngb.org/stportal/creations/
  • Unfinished Creations: Click Modify to continue or Delete to remove.
  • Completed Creations:
    • Click Apply Modification to request edit permissions. After approval, click Modify to edit.
    • Click Apply Deletion to request deletion. After approval, the creation will be deleted.

4. How to Cite Data

4.1 Example Dataset References

  • "The Stereo-seq data used to generate Fig. 1 come from the STOmicsDB database [1], query number STDS0000058 [2]."
  • "The spatial mouse kidney data have been deposited into STOmicsDB [1] (https://db.cngb.org/stomics/datasets/STDS0000058 [2])."

4.2 Citation Formats

  • Citing STOmicsDB:
    • Xu, Zhicheng et al. "STOmicsDB: a comprehensive database for spatial transcriptomics data sharing, analysis and visualization." Nucleic Acids Research, vol. 52, D1 (2024): D1053-D1061. doi:10.1093/nar/gkad933
  • Citing a Visualization Dataset (e.g., STDS0000058):
  • Citing Original Data Articles:
    • Chen, Ao et al. "Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays." Cell, vol. 185, 10 (2022): 1777-1792.e21. doi:10.1016/j.cell.2022.04.003

5. Visualization Creation - Dataset

5.1 Searching for Datasets

  • Use the general or advanced search modules to find datasets.

5.2 Metadata Display

  • Research Information: Includes species, tissues, diseases, developmental stages, publications, and technologies.
  • Metadata Organization: Includes file types, sizes, and structures. Standard and advanced analyses extract key information.

5.3 Data Standardization

STOmicsDB standardizes data from public databases and user submissions. The process includes:

  • Data Import: Use Scanpy (v1.8.1) to read data (e.g., expression matrices, spatial locations, tissue images) into AnnData objects.
  • Standardization: Remove duplicate genes, normalize counts, perform PCA, and cluster using the Leiden algorithm. Identify marker genes and spatially variable genes using Scanpy and SpatialDE.
  • Output: Save results in h5ad format for user access.
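
The standardization steps listed above can be sketched with Scanpy roughly as follows. This is a minimal illustration, not the pipeline STOmicsDB actually runs: the function name, the parameter values (target_sum, n_pcs, resolution), the log transform, and the assumption that the input is already an h5ad file are all choices made for this example. The spatially variable gene step (SpatialDE) is left out. Scanpy's Leiden clustering also requires the leidenalg package.

```python
def standardize_section(in_path, out_path, n_pcs=50, resolution=1.0):
    """Read an AnnData file, run a basic standardization, and save it as h5ad.

    All parameter defaults are illustrative assumptions, not STOmicsDB settings.
    """
    # Imported here so that this sketch can be loaded without Scanpy installed.
    import scanpy as sc

    adata = sc.read_h5ad(in_path)

    # Remove duplicate genes: keep the first occurrence of each gene name.
    adata = adata[:, ~adata.var_names.duplicated()].copy()

    # Normalize counts per cell, then log-transform (the log step is an assumption).
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)

    # PCA, neighborhood graph, and Leiden clustering.
    sc.tl.pca(adata, n_comps=n_pcs)
    sc.pp.neighbors(adata)
    sc.tl.leiden(adata, resolution=resolution, key_added="leiden")

    # Marker genes per cluster, using the Wilcoxon rank-sum test.
    sc.tl.rank_genes_groups(adata, groupby="leiden", method="wilcoxon")

    # Save the result in h5ad format.
    adata.write_h5ad(out_path)
    return adata
```

A call such as standardize_section("section.h5ad", "section.processed.h5ad") would write a processed file next to the input; both file names are placeholders.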

Advanced Analysis Visualization

  • Cell Annotation: Use SCINA (v1.2.0) to annotate clusters with cell types based on marker genes.
  • Cell Interaction: Use stLearn (v0.4.12) to analyze and visualize significant cell-cell interactions.
  • Differential Analysis: Identify upregulated genes between cell types and perform enrichment analysis using clusterProfiler.

5.4 Expression Profile Visualization

This module displays gene expression maps, cell annotations, and marker gene distributions for intuitive spatial visualization.

5.5 Data Files and Downloads

STOmicsDB provides controlled and public sharing options for the following types of data:

  • Raw Data: Unprocessed experimental data or filtered matrices.
  • Processed Data: Results after standardization, including normalized matrices and cluster analyses.
  • Custom Data: User-submitted intermediate results or annotations.

5.6 Sample Modules

  • Sample List: View all samples and their main information.
  • Sample Details: Access detailed sample information and download data.

6. Frequently Asked Questions

6.1 Template Filling Requirements

To create a visualization, fill in dataset metadata, sample information, sample file information, and analysis file information. Optionally, add a homepage image for personalized dataset display.

Dataset Metadata

FIELDS | DESCRIPTION | EXAMPLE
TITLE | Dataset title | MOSTA: Mouse Organogenesis Spatiotemporal Transcriptomic Atlas
SPECIES | Species name (Tax ID), separate multiple with ` ` |
TISSUES | Biological tissue names, separate multiple with ` ` |
ORGAN PARTS | Biological suborgan names, separate multiple with ` ` |
DEVELOPMENT STAGE | Sample developmental period, separate multiple with ` ` |
SEX | Sample gender, Male or Female, separate multiple with ` ` |
TECHNOLOGY | Sequencing technology, separate multiple with ` ` |
SAMPLE NUMBER | Sample size | 16
SECTION NUMBER | Total number of slices | 61
DISEASE | Diseases studied, separate multiple with ` ` |
SUMMARY | Dataset overview (max 4000 characters) | Overview of the dataset and its significance.
OVERALL DESIGN | Experimental design (max 2000 characters) | Description of the experimental setup.
SUBMISSION DATE | Submission date in yyyy-mm-dd format | 2020-01-24
UPDATE DATE | Update date in yyyy-mm-dd format | 2020-11-18
CONTRIBUTORS | Contributors, separate multiple with ` ` |
CONTACT | Contact information, separate multiple with ` ` |
CITATION | Citations of related articles, separate multiple with ` ` |
ACCESSIONS | Data source or storage location | CNSA project: CNP0001543
RELATIONS | Data associations with other databases | Stomics submission: STT0000013
PLATFORM | Sequencing platform, separate multiple with ` ` |
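
Before uploading the template, the length and date constraints stated in the table above can be checked locally. The sketch below is only an illustration: it assumes the dataset metadata has already been read into a Python dictionary keyed by field name, and it treats both date fields as required, which the template may or may not enforce. The function and constant names are hypothetical.

```python
from datetime import datetime

# Limits and date fields taken from the Dataset Metadata table.
MAX_LENGTHS = {"SUMMARY": 4000, "OVERALL DESIGN": 2000}
DATE_FIELDS = ("SUBMISSION DATE", "UPDATE DATE")


def validate_dataset_metadata(record):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []

    for field, limit in MAX_LENGTHS.items():
        value = record.get(field, "")
        if len(value) > limit:
            problems.append(f"{field}: {len(value)} characters, limit is {limit}")

    for field in DATE_FIELDS:
        value = record.get(field, "")
        try:
            parsed = datetime.strptime(value, "%Y-%m-%d")
        except ValueError:
            problems.append(f"{field}: '{value}' is not in yyyy-mm-dd format")
            continue
        # strptime also accepts unpadded values such as 2020-1-24, so compare
        # against the zero-padded ISO form to enforce yyyy-mm-dd strictly.
        if parsed.date().isoformat() != value:
            problems.append(f"{field}: '{value}' is not in yyyy-mm-dd format")

    return problems


if __name__ == "__main__":
    example = {
        "SUMMARY": "Overview of the dataset and its significance.",
        "OVERALL DESIGN": "Description of the experimental setup.",
        "SUBMISSION DATE": "2020-01-24",
        "UPDATE DATE": "2020-11-18",
    }
    print(validate_dataset_metadata(example))
```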

Sample Information

FIELDS | DESCRIPTION | EXAMPLE
SAMPLE_NAME | Sample name | E9.5_EXAMPLE
SECTION_NAME | Slice name. Spatial omics technology: when the same sample includes multiple slices, fill in one line per slice. Other omics technology: when no slice is included, the slice name must be the same as the sample name. |
TECHNOLOGY | Sequencing technology, such as Stereo-Seq, 10X Visium, scRNA | Stereo-Seq
SPECIES | Tax ID of the species to which the sample belongs | 10090
TISSUE | Sample tissue type | Embryo
ORGAN PARTS | Name of the biological suborgan | Embryonic brain
DEVELOPMENT STAGE | Sample developmental period | E9.5
SAMPLE ID | The sample number submitted to CNSA or the spatiotemporal omics database; leave it blank if none exists | CNS0001619
SOURCE | For samples collected by other databases or institutions, fill in the corresponding source; for samples submitted to the gene bank, fill in CNGB | CNGB
SEX | Sample gender: Male or Female | Male
PLATFORM | Sequencing platform, such as DIPSEQ-T1 | DNBSEQ-T1
DISEASE | Disease of the sample; if none, fill in normal | squamous cell carcinoma
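
The SECTION_NAME rule above (one line per slice for spatial technologies, slice name equal to sample name otherwise) can be checked with a few lines of Python. This is a rough sketch under stated assumptions: the sample sheet is assumed to be available as a list of dictionaries, the set of technologies treated as spatial is a guess made for this example, and requiring both names to be non-empty is likewise an assumption. The function name is hypothetical.

```python
# Assumption: the template does not define which technologies count as spatial;
# this set is illustrative only.
SPATIAL_TECHNOLOGIES = {"Stereo-Seq", "10X Visium"}


def check_section_names(rows):
    """Return a list of problems found in SAMPLE_NAME / SECTION_NAME pairs."""
    problems = []
    seen = set()

    for number, row in enumerate(rows, start=1):
        sample = row.get("SAMPLE_NAME", "").strip()
        section = row.get("SECTION_NAME", "").strip()
        technology = row.get("TECHNOLOGY", "").strip()

        if not sample or not section:
            problems.append(f"row {number}: SAMPLE_NAME and SECTION_NAME must be filled in")
            continue

        # Each slice of a sample should appear on its own line, exactly once.
        if (sample, section) in seen:
            problems.append(f"row {number}: duplicate line for {sample} / {section}")
        seen.add((sample, section))

        # Without slices, the slice name must repeat the sample name.
        if technology not in SPATIAL_TECHNOLOGIES and section != sample:
            problems.append(
                f"row {number}: SECTION_NAME '{section}' should equal SAMPLE_NAME '{sample}'"
            )

    return problems
```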

Expression File Information

FIELDS | DESCRIPTION | EXAMPLE
SECTION_NAME | The sample name to which the user analysis file belongs | E9.5_E2S1_EXAMPLE
DATA_TYPE | Sequencing technology | Stereo-Seq
FILE_NAME | File name | cluster_makers.svg
TITLE | Analysis Type | Cluster markers
DESCRIPTION | Analysis Description | The marker genes of each cluster were calculated by scanpy.tl.rank_genes_groups with the "wilcoxon" method. If the original annotation information of dataset is available, we use the original one, if not, we get the annotation information through scanpy.tl.leiden.
MD5 | File md5 value | 7abcfa8f3abd503e286badf040ba4fa3

Analysis File Information

FIELDS | DESCRIPTION | EXAMPLE
SECTION_NAME | The sample name to which the user analysis file belongs | E9.5_E2S1_example
DATA_TYPE | Sequencing technology | Stereo-Seq
FILE_NAME | File name | cluster_makers.svg
TITLE | Analysis Type | Cluster markers
DESCRIPTION | Analysis Description | The marker genes of each cluster were calculated by scanpy.tl.rank_genes_groups with the "wilcoxon" method. If the original annotation information of dataset is available, we use the original one, if not, we get the annotation information through scanpy.tl.leiden.
MD5 | File md5 value | 7abcfa8f3abd503e286badf040ba4fa3
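
The MD5 field expects the hexadecimal MD5 digest of each file. One way to compute it is with Python's standard hashlib module, as in the sketch below; the function name and chunk size are arbitrary choices for this example, and any standard MD5 tool should give the same value.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large matrices or images do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, md5_of_file("cluster_makers.svg") returns a 32-character string that can be pasted into the MD5 column for that file.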

Optional Information for the Dataset

  • Display thumbnails: You can embed images in the Excel spreadsheet; there is no limit on image resolution.

6.2 Verification Information Explanation

To be continued. If you encounter any related problems, please contact us.

Contact

For assistance, contact P_STOmicsDB@genomics.cn.
We provide detailed guidance and support to ensure a smooth process for data submission and visualization creation.

Support Hours: Monday to Friday, 9:00 AM - 6:00 PM (GMT+8).