Creation Help

STOmicsDB Visualization Creation Guide

1. Introduction to Visualization Creation

STOmicsDB provides spatial omics data and metadata, including research areas, sample tissues, species, spatial resolutions, and publication types. The data includes gene expression matrices, spatial positions, tissue images, and analysis results like clustering and cell annotation.

The Dataset Creation system ensures data accuracy through quality control and performs analyses such as cell type annotation, spatial region identification, and cell-cell interaction. Researchers can explore results and visualize data to gain insights into spatial transcriptomics.

2. Database Access

Home page: https://db.cngb.org/stomics/
Visualization creation: https://db.cngb.org/stomics/submission/data_visualization
Browse datasets: https://db.cngb.org/stomics/datasets/

3. Steps for Visualization Creation

3.1 Accessing the Visualization Creation Interface

Visit https://db.cngb.org/stomics/.
Click Submit | Create under the Submission module to access the creation interface.
Click Create to start the process.
Log in using WeChat, CARSI, ORCID, GitHub, BGI, etc.

3.2 Starting a New Visualization

Click New Creation to begin.
A unique creation number (e.g., stc0000xxx) will be generated, and you will enter the creation details interface.

3.3 Completing Creation Details

Step 1: Upload Data

Download and fill in the visualization creation template (see FAQ for details).
Upload the template. Fix errors based on messages or email the support team for unresolved issues.
Choose a data upload method (e.g., FTP) and upload the data files. Wait about 10 minutes for the system to detect the uploaded files.
Click Check Files and Next to verify data integrity.
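If FTP is the chosen upload method, the transfer can be scripted. The following is only a minimal sketch using Python's standard ftplib module. This guide does not specify the FTP host, account, or remote directory, so `host`, `user`, and `password` are placeholders, and the function names are hypothetical; take the real values from the upload page.

```python
from ftplib import FTP
from pathlib import Path


def upload_files(ftp, paths):
    """Upload each local file in binary mode and return the remote names used."""
    uploaded = []
    for path in map(Path, paths):
        with path.open("rb") as handle:
            # STOR stores the file under its base name in the current remote directory.
            ftp.storbinary(f"STOR {path.name}", handle)
        uploaded.append(path.name)
    return uploaded


def upload_to_server(host, user, password, paths):
    """Open a connection, log in, and upload the files.

    host, user and password are placeholders (assumptions), not values
    documented in this guide.
    """
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        return upload_files(ftp, paths)
```

After the transfer finishes, the waiting period and the Check Files step described above still apply.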

Step 2: Provide Additional Information

Fill in details like submitter name, project name, etc.

3.4 Managing Visualizations

Creation List Interface: https://db.cngb.org/stportal/creations/
Unfinished Creations: Click Modify to continue or Delete to remove.
Completed Creations:
- Click Apply Modification to request edit permissions. After approval, click Modify to edit.
- Click Apply Deletion to request deletion. After approval, the creation will be deleted.

4. How to Cite Data

4.1 Example Dataset References

"The Stereo-seq data used to generate Fig. 1 come from the STOmicsDB database [1], accession number STDS0000058 [2]."
"The spatial mouse kidney data have been deposited into STOmicsDB [1] (https://db.cngb.org/stomics/datasets/STDS0000058 [2])."

4.2 Citation Formats

Citing STOmicsDB:
- Xu, Zhicheng et al. "STOmicsDB: a comprehensive database for spatial transcriptomics data sharing, analysis and visualization." Nucleic Acids Research, vol. 52, D1 (2024): D1053-D1061. doi:10.1093/nar/gkad933
Citing a Visualization Dataset (e.g., STDS0000058):
- Longqi Liu. MOSTA: Mouse Organogenesis Spatiotemporal Transcriptomic Atlas [DS/OL]. STOmicsDB, 2021 [2021-10-22]. https://db.cngb.org/stomics/datasets/STDS0000058/. doi: 10.26036/STDS0000058
Citing Original Data Articles:
- Chen, Ao et al. "Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays." Cell, vol. 185, 10 (2022): 1777-1792.e21. doi:10.1016/j.cell.2022.04.003
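The dataset citation format above can be assembled from a few fields. The sketch below reproduces the STDS0000058 example. The `DatasetRecord` class and function names are hypothetical, and the DOI prefix 10.26036 is inferred from that single example, so check it against the dataset page before reuse.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    author: str
    title: str
    dataset_id: str    # e.g. STDS0000058
    year: int
    release_date: str  # yyyy-mm-dd


def dataset_url(dataset_id):
    """Build the dataset page URL in the form used in this guide."""
    return f"https://db.cngb.org/stomics/datasets/{dataset_id}/"


def cite_dataset(rec):
    """Format a dataset citation in the [DS/OL] style shown above."""
    # Assumption: the DOI is 10.26036/<dataset ID>, as in the example.
    return (
        f"{rec.author}. {rec.title} [DS/OL]. STOmicsDB, {rec.year} "
        f"[{rec.release_date}]. {dataset_url(rec.dataset_id)}. "
        f"doi: 10.26036/{rec.dataset_id}"
    )


mosta = DatasetRecord(
    author="Longqi Liu",
    title="MOSTA: Mouse Organogenesis Spatiotemporal Transcriptomic Atlas",
    dataset_id="STDS0000058",
    year=2021,
    release_date="2021-10-22",
)
print(cite_dataset(mosta))
```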

5. Visualization Creation - Dataset

5.1 Searching for Datasets

Use the general or advanced search modules to find datasets.

5.2 Metadata Display

Research Information: Includes species, tissues, diseases, developmental stages, publications, and technologies.
Metadata Organization: Includes file types, sizes, and structures. Standard and advanced analyses extract key information.

5.3 Data Standardization

STOmicsDB standardizes data from public databases and user submissions. The process includes:

Data Import: Use Scanpy (v1.8.1) to read data (e.g., expression matrices, spatial locations, tissue images) into AnnData objects.
Standardization: Remove duplicate genes, normalize counts, perform PCA, and cluster using the Leiden algorithm. Identify marker genes and spatially variable genes using Scanpy and SpatialDE.
Output: Save results in h5ad format for user access.
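The import, standardization, and output steps above could look roughly like the following Scanpy sketch. It is not the STOmicsDB production pipeline: the function name, the `target_sum` value, the log transform, the number of principal components, and the `leiden` key are all assumptions. The SpatialDE step is omitted. Leiden clustering additionally requires the leidenalg package.

```python
def standardize(input_h5ad, output_h5ad):
    """Sketch of the standardization steps; parameter values are assumptions."""
    # Imported lazily so this module can be loaded without Scanpy installed.
    import scanpy as sc

    # Data import: read an existing AnnData file.
    adata = sc.read_h5ad(input_h5ad)

    # Remove duplicate genes: keep the first occurrence of each gene name.
    adata = adata[:, ~adata.var_names.duplicated()].copy()

    # Normalize counts per cell, then log-transform (the log step is an assumption).
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)

    # PCA, neighbor graph, and Leiden clustering.
    # n_comps=50 needs more than 50 cells and genes in the input.
    sc.pp.pca(adata, n_comps=50)
    sc.pp.neighbors(adata)
    sc.tl.leiden(adata, key_added="leiden")

    # Marker genes per cluster with the Wilcoxon method.
    sc.tl.rank_genes_groups(adata, groupby="leiden", method="wilcoxon")

    # Output: save the results in h5ad format.
    adata.write_h5ad(output_h5ad)
    return adata
```

A call such as `standardize("raw.h5ad", "processed.h5ad")` (both file names hypothetical) would write a processed file that can be opened again with `scanpy.read_h5ad`.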

Advanced Analysis Visualization

Cell Annotation: Use SCINA (v1.2.0) to annotate clusters with cell types based on marker genes.
Cell Interaction: Use stLearn (v0.4.12) to analyze and visualize significant cell-cell interactions.
Differential Analysis: Identify upregulated genes between cell types and perform enrichment analysis using clusterProfiler.

5.4 Expression Profile Visualization

Displays gene expression maps, cell annotations, and marker gene distributions for intuitive spatial visualization.

5.5 Data Files and Downloads

STOmicsDB provides controlled and public sharing options:

Raw Data: Unprocessed experimental data or filtered matrices.
Processed Data: Results after standardization, including normalized matrices and cluster analyses.
Custom Data: User-submitted intermediate results or annotations.

5.6 Sample Modules

Sample List: View all samples and their main information.
Sample Details: Access detailed sample information and download data.

6. Frequently Asked Questions

6.1 Template Filling Requirements

To create a visualization, fill in dataset metadata, sample information, sample file information, and analysis file information. Optionally, add a homepage image for personalized dataset display.

Dataset Metadata

FIELDS	DESCRIPTION	EXAMPLE
TITLE	Dataset title	MOSTA: Mouse Organogenesis Spatiotemporal Transcriptomic Atlas
SPECIES	Species name (Tax ID); separate multiple values with `|`
TISSUES	Biological tissue names; separate multiple values with `|`
ORGAN PARTS	Biological suborgan names; separate multiple values with `|`
DEVELOPMENT STAGE	Sample developmental period; separate multiple values with `|`
SEX	Sample sex, Male or Female; separate multiple values with `|`
TECHNOLOGY	Sequencing technology; separate multiple values with `|`
SAMPLE NUMBER	Sample size	16
SECTION NUMBER	Total number of slices	61
DISEASE	Diseases studied; separate multiple values with `|`
SUMMARY	Dataset overview (max 4000 characters)	Overview of the dataset and its significance.
OVERALL DESIGN	Experimental design (max 2000 characters)	Description of the experimental setup.
SUBMISSION DATE	Submission date in `yyyy-mm-dd` format	2020-01-24
UPDATE DATE	Update date in `yyyy-mm-dd` format	2020-11-18
CONTRIBUTORS	Contributors; separate multiple values with `|`
CONTACT	Contact information; separate multiple values with `|`
CITATION	Citations of related articles; separate multiple values with `|`
ACCESSIONS	Data source or storage location	CNSA project: CNP0001543
RELATIONS	Data associations with other databases	Stomics submission: STT0000013
PLATFORM	Sequencing platform; separate multiple values with `|`
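Some of the constraints in the table above (character limits and the `yyyy-mm-dd` date format) can be checked locally before uploading the template. This is a hedged sketch, not the validation the submission system actually runs. The function names are hypothetical, and the multi-value separator is passed as a parameter, with `|` assumed as the default based on the `Male | Female` example in the Sample Information table.

```python
import re
from datetime import datetime

MAX_LENGTH = {"SUMMARY": 4000, "OVERALL DESIGN": 2000}
DATE_FIELDS = ("SUBMISSION DATE", "UPDATE DATE")
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def split_values(cell, sep="|"):
    """Split a multi-value cell and trim whitespace around each value."""
    return [part.strip() for part in cell.split(sep) if part.strip()]


def is_valid_date(text):
    """True if text is a real calendar date written as yyyy-mm-dd."""
    if not DATE_PATTERN.fullmatch(text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def check_metadata(meta):
    """Return a list of problems found in a {field: value} metadata dict."""
    problems = []
    for field, limit in MAX_LENGTH.items():
        if len(meta.get(field, "")) > limit:
            problems.append(f"{field} exceeds {limit} characters")
    for field in DATE_FIELDS:
        value = meta.get(field, "")
        # Empty dates are skipped; the guide does not say whether they are required.
        if value and not is_valid_date(value):
            problems.append(f"{field} is not in yyyy-mm-dd format: {value!r}")
    return problems
```

An empty list from `check_metadata` only means these particular checks passed; the system's own error messages after upload remain authoritative.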

Sample Information

FIELDS	DESCRIPTION	EXAMPLE
SAMPLE_NAME	Sample name	E9.5_EXAMPLE
SECTION_NAME	Slice name	Spatial omics technologies: when the same sample includes multiple slices, fill in one line per slice. Other omics technologies: when no slice is included, the slice name must be the same as the sample name.
TECHNOLOGY	Sequencing technologies, such as Stereo-Seq, 10X Visium, scRNA	Stereo-Seq
SPECIES	Tax ID of the species to which the sample belongs	10090
TISSUE	Sample tissue type	Embryo
ORGAN PARTS	Name of biological subgroup	Embryonic brain
DEVELOPMENT STAGE	Sample development period	E9.5
SAMPLE ID	Sample accession number in CNSA or STOmicsDB; leave blank if none exists	CNS0001619
SOURCE	For samples collected from other databases or institutions, fill in the corresponding source; for samples submitted to CNGB, fill in CNGBdb	CNGBdb
SEX	Sample sex: Male | Female	Male
PLATFORM	Sequencing platform, such as DNBSEQ-T1	DNBSEQ-T1
DISEASE	Disease of the sample; if none, fill in normal	squamous cell carcinoma
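A few of the row-level rules in the Sample Information table can be checked with a short script before upload. The sketch below is illustrative only: the function name is hypothetical, and the set of technologies treated as having no slices (here just `scRNA`) is an assumption that should be adjusted to the data at hand.

```python
def check_sample_rows(rows, sliceless_technologies=("scRNA",)):
    """Return a list of problems found in sample rows given as dicts."""
    problems = []
    seen = set()
    for number, row in enumerate(rows, start=1):
        sample, section = row["SAMPLE_NAME"], row["SECTION_NAME"]

        # Each slice of a sample gets its own line, so pairs must be unique.
        if (sample, section) in seen:
            problems.append(f"row {number}: duplicate section {section!r}")
        seen.add((sample, section))

        # Without slices, the slice name must equal the sample name.
        if row["TECHNOLOGY"] in sliceless_technologies and section != sample:
            problems.append(f"row {number}: SECTION_NAME must equal SAMPLE_NAME")

        # SPECIES holds a numeric Tax ID such as 10090.
        if not row["SPECIES"].isdigit():
            problems.append(f"row {number}: SPECIES should be a numeric Tax ID")

        # SEX, when filled in, is Male or Female.
        if row.get("SEX") and row["SEX"] not in ("Male", "Female"):
            problems.append(f"row {number}: SEX should be Male or Female")
    return problems
```

Rows can be produced from a tab-separated export of the template with `csv.DictReader(handle, delimiter="\t")`.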

Expression File Information

FIELDS	DESCRIPTION	EXAMPLE
SECTION_NAME	The sample name to which the user analysis file belongs	E9.5_E2S1_EXAMPLE
DATA_TYPE	Sequencing technology	Stereo-Seq
FILE_NAME	File name	cluster_makers.svg
TITLE	Analysis Type	Cluster markers
DESCRIPTION	Analysis Description	The marker genes of each cluster were calculated by scanpy.tl.rank_genes_groups with the "wilcoxon" method. If the original annotation of the dataset is available, it is used; otherwise, the annotation is obtained through scanpy.tl.leiden.
MD5	File md5 value	7abcfa8f3abd503e286badf040ba4fa3

Analysis File Information

FIELDS	DESCRIPTION	EXAMPLE
SECTION_NAME	The sample name to which the user analysis file belongs	E9.5_E2S1_example
DATA_TYPE	Sequencing technology	Stereo-Seq
FILE_NAME	File name	cluster_makers.svg
TITLE	Analysis Type	Cluster markers
DESCRIPTION	Analysis Description	The marker genes of each cluster were calculated by scanpy.tl.rank_genes_groups with the "wilcoxon" method. If the original annotation of the dataset is available, it is used; otherwise, the annotation is obtained through scanpy.tl.leiden.
MD5	File md5 value	7abcfa8f3abd503e286badf040ba4fa3
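The MD5 field in the file tables expects the hexadecimal MD5 digest of each file. One way to compute it is with Python's standard hashlib module, as sketched below; the function name and chunk size are illustrative choices, not requirements of STOmicsDB.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use low for large matrices or images.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `file_md5("cluster_makers.svg")` returns a 32-character string that can be pasted into the MD5 column for that file.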

Optional Information for the Dataset

Display thumbnails: You can add pictures to the Excel spreadsheet; there is no resolution limit.

6.2 Verification Information Explanation

This section is still being written. If you encounter any verification problems in the meantime, please contact us.

Contact

For assistance, contact P_STOmicsDB@genomics.cn.
We provide detailed guidance and support to ensure a smooth process for data submission and visualization creation.

Support Hours: Monday to Friday, 9:00 AM - 6:00 PM (GMT+8).