STOmicsDB Visualization Creation Guide
1. About Visualization Creation:
For spatial omics data, we have collected comprehensive data and accurate metadata. The metadata covers research areas, sample tissues, species, spatial resolutions, and publication types, while the data includes gene expression matrices, spatial position information, high-resolution tissue section images, and analysis results such as clustering and cell annotation.
The Dataset Creation system employs rigorous quality control measures to validate the submitted data and metadata, ensuring accuracy and reliability. Furthermore, we performed multiple analyses to curate the collected datasets and displayed the results. Specifically, we annotated cell types, identified spatial regions and genes, and performed cell-cell interaction analysis for these datasets.
Researchers can explore the analysis results and visualize the expression data through the dataset module. STOmicsDB is poised to significantly enhance research insights in the field of spatial transcriptomics through data archiving, sharing, visualization, and analysis.
2. Database address and entry points
- Home page: https://db.cngb.org/stomics/
- Visualization creation: https://db.cngb.org/stomics/submission/data_visualization
- Browse the visualization dataset: https://db.cngb.org/stomics/datasets/
3. Visualization creation steps
3.1 Visualization creation entry:
- Link: https://db.cngb.org/stomics/
- Click the Submit/Create button under the Submission module to enter the creation interface.
- Click the Create button to view the creation process.
- Log in with WeChat, CARSI, ORCID, GitHub, BGI, or another supported account.
3.2 New visualization creation
Click the New Creation button to start a new creation. A new creation number of the form stc0000xxx is generated, and the creation details interface opens.
3.3 Complete creation details
- Step 1:
- Download the visualization creation template.
- Complete the template according to the template filling requirements (see section 6.1, Template filling requirements, in this document).
- Click the upload box to upload the template. If an error message appears, correct the template as the message indicates. If the problem cannot be resolved, send the error message and the template file to the address given under Contact Us.
- Select a data upload method, such as FTP or Aspera; the help link explains how to complete the upload. After the upload finishes, allow about 10 minutes for the backend to detect the data.
- Click Check Files and Next to verify that the data files exist and are intact.
- Step 2:
- Fill in the relevant information, such as the submitter and project name.
3.4 Creating, modifying, and deleting visualizations
- Creation list interface: https://db.cngb.org/stportal/creations/
- Unfinished creation: You can click the Modify button to continue the creation; you can click the Delete button to delete the creation.
- Completed creation: You can click Apply Modification to apply for modification permission. After the administrator approves the application, you can click Modify to modify the creation; you can click Apply Deletion to apply to delete the creation. After the administrator approves the application, the creation will be deleted.
4. Data citation method
4.1 Examples of how to reference a STOmicsDB dataset:
- The Stereo-seq data used to generate Fig. 1 come from the STOmicsDB database [1], under accession number STDS0000058 [2].
- The spatial mouse kidney data have been deposited into STOmicsDB [1] (https://db.cngb.org/stomics/datasets/STDS0000058 [2]).
4.2 How to Cite
- Cite the STOmicsDB database:
- Xu, Zhicheng et al. "STOmicsDB: a comprehensive database for spatial transcriptomics data sharing, analysis and visualization." Nucleic acids research vol. 52,D1 (2024): D1053-D1061. doi:10.1093/nar/gkad933
- Cite visualization dataset (Example Dataset: STDS0000058):
- Longqi Liu. MOSTA: Mouse Organogenesis Spatiotemporal Transcriptomic Atlas[DS/OL]. STOmicsDB, 2021[2021-10-22]. https://db.cngb.org/stomics/datasets/STDS0000058/. doi: 10.26036/STDS0000058
- Format: {contributors}. {title}[DS/OL]. STOmicsDB, {year of submission date}[{submission date}]. {dataset link}. doi: {doi ID}
- Cite original data article (Example Dataset: STDS0000058):
- Chen, Ao et al. "Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays." Cell vol. 185,10 (2022): 1777-1792.e21. doi:10.1016/j.cell.2022.04.003
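The dataset citation format given above can be filled in programmatically. The following is a minimal sketch, not an official STOmicsDB tool: the function name `format_dataset_citation` and its parameters are hypothetical, and joining several contributors with a comma is an assumption, since the example dataset lists only one contributor.

```python
def format_dataset_citation(contributors, title, submission_date, link, doi):
    """Fill the dataset citation format; submission_date is 'yyyy-mm-dd'."""
    # The year is the first component of the submission date.
    year = submission_date.split("-")[0]
    # Assumption: multiple contributors are joined with ", ".
    authors = ", ".join(contributors)
    return (
        f"{authors}. {title}[DS/OL]. STOmicsDB, "
        f"{year}[{submission_date}]. {link}. doi: {doi}"
    )


# Values taken from the example dataset STDS0000058 in this guide.
citation = format_dataset_citation(
    contributors=["Longqi Liu"],
    title="MOSTA: Mouse Organogenesis Spatiotemporal Transcriptomic Atlas",
    submission_date="2021-10-22",
    link="https://db.cngb.org/stomics/datasets/STDS0000058/",
    doi="10.26036/STDS0000058",
)
print(citation)
```

With the values of the example dataset, the printed string matches the example citation shown above.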
5. Visualization Creation - Dataset
STOmicsDB integrates data resources at different levels of spatiotemporal omics. It mines data from 7,339 articles in public databases and, together with the spatiotemporal data submission system, curates 356 spatiotemporal omics datasets comprising 15,451 spatiotemporal samples, providing comprehensive spatiotemporal omics data resources for researchers.
In order to fully explore spatiotemporal omics data, the STOmicsDB team built a standardized data analysis process, including standardized processing, dimensionality reduction, clustering, cell type annotation, cell type-specific marker gene analysis and differential gene analysis, spatially variable gene analysis, cell communication analysis, and Hotspot spatial-specific module analysis. Researchers can perform data visualization exploration through the dataset module.
Through the spatiotemporal data submission system and in-depth mining of public data resources, independent public data resources are subjected to data quality control and standardization analysis, and spatiotemporal omics data at different levels are integrated to form a consistent and comparable data set.
5.1 Dataset search filter:
- You can search for relevant datasets through the dataset general search module and advanced search module.
5.2 Meta information display:
- Dataset research information collection:
- We organize and record the research information associated with data from different sources, including the species studied, tissues, diseases, developmental stage of the samples, published literature, and the spatial omics technology used.
- Dataset metadata collation:
- We then calculate and organize the metadata of the collected data, including file type, size, and data structure.
- After assessing the content of the data, we perform a series of standard and advanced analyses to extract the required information.
5.3 Standardization
STOmicsDB will standardize the data collected from public databases and the data submitted by users. The following is the detailed process of data processing.
- Data import and integration:
- First, we use Scanpy (version 1.8.1) to read in the data, including the expression matrix and the spot-related and gene-related information; spatial location information and tissue images are also read in when available. Scanpy's reading functions load data in 10X mtx, h5ad, h5, matrix text, and other formats as AnnData objects. The spatial locations are saved to the 'obsm' attribute of the AnnData object, and the tissue image is converted into pixel information and saved in the 'uns' attribute.
- Data standardization processing and analysis:
- Next, we standardized the AnnData object. We first used the 'var_names_make_unique' method to make duplicated gene names unique, and then used Scanpy's 'normalize_total' and 'log1p' methods to normalize the total counts of each spot and log-transform them. We used the 'pca' method to perform principal component analysis (PCA) on the top 2000 highly variable genes, the 'neighbors' and 'umap' methods to build the neighborhood graph and compute a low-dimensional embedding, and finally clustered the data using the Leiden algorithm. For each dataset, we applied the Wilcoxon rank-sum test through Scanpy's 'rank_genes_groups' method to find cluster-specific marker genes. If the data had spatial coordinates, we used spatialDE (version 1.1.3) with default parameters to identify spatially variable genes. The standard processing results were saved as h5ad files for users, and the cluster-specific marker genes or spatially variable genes are displayed as Cluster markers results in the Analysis results module.
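To make the normalization and log-transformation step concrete, here is a minimal pure-Python sketch of the arithmetic involved. It is an illustration only, not Scanpy's implementation and not the STOmicsDB pipeline code: the function names are hypothetical, the count matrix is made up, and using the median total of non-empty spots as the target sum is an assumption about the defaults.

```python
import math
from statistics import median


def normalize_total(counts, target_sum=None):
    """Scale each spot (row) so that its counts sum to target_sum."""
    totals = [sum(row) for row in counts]
    if target_sum is None:
        # Assumption: target is the median total over non-empty spots.
        target_sum = median([t for t in totals if t > 0])
    return [
        [value * target_sum / total if total > 0 else 0.0 for value in row]
        for row, total in zip(counts, totals)
    ]


def log1p_transform(matrix):
    """Apply log(1 + x) to every entry."""
    return [[math.log1p(value) for value in row] for row in matrix]


# Made-up counts: 3 spots (rows) x 2 genes (columns).
raw_counts = [[1, 3], [2, 6], [0, 0]]
normalized = normalize_total(raw_counts)
logged = log1p_transform(normalized)
print(normalized)
print(logged)
```

After normalization, the two non-empty spots have the same total, so their expression values become directly comparable; the log transform then compresses the range of the scaled counts.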
Advanced analysis results visualization:
- Cluster cell annotation analysis:
- We used SCINA (1.2.0), an annotation tool, together with the marker gene library integrated by CellMatch, to perform spatial cell annotation analysis on clusters for human and mouse, the species for which curated and sufficient marker genes are available. We used the expression matrices and marker genes to annotate clusters as cell types and saved the cell annotation results in h5ad files.
- Spatial cell interaction analysis:
- We then used the stLearn cell interaction analysis software (version 0.4.12) to perform spatial cell interaction analysis on data that had been annotated with cell types and contained spatial location information. First, the ligand-receptor (LR) pairs of the corresponding species are loaded from the "connectomeDB2020_lit" database to determine the significant interactions between spots for these LR pairs. Next, for each LR pair and each cell type-cell type combination, we count the instances where neighbors of a significant spot for that LR pair link the two given cell types. Significant interactions between cell types are then identified with p < 0.05 from permutation of the cell type information. Finally, the cell-cell interaction results are visualized.
- Analysis of differences between cell types:
- Scanpy's 'rank_genes_groups' method was used again to perform pairwise differential analysis between cell types, retaining upregulated differentially expressed genes with an adjusted p-value (pvalue_adj) below 0.05 and a log fold change (Logfoldchanges) above 0.15. clusterProfiler (4.9.2) was used to perform GO and KEGG enrichment analysis on the upregulated genes. For data on which cell type annotation failed, we performed differential enrichment analysis on the marker genes of each cluster.
- Dataset information and visualization:
- Finally, we present the relevant research information, data metadata and data analysis results as a data set on our database website using information display, data visualization and file download, making it convenient for users to view, use and download the data set.
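The filtering rule from the differential analysis step above (adjusted p-value below 0.05 and log fold change above 0.15) can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's code: the function name is hypothetical, the gene records are made-up values, and the dictionary keys `pvals_adj` and `logfoldchanges` are chosen here for the example.

```python
def filter_upregulated(results, padj_max=0.05, logfc_min=0.15):
    """Keep genes that are significant and upregulated."""
    return [
        row
        for row in results
        if row["pvals_adj"] < padj_max and row["logfoldchanges"] > logfc_min
    ]


# Made-up differential expression results for one cell type comparison.
example_results = [
    {"gene": "GeneA", "pvals_adj": 0.001, "logfoldchanges": 1.20},
    {"gene": "GeneB", "pvals_adj": 0.200, "logfoldchanges": 2.00},   # not significant
    {"gene": "GeneC", "pvals_adj": 0.010, "logfoldchanges": 0.05},   # change too small
    {"gene": "GeneD", "pvals_adj": 0.030, "logfoldchanges": -0.80},  # downregulated
]

kept_genes = [row["gene"] for row in filter_upregulated(example_results)]
print(kept_genes)
```

Only genes passing both thresholds would be carried forward to enrichment analysis; in this made-up example that is GeneA alone.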
5.4 Expression profile visualization
The expression visualization module displays gene expression maps, cell annotations, marker gene expression maps, etc. Users can intuitively view the spatial expression of genes and the spatial distribution of cells.
5.5 Data Files and Downloads
STOmicsDB provides controlled and public sharing for dataset creators. In the controlled state, users can only view the file name and other related information, but cannot download the file. In the public state, data users can freely download and use the data in accordance with the STOmicsDB policy.
- Analysis Type:
- Files are categorized by data source and degree of analysis:
- Raw data: the raw data and filtered matrices submitted by users or collected by STOmicsDB. Raw data refers to data collected directly during the experiment without any processing; a filtered matrix is the data matrix obtained after the raw data has gone through preprocessing steps including format conversion and quality control.
- Processed data: refers to the matrix files and result charts obtained after STOmicsDB standardization processing, such as normalization and logarithmic processing, principal component analysis and cluster analysis.
- Custom data: Users submit intermediate results, chart results, annotation information, etc. generated during the research, making it easier for users to share and display relevant research results of the data.
5.6 Sample modules:
STOmicsDB displays data at the sample/slice level. Click the Sample module to view all samples in a dataset and their main related information. Click an STSP0000xxx sample name link to view the sample's details and download its data.
- Sample List:
- Sample details:
6. Frequently asked questions
6.1 Template filling requirements
To create a visualization, you need to fill in the following: dataset metadata, sample information, sample file information, and analysis file information. You can also choose to add a dataset homepage image to add a personalized display to the dataset during retrieval.
Fields marked in red in the template are required.
Dataset metadata
FIELDS | DESCRIPTION | EXAMPLE |
---|---|---|
TITLE | Dataset Title | MOSTA: Mouse Organogenesis Spatiotemporal Transcriptomic Atlas |
SPECIES | Species, entered as Tax ID; separate multiple values with "|" | 10090 |
TISSUES | Biological tissue names; separate multiple values with "|" | Embryo |
ORGAN PARTS | Organ part (sub-organ) names; separate multiple values with "|" | Embryonic brain |
DEVELOPMENT STAGE | Developmental stage of the samples; separate multiple values with "|" | E10 |
SEX | Sample sex, Male or Female; separate multiple values with "|" | Male | Female |
TECHNOLOGY | Sample sequencing technology; separate multiple values with "|" | Stereo-Seq | scRNA |
SAMPLE NUMBER | Number of samples | 16 |
SECTION NUMBER | Total number of slices. Note: one sample may have multiple slices. | 61 |
DISEASE | Diseases studied in the data; separate multiple values with "|" | squamous cell carcinoma |
SUMMARY | Dataset overview, less than 4000 characters | We have only begun to scratch the surface in understanding mammalian development. An overwhelming caveat is the lack of topographic transcriptomic information to correlate signaling cues and cell-cell interactions within the hierarchy of cell fate decisions. Spatially resolved transcriptomic technologies are promising tools to fill this gap. |
OVERALL DESIGN | Experimental design, less than 2000 characters | MOSTA database has a total of 53 sagittal sections from C57BL/6 mouse embryos at E9.5 (~7.1 mm2), E10.5 (~11.5 mm2), E11.5 (~18.8 mm2), E12.5 (~32.1 mm2), E13.5 (~48.4 mm2), E14.5 (~64.1 mm2), E15.5 (~70.8 mm2) and E16.5 (~76.1 mm2) using Stereo-seq. |
SUBMISSION DATE | Date as text in yyyy-mm-dd format | 2020-01-24 |
UPDATE DATE | Date as text in yyyy-mm-dd format | 2020-11-18 |
CONTRIBUTORS | Required; separate multiple values with "|" | Andrew Ji |
CONTACT | Required; separate multiple values with "|" | andrewji@stanford.edu |
CITATION | Citations of the articles in which the data are published; separate multiple values with "|" | Chen, Ao et al. "Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays." Cell vol. 185,10 (2022): 1777-1792.e21. doi:10.1016/j.cell.2022.04.003 |
ACCESSIONS | Data source and storage: other databases in which the data is archived or from which it was collected; separate multiple values with "|" | CNSA project: CNP0001543 |
RELATIONS | Data association: the expression data is associated with data in other databases, for example sequencing data from a STOmics submission; separate multiple values with "|" | Stomics submission: STT0000013 |
PLATFORM | Sequencing platform; separate multiple values with "|" | DNBSEQ-T1 |
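Before uploading, it can help to check a few of the rules in the table above locally. The sketch below is a hypothetical pre-check, not the validation that STOmicsDB itself performs: the function names and the choice of rules (the "|" separator, yyyy-mm-dd dates, the length limits, numeric Tax IDs, and the two fields the table marks as required) are assumptions drawn only from the field descriptions.

```python
import re
from datetime import datetime

DATE_FIELDS = ("SUBMISSION DATE", "UPDATE DATE")
MAX_LENGTH = {"SUMMARY": 4000, "OVERALL DESIGN": 2000}
REQUIRED_FIELDS = ("CONTRIBUTORS", "CONTACT")


def split_values(cell):
    """Split a multi-value cell on "|" and strip surrounding whitespace."""
    return [part.strip() for part in cell.split("|") if part.strip()]


def is_valid_date(text):
    """Accept only zero-padded yyyy-mm-dd strings that are real dates."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def check_metadata(record):
    """Return a list of problems found in one dataset metadata record."""
    problems = []
    for field in DATE_FIELDS:
        value = record.get(field, "")
        if value and not is_valid_date(value):
            problems.append(f"{field}: expected yyyy-mm-dd, got {value!r}")
    for field, limit in MAX_LENGTH.items():
        if len(record.get(field, "")) >= limit:
            problems.append(f"{field}: must be shorter than {limit} characters")
    for tax_id in split_values(record.get("SPECIES", "")):
        if not tax_id.isdigit():
            problems.append(f"SPECIES: {tax_id!r} is not a numeric Tax ID")
    for sex in split_values(record.get("SEX", "")):
        if sex not in ("Male", "Female"):
            problems.append(f"SEX: {sex!r} is not Male or Female")
    for field in REQUIRED_FIELDS:
        if not split_values(record.get(field, "")):
            problems.append(f"{field}: required field is empty")
    return problems


# Record built from the example column of the table above.
example_record = {
    "SPECIES": "10090",
    "SEX": "Male | Female",
    "SUBMISSION DATE": "2020-01-24",
    "UPDATE DATE": "2020-11-18",
    "CONTRIBUTORS": "Andrew Ji",
    "CONTACT": "andrewji@stanford.edu",
}
print(check_metadata(example_record))
```

An empty list means none of these particular checks failed; it does not guarantee that the template will pass the checks run on upload.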
Sample information
FIELDS | DESCRIPTION | EXAMPLE |
---|---|---|
SAMPLE_NAME | Sample name | E9.5_EXAMPLE |
SECTION_NAME | Slice name. Spatial omics technology: when the same sample includes multiple slices, fill in one line per slice. Other omics technology: when no slice is included, the slice name must be the same as the sample name. | |
TECHNOLOGY | Sequencing technologies, such as Stereo-Seq, 10X Visium, scRNA | Stereo-Seq |
SPECIES | Tax ID of the species to which the sample belongs | 10090 |
TISSUE | Sample tissue type | Embryo |
ORGAN PARTS | Organ part (sub-organ) name | Embryonic brain |
DEVELOPMENT STAGE | Sample development period | E9.5 |
SAMPLE ID | The sample number submitted to CNSA or the spatiotemporal omics database; leave blank if none exists | CNS0001619 |
SOURCE | For samples collected from other databases or institutions, fill in the corresponding source; for samples submitted to the gene bank, fill in CNGB | CNGB |
SEX | Sample sex: Male or Female | Male |
PLATFORM | Sequencing platform, such as DIPSEQ-T1 | DNBSEQ-T1 |
DISEASE | Disease of the sample; enter "normal" if none | squamous cell carcinoma |
Expression file information
FIELDS | DESCRIPTION | EXAMPLE |
---|---|---|
SECTION_NAME | The sample name to which the user analysis file belongs | E9.5_E2S1_EXAMPLE |
DATA_TYPE | Sequencing technology | Stereo-Seq |
FILE_NAME | File name | cluster_makers.svg |
TITLE | Analysis Type | Cluster markers |
DESCRIPTION | Analysis Description | The marker genes of each cluster were calculated by scanpy.tl.rank_genes_groups with the "wilcoxon" method. If the original annotation information of dataset is available, we use the original one, if not, we get the annotation information through scanpy.tl.leiden. |
MD5 | File md5 value | 7abcfa8f3abd503e286badf040ba4fa3 |
Analysis file information
FIELDS | DESCRIPTION | EXAMPLE |
---|---|---|
SECTION_NAME | The sample name to which the user analysis file belongs | E9.5_E2S1_example |
DATA_TYPE | Sequencing technology | Stereo-Seq |
FILE_NAME | File name | cluster_makers.svg |
TITLE | Analysis Type | Cluster markers |
DESCRIPTION | Analysis Description | The marker genes of each cluster were calculated by scanpy.tl.rank_genes_groups with the "wilcoxon" method. If the original annotation information of dataset is available, we use the original one, if not, we get the annotation information through scanpy.tl.leiden. |
MD5 | File md5 value | 7abcfa8f3abd503e286badf040ba4fa3 |
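The MD5 field in the file tables expects the hexadecimal MD5 digest of each file. One way to compute it is with Python's standard `hashlib` module, as in the sketch below; the helper name `file_md5` is hypothetical, and the temporary file only stands in for a real analysis file.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use low for large matrix files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a small temporary file standing in for an analysis file.
with tempfile.NamedTemporaryFile(delete=False, suffix=".svg") as tmp:
    tmp.write(b"<svg></svg>")
demo_md5 = file_md5(tmp.name)
os.remove(tmp.name)
print(demo_md5)
```

The result is a 32-character lowercase hexadecimal string, the same form as the example value in the table.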
Optional information for the dataset
- Display thumbnail: you can insert an image into the Excel template; there is no limit on its resolution.
Verification error message
To be continued
Contact Us
If you need any help, please contact P_STOmicsDB@genomics.cn.