PVD: Pathogen Variation Database

The human pathogen database contains information on pathogenic microorganisms that cause human infectious diseases, including pathogen classification, biological characteristics, nucleic acid sequences, disease phenotypes, and patient immunological characteristics. It comprises three sub-databases: the chronic infectious disease pathogens database, the emerging infectious disease pathogens database, and the major infectious disease pathogens database. Together they provide search and identification tools for clinicians and researchers.

Rapid Microbe Explorer

RME is a computational pipeline for exploring microbes in next-generation sequencing (NGS) data. The pipeline consists of several phases: mapping against human references, classification at the nucleotide level, and exploration at the protein level. RME uses the fast mapping tool SNAP for human reference mapping. To eliminate the influence of human sequences as far as possible, RME takes hg19, YH, and refMrna as references for subtractive alignment. After removing human sequences, RME uses Kraken to classify reads at the nucleotide level, which achieves both high sensitivity and high speed. The remaining unclassified reads can then be quickly explored with Kaiju, which classifies reads at the protein level. As a rapid microbe exploration tool for NGS data, RME has numerous applications, such as pathogen identification in clinical samples and outbreak investigation. RME accepts FASTQ or FASTA data from a range of sequencing platforms (e.g. BGISEQ-500, Illumina, Ion Torrent).
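The subtractive step described above can be pictured as a set difference: any read that maps to at least one human reference is dropped before classification. The sketch below is only an illustration of that idea, not RME's actual code. The function name and the in-memory layout (read IDs grouped by reference) are assumptions; in practice the mapped IDs would come from SNAP's alignment output.

```python
from typing import Dict, Iterable, List, Set

# Reference names are taken from the text; everything else is hypothetical.
HUMAN_REFERENCES = ("hg19", "YH", "refMrna")


def subtract_human_reads(
    read_ids: Iterable[str],
    mapped_by_reference: Dict[str, Set[str]],
) -> List[str]:
    """Return the read IDs that mapped to none of the human references."""
    human_ids: Set[str] = set()
    for ref in HUMAN_REFERENCES:
        # A read is treated as human if it mapped to ANY of the references.
        human_ids |= mapped_by_reference.get(ref, set())
    # Keep the input order so later steps see reads in file order.
    return [rid for rid in read_ids if rid not in human_ids]


if __name__ == "__main__":
    reads = ["r1", "r2", "r3", "r4", "r5"]
    mapped = {"hg19": {"r1"}, "YH": {"r1", "r3"}, "refMrna": {"r4"}}
    print(subtract_human_reads(reads, mapped))  # prints ['r2', 'r5']
```

Using several references in sequence only ever removes more reads, which is why adding YH and refMrna on top of hg19 reduces residual human signal.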

1 Input File Requirements

RME accepts FASTA and FASTQ files as input, and the files can be compressed with gzip. The data can be paired-end or single-end reads generated by sequencing platforms such as BGISEQ-500, Illumina, and Ion Torrent.
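As a rough illustration of how an input file might be checked before it is submitted, the following sketch detects gzip compression from the file's magic bytes and tells FASTA from FASTQ by the first record marker. The function names are hypothetical and this is not how RME itself validates input; it only uses the Python standard library.

```python
import gzip


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed text file, detected by magic bytes."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":  # gzip files always start with these two bytes
        return gzip.open(path, "rt")
    return open(path, "rt")


def detect_read_format(path):
    """Return 'fasta' or 'fastq' based on the first non-empty line."""
    with open_maybe_gzip(path) as handle:
        for line in handle:
            if not line.strip():
                continue  # skip leading blank lines
            if line.startswith(">"):
                return "fasta"
            if line.startswith("@"):
                return "fastq"
            break  # first content line is neither marker
    raise ValueError(f"{path}: not recognised as FASTA or FASTQ")
```

For paired-end data, the same check would be applied to each of the two files separately.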

2 Parameter Settings

Except for the Kraken module, all other modules (the quality control module, the SNAP module, and the Kaiju module) can be skipped. If low-quality reads have already been filtered from your data, you can skip the quality control module. If your data does not include human sequences, you can skip the SNAP module. If you only want the basic nucleotide-level results, you can choose not to run the Kaiju module.
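The skipping rules above can be expressed as a small planning function. This is a minimal sketch under stated assumptions: the module identifiers, the function name, and the placement of quality control before SNAP are illustrative choices, not RME's real parameter names or a confirmed execution order.

```python
from typing import Iterable, List

# Assumed execution order; only "Kraken after SNAP, Kaiju after Kraken"
# is stated in the text.
MODULE_ORDER = ("quality_control", "snap", "kraken", "kaiju")
REQUIRED = {"kraken"}  # the only module that cannot be skipped


def plan_modules(skip: Iterable[str] = ()) -> List[str]:
    """Return the modules to run, in order, after applying the skip list."""
    skip_set = set(skip)
    unknown = skip_set - set(MODULE_ORDER)
    if unknown:
        raise ValueError(f"unknown module(s): {sorted(unknown)}")
    forbidden = skip_set & REQUIRED
    if forbidden:
        raise ValueError(f"cannot skip required module(s): {sorted(forbidden)}")
    return [name for name in MODULE_ORDER if name not in skip_set]


if __name__ == "__main__":
    # Pre-filtered data with no human reads, nucleotide-level results only.
    print(plan_modules(["quality_control", "snap", "kaiju"]))  # prints ['kraken']
```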

3 Output Interpretation

Every module generates a statistics table of read or read-pair counts. The Kraken and Kaiju modules generate microbe classification reports. RME can also provide graphical results generated by Krona.
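To give a sense of what a per-module statistics table might look like, the sketch below formats read counts as tab-separated text with the percentage of input reads remaining after each step. The column names, the stage labels, and the function name are assumptions made for illustration; the actual layout of RME's tables may differ.

```python
from typing import List, Tuple


def format_read_stats(counts: List[Tuple[str, int]]) -> str:
    """Format (stage, reads_remaining) pairs as a tab-separated table.

    The first entry is treated as the raw input count, and every
    percentage is computed relative to it.
    """
    if not counts:
        return ""
    total = counts[0][1]
    lines = ["module\treads\tpercent_of_input"]
    for module, reads in counts:
        pct = 100.0 * reads / total if total else 0.0
        lines.append(f"{module}\t{reads}\t{pct:.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical counts, for demonstration only.
    print(format_read_stats([("input", 1000), ("quality_control", 900), ("snap", 250)]))
```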

The following figure illustrates the RME working pipeline.

REFERENCES

Wood, D. E., & Salzberg, S. L. (2014). Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biology, 15(3), R46.
Menzel, P., Ng, K. L., & Krogh, A. (2016). Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nature Communications, 7, 11257.
Zaharia, M., Bolosky, W. J., Curtis, K., Fox, A., Patterson, D., Shenker, S., et al. (2011). Faster and more accurate sequence alignment with SNAP. CoRR, 2011.
Ondov, B. D., Bergman, N. H., & Phillippy, A. M. (2011). Interactive metagenomic visualization in a Web browser. BMC Bioinformatics, 12(1), 385.