The Phytozome database collects JGI-sequenced plant genomes, as well as selected genomes and datasets that have been sequenced elsewhere.
1. Background
Phytozome is the Plant Comparative Genomics portal of the Department of Energy's Joint Genome Institute. As of release v13, Phytozome provides access to the sequences and functional annotations of more than 200 complete plant genomes, including all the land plants and selected algae sequenced at the Joint Genome Institute. These data are available to the broader plant science research community.
2. Data description
2.1 Data source
This database collects the latest Phytozome release (v13) from the official download site, Phytozome Downloads. It includes protein sequences, annotations, and transcripts for 209 plant species.
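The protein and transcript files are sequence files, so a first step in most analyses is reading them record by record. The following is a minimal sketch, assuming the files are FASTA text (optionally gzip-compressed); the helper names `read_fasta` and `open_maybe_gzip` and the toy identifiers are illustrative and not part of the database.

```python
import gzip
import io

def read_fasta(handle):
    """Yield (identifier, sequence) pairs from an open FASTA text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token as the identifier.
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def open_maybe_gzip(path):
    """Open a plain or gzip-compressed text file, chosen by file extension."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")

# Toy input with made-up identifiers, standing in for a downloaded protein file.
toy = io.StringIO(">prot1 annot=example\nMKT\nAYI\n>prot2\nMSS*\n")
sequences = dict(read_fasta(toy))
print(len(sequences), "sequences read")
```

For a real file, `open_maybe_gzip(path)` can be passed to `read_fasta` in place of the toy handle.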
Reference
Goodstein, D. M., et al. (2012). Phytozome: a comparative platform for green plant genomics. Nucleic Acids Research. DOI: 10.1093/nar/gkr944
2.2 Metadata table
Field | Description |
---|---|
organism | Organism name |
species | Species name |
common_name | Common name |
protein | Protein sequence file |
gff3 | Annotation file |
transcript | Transcript file |
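The fields above can be read into per-species records that point at the three data files. The sketch below assumes the metadata table has been exported as a tab-separated file with exactly these column names; the export format, the function name `read_metadata`, and the example row values are assumptions made for illustration.

```python
import csv
import io

FIELDS = ["organism", "species", "common_name", "protein", "gff3", "transcript"]

def read_metadata(handle):
    """Yield one dict per species row from a tab-separated metadata export."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [f for f in FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"metadata table is missing columns: {missing}")
    for row in reader:
        # Short rows give None for absent cells; normalise those to "".
        yield {f: (row[f] or "").strip() for f in FIELDS}

# Illustrative row only: the file names are placeholders, not real paths.
example = (
    "organism\tspecies\tcommon_name\tprotein\tgff3\ttranscript\n"
    "Athaliana\tArabidopsis thaliana\tthale cress\t"
    "example.protein.fa.gz\texample.gene.gff3.gz\texample.transcript.fa.gz\n"
)
records = list(read_metadata(io.StringIO(example)))
for rec in records:
    print(rec["species"], "->", rec["protein"])
```

Checking for the expected columns up front makes a renamed or missing field fail early instead of surfacing later as an empty file path.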
3. Workflows
Gene homology discovery using Hidden Markov Models
HMMER is widely used to search for homologous protein or nucleotide sequences against sequence databases, using profile hidden Markov models (profile HMMs) built from multiple sequence alignments as queries. Its main uses include searching a single protein sequence, a multiple protein sequence alignment, or a profile HMM against a target sequence database.
Here, HMMER is used to find all members of a given gene family in the gene coding product datasets generated by the 1000 Plant Transcriptomes initiative. We plan to provide more comprehensive datasets later for characterizing the diversity of all functional gene families.
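The profile HMM search described above can be sketched as two small steps: building an `hmmsearch` command line and filtering its tabular output. This is a minimal sketch rather than the workflow's actual implementation. It assumes a profile has already been built with `hmmbuild`, that HMMER is installed for the commented-out run step, and that an E-value cutoff of 1e-5 is acceptable. The file names, the cutoff, and the helper names are illustrative assumptions.

```python
import shlex

def hmmsearch_command(hmm_path, fasta_path, tblout_path, evalue=1e-5, cpus=4):
    """Build an hmmsearch command line as an argument list."""
    return [
        "hmmsearch",
        "--tblout", tblout_path,   # per-target tabular output
        "-E", str(evalue),         # report targets with E-value <= this
        "--cpu", str(cpus),
        hmm_path,
        fasta_path,
    ]

def parse_tblout(lines, max_evalue=1e-5):
    """Return (target, full-sequence E-value, score) for hits passing the cutoff."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split()
        # tblout columns: target name, accession, query name, accession,
        # full-sequence E-value, score, bias, then best-domain statistics.
        target, evalue, score = cols[0], float(cols[4]), float(cols[5])
        if evalue <= max_evalue:
            hits.append((target, evalue, score))
    return hits

cmd = hmmsearch_command("family.hmm", "proteins.fa", "hits.tbl")
print(shlex.join(cmd))
# To run it (requires HMMER on PATH):
#   import subprocess; subprocess.run(cmd, check=True)

# Made-up tblout lines, used here only to exercise the parser.
toy_tblout = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "prot1 - toy_family - 1.2e-30 105.3 0.1 1.5e-30 105.0 0.1 1.0 1 0 0 1 1 1 1 example",
    "prot2 - toy_family - 0.5 5.1 0.0 0.6 4.9 0.0 1.0 1 0 0 1 1 1 0 -",
]
hits = parse_tblout(toy_tblout)
print(hits)
```

Keeping command construction separate from execution lets the parser be tested without HMMER installed, and the same cutoff can be applied again when filtering the parsed hits.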
Reference
Zhang, Z., Wood, W. I. (2003). A profile hidden Markov model for signal peptides generated by HMMER. Bioinformatics. DOI: 10.1093/bioinformatics/19.2.307