Ensembl Plant Genome Database

Ensembl Plants is an integrative resource presenting genome-scale information for 96 sequenced plant species. Available data include DNA sequences, protein sequences, and functional annotation.

Number of records: 96
Last updated: 2021-02-24

1. Background

Ensembl Plants is an integrative web portal for genome-scale data from plant species. The portal offers access to reference sequences, gene annotations, RNA and protein alignments, comparative analyses, and variation data. It covers a range of economically important crops, including brassica, tomato, grape, barley, potato, maize, and wheat, as well as taxonomically diverse model organisms.

2. Data description

2.1 Data source

This database collects the latest data from the official download site (release 49), including protein sequences, annotations, and transcripts for 96 plant species.


Reference

Bolser, D. et al. (2016). Ensembl Plants: Integrating Tools for Visualizing, Mining, and Analyzing Plant Genomics Data. Plant Bioinformatics. DOI: 10.1007/978-1-4939-3167-5_6

2.2 Metadata

Field               Description
species             Species name
dna                 DNA sequence file
protein_sequence    Protein sequence file
gene_sets           Gene annotation file
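
As a rough illustration of how the metadata table might be consumed, the sketch below assumes it is exported as a tab-separated file whose header row names the four fields listed above. The export format, the `SpeciesRecord` and `read_metadata` names, and the sample file names are all hypothetical; only the field names come from the table.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeciesRecord:
    """One row of the metadata table (field names taken from the table above)."""
    species: str
    dna: str
    protein_sequence: str
    gene_sets: str


def read_metadata(handle):
    """Parse tab-separated metadata rows into SpeciesRecord objects."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        SpeciesRecord(
            species=row["species"],
            dna=row["dna"],
            protein_sequence=row["protein_sequence"],
            gene_sets=row["gene_sets"],
        )
        for row in reader
    ]


# Hypothetical sample content; real species and file names will differ.
SAMPLE = (
    "species\tdna\tprotein_sequence\tgene_sets\n"
    "example_species\texample.dna.fa.gz\texample.pep.fa.gz\texample.gff3.gz\n"
)

records = read_metadata(io.StringIO(SAMPLE))
print(records[0].species, records[0].protein_sequence)
```

To read a file on disk instead of the in-memory sample, pass an open file handle, for example `read_metadata(open("metadata.tsv", newline=""))`, where `metadata.tsv` is again a placeholder name.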

3. Workflows

Gene homology discovery using Hidden Markov Models

HMMER is widely used to search sequence databases for homologous protein or nucleotide sequences, using profile hidden Markov models (profile HMMs) built from multiple sequence alignments as queries. Its main uses include searching a single protein sequence, a multiple protein sequence alignment, or a profile HMM against a target sequence database.

Here, HMMER is used to identify all members of a given gene family in the gene-coding product datasets generated by the 1000 Plant transcriptomes initiative. We plan to provide more comprehensive datasets later for characterizing the diversity of all functional gene families.
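
A minimal sketch of how such a gene family search might be scripted, assuming HMMER 3 is installed and a seed alignment of known family members is available. `hmmbuild`, `hmmsearch`, and the `--tblout` and `-E` options are standard HMMER commands; the file names, the E-value cutoff, the sample hit lines, and the helper functions are hypothetical, and this is not necessarily the pipeline used for this database.

```python
import shlex


def hmmer_commands(alignment, proteins, hmm="family.hmm",
                   tblout="hits.tbl", evalue=1e-5):
    """Return the hmmbuild and hmmsearch command lines as argument lists."""
    # Build a profile HMM from the seed alignment of the gene family.
    build = ["hmmbuild", hmm, alignment]
    # Search the profile against the protein set, writing a per-target table.
    search = ["hmmsearch", "--tblout", tblout, "-E", str(evalue), hmm, proteins]
    return build, search


def parse_tblout(text):
    """Extract (target, full-sequence E-value, score) from --tblout output."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.split()
        # Leading columns: target, accession, query, accession, E-value, score
        hits.append((cols[0], float(cols[4]), float(cols[5])))
    return hits


# Hypothetical table content in the --tblout column layout.
SAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias
protein_A      -          family      -          1.2e-30  105.3  0.1  2.0e-30 104.6 0.1 1.0 1 0 0 1 1 1 1 -
protein_B      -          family      -          3.4e-08   31.9  0.0  5.0e-08  31.4 0.0 1.2 1 0 0 1 1 1 1 -
"""

build_cmd, search_cmd = hmmer_commands("family.sto", "proteins.fa")
print(shlex.join(build_cmd))
print(shlex.join(search_cmd))
for target, evalue, score in parse_tblout(SAMPLE_TBLOUT):
    print(target, evalue, score)
```

The block only prints the command lines, so it runs without HMMER installed. To execute them, each list could be passed to `subprocess.run(cmd, check=True)`, after which the resulting table file can be read and handed to `parse_tblout`.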

Reference

Zhang, Z., Wood, W.I. (2003). A profile hidden Markov model for signal peptides generated by HMMER. Bioinformatics. DOI: 10.1093/bioinformatics/19.2.307