Documentation

General

With the release of the complete wheat reference genome and the development of next-generation sequencing technology, a large volume of genomic data from bread wheat and its progenitors has been generated, providing genomic resources for wheat genetics research. To make these data easy to access, we established WGVD, an integrated web database of genomic variations from whole-genome resequencing and exome-capture data for bread wheat and its progenitors, together with selective signatures from wheat domestication and improvement. In this version, WGVD contains 7,346,814 SNPs and 1,044,400 indels, focusing on genic regions and their upstream and downstream flanking regions. We provide allele frequency distribution patterns of these variations for five ploidy wheat groups or 17 worldwide bread wheat groups, annotation of the variant types, and the genotypes of all individuals. Selective footprints for Ae. tauschii, wild emmer, domesticated emmer, bread wheat landraces and bread wheat varieties are evaluated with two statistics (Pi and FST) based on SNPs from 93 whole-genome resequencing samples. In addition, we provide a Genome Browser to visualize and explore the relationships between genomic variations and selective footprints, and the alignment tool BLAST to search for homologous regions between sequences. Together, these features are intended to support wheat functional studies and wheat breeding.

Related articles:

1. Avni, R, Nave, M, Barad, O, et al. (2017) Wild emmer genome architecture and diversity elucidate wheat evolution and domestication. Science, 357, 93–97.
2. He, F, Pasam, R, Shi, F, et al. (2019) Exome sequencing highlights the role of wild-relative introgression in shaping the adaptive landscape of the wheat genome. Nat. Genet, 51, 896–904.
3. Cheng, H, Liu, J, Wen, J, et al. (2019) Frequent intra- and inter-species introgression shapes the landscape of genetic variation in bread wheat. Genome Biol, 20, 1–16.

Methods

I. SNPs
  1. We integrated the published SNP dataset from these studies:
    (1) Avni, R, Nave, M, Barad, O, et al. (2017) Wild emmer genome architecture and diversity elucidate wheat evolution and domestication. Science, 357, 93–97.
    https://www.dropbox.com/sh/3dm05grokhl0nbv/AABe6yrr2FVXdFasJUYEW12ca/Allelic%20diversity?dl=0&preview=all_emmer_filtered_variants_header_to_SAMN04448013.vcf
     For each emmer SNP, the 100 bp flanking sequences on the emmer reference genome were aligned to the bread wheat reference genome (IWGSC RefSeq v1.0) using BLAST. A BLAST hit was accepted when alignment coverage was > 50% and identity was > 90%.

    (2) Cheng, H, Liu, J, Wen, J, et al. (2019) Frequent intra- and inter-species introgression shapes the landscape of genetic variation in bread wheat. Genome Biol, 20, 1–16.
     SNPs located in the genic regions and the flanking segment (3 kb upstream or downstream regions) were selected.

    (3) He, F, Pasam, R, Shi, F, et al. (2019) Exome sequencing highlights the role of wild-relative introgression in shaping the adaptive landscape of the wheat genome. Nat. Genet, 51, 896–904.
    http://wheatgenomics.plantpath.ksu.edu/1000EC/files/1kEC_genotype01222019.vcf.gz
     All SNPs published were selected.
  2. SNPs were annotated with snpEff.
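
The BLAST hit criteria given for the emmer SNPs in (1) can be sketched as follows. This is a minimal illustration and not the authors' pipeline. It assumes BLAST tabular output (-outfmt 6) with the default 12 columns, a query length of 100 bp, and that "alignment coverage" means the aligned part of the query divided by the query length, which the text does not define. The function name passes_hit_filter and the example hits are hypothetical.

```python
# Thresholds taken from the text: coverage > 50%, identity > 90%.
MIN_COVERAGE = 0.50
MIN_IDENTITY = 90.0
QUERY_LEN = 100  # assumption: each flanking-sequence query is 100 bp long


def passes_hit_filter(line, query_len=QUERY_LEN):
    """Return True if one BLAST -outfmt 6 line meets both thresholds.

    Default columns: qseqid sseqid pident length mismatch gapopen
                     qstart qend sstart send evalue bitscore
    """
    fields = line.rstrip("\n").split("\t")
    pident = float(fields[2])
    qstart, qend = int(fields[6]), int(fields[7])
    # Coverage is computed on the query (assumed definition).
    coverage = (abs(qend - qstart) + 1) / query_len
    return coverage > MIN_COVERAGE and pident > MIN_IDENTITY


# Hypothetical hits for three emmer SNP flanking sequences.
example_hits = [
    "snp1\tchr1A\t98.00\t100\t2\t0\t1\t100\t5000\t5099\t1e-45\t174",
    "snp2\tchr2B\t95.00\t40\t2\t0\t1\t40\t700\t739\t1e-10\t60.2",
    "snp3\tchr3D\t88.00\t100\t12\t0\t1\t100\t900\t999\t1e-30\t120",
]
kept = [h.split("\t")[0] for h in example_hits if passes_hit_filter(h)]
print(kept)
```

In this example only snp1 is kept: snp2 covers 40% of the query and snp3 has 88% identity. If the queries include both flanks plus the SNP position, query_len would need to be adjusted accordingly.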
II. Indels
  1. We retained only the 1–50 bp indels located in genic regions and their flanking segments (3 kb upstream or downstream) from the study:
    Cheng, H, Liu, J, Wen, J, et al. (2019) Frequent intra- and inter-species introgression shapes the landscape of genetic variation in bread wheat. Genome Biol, 20, 1–16.
  2. Indels were annotated with snpEff.
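
The indel selection rule (length of 1–50 bp, located in a gene or within 3 kb of one) could be expressed roughly as below. This is a sketch under stated assumptions, not the scripts used to build WGVD: indel length is taken as the difference between REF and ALT allele lengths, genes are given as (start, end) pairs on one chromosome, and the names keep_indel, in_genic_window and the gene coordinates are hypothetical.

```python
FLANK = 3000               # 3 kb upstream or downstream, from the text
MIN_LEN, MAX_LEN = 1, 50   # indel size range in bp, from the text


def indel_length(ref, alt):
    """Size of the insertion or deletion (assumed: REF/ALT length difference)."""
    return abs(len(ref) - len(alt))


def in_genic_window(pos, genes, flank=FLANK):
    """True if pos lies inside any gene extended by `flank` bp on both sides."""
    return any(start - flank <= pos <= end + flank for start, end in genes)


def keep_indel(pos, ref, alt, genes):
    """Apply both the size filter and the genic-region filter."""
    size = indel_length(ref, alt)
    return MIN_LEN <= size <= MAX_LEN and in_genic_window(pos, genes)


# Hypothetical gene on one chromosome, 1-based inclusive coordinates.
genes = [(10000, 12000)]

print(keep_indel(8500, "A", "ATG", genes))        # 2 bp insertion in the upstream flank
print(keep_indel(20000, "A", "AT", genes))        # outside the gene and its flanks
print(keep_indel(11000, "A" * 61, "A", genes))    # 60 bp deletion, too long
```

A linear scan over genes is enough for illustration; a genome-scale filter would normally use sorted intervals or an interval tree.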
III. Population structure
  1. Only merged SNPs with a missing rate < 0.9 were retained for population structure analysis. ADMIXTURE version 1.3.0 was used to quantify genome-wide population structure. ADMIXTURE was run for K values from 2 to 7 on the AB subgenomes and from 2 to 6 on the D subgenome, each with 20 bootstrap replicates. Because the wheat genome is highly repetitive, especially in intergenic regions, only SNPs located in genic regions were used to construct neighbor-joining (NJ) trees, built separately for hexaploid bread wheat and tetraploid emmer wheat with PHYLIP version 3.68. Interactive Tree of Life (iTOL) was used to visualize these trees.
  2. According to ploidy level and historical category, all accessions can be separated into: diploid Ae. tauschii (AE), tetraploid wild emmer (WE), tetraploid domesticated emmer (DE) and hexaploid bread wheat. According to geographic origin, tetraploid wild emmer can be further separated into: North (WE North), South 1 (WE South1) and South 2 (WE South2). Likewise, hexaploid bread wheat can be further separated into: Africa, Western Europe (EurWest), Eastern Europe (EurEast), Asia, Former Soviet Union (Former SU), North America (NorthAm), South America (SouthAm), Central America (CentAm) and Oceania.
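
The missing-rate threshold used before population structure analysis can be illustrated with a short sketch. It assumes VCF-style genotype strings in which a "." marks a missing call, and the helper names missing_rate and keep_snp are hypothetical; the filtering tool actually used is not stated here.

```python
MAX_MISSING = 0.9  # threshold from the text: keep SNPs with missing rate < 0.9


def missing_rate(genotypes):
    """Fraction of samples whose genotype call is missing (contains '.')."""
    missing = sum(1 for g in genotypes if "." in g)
    return missing / len(genotypes)


def keep_snp(genotypes, max_missing=MAX_MISSING):
    """True if the SNP passes the missing-rate filter."""
    return missing_rate(genotypes) < max_missing


# Hypothetical genotype rows for two SNPs.
mostly_called = ["0/0", "./.", "1/1", "0/1"]
never_called = ["./."] * 10

print(missing_rate(mostly_called), keep_snp(mostly_called))
print(missing_rate(never_called), keep_snp(never_called))
```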
IV. Selection evaluation
  1. WGVD provides Weir and Cockerham's FST and nucleotide diversity (Pi) for the five wheat groups, based on the 93 whole-genome resequencing samples.
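
For readers unfamiliar with Pi, the sketch below shows the textbook estimator of nucleotide diversity for biallelic SNPs, averaged over a window. It is an illustration only: the software and window size used for WGVD are not given in this section, the function names are hypothetical, and the Weir and Cockerham FST estimator is more involved and is not reproduced here.

```python
def site_pi(alt_count, n_chrom):
    """Expected pairwise difference at one biallelic site.

    alt_count: number of sampled chromosomes carrying the alternate allele
    n_chrom:   number of sampled chromosomes with a called allele
    """
    if n_chrom < 2:
        return 0.0
    ref_count = n_chrom - alt_count
    # Differing pairs divided by all pairs: k(n-k) / (n(n-1)/2)
    return 2.0 * alt_count * ref_count / (n_chrom * (n_chrom - 1))


def window_pi(sites, window_len):
    """Per-base nucleotide diversity of a window of window_len bp.

    sites: iterable of (alt_count, n_chrom) for the SNPs in the window;
    positions without a SNP contribute zero.
    """
    return sum(site_pi(k, n) for k, n in sites) / window_len


# Hypothetical window of 1,000 bp containing two SNPs typed on 10 chromosomes.
example_sites = [(5, 10), (1, 10)]
print(window_pi(example_sites, 1000))
```

Low Pi in one group relative to another is what makes a region stand out as a candidate selective footprint.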
V. Database implementation
  1. High-quality SNPs, indels and selection scores, together with their corresponding annotations, classifications and threshold values, were processed with Perl scripts and stored in a MySQL database.
  2. We use PHP, HTML5 and JavaScript to implement search, data visualization and download.

Manual

I. Samples and population structure

Our database integrates resequencing and exome sequencing data from published wheat genetics studies, giving a total of 968 samples representing 5 diploid Ae. tauschii (AE), 53 tetraploid wild emmer wheat (WE), 40 tetraploid domesticated emmer wheat (DE), and 870 hexaploid bread wheat. Hexaploid bread wheat contains 315 landraces and 471 varieties. According to geographic origin, tetraploid wild emmer wheat was separated into 3 groups: Southern Levant 1 (WE South1), Southern Levant 2 (WE South2), and Northern Levant (WE North), which include 7, 30 and 16 accessions, respectively. Likewise, hexaploid bread wheat contains 9 groups: 87 African wheat (Africa), 122 Western European wheat (EurWest), 93 Eastern European wheat (EurEast), 201 Asian wheat (Asia), 90 wheat from the Former Soviet Union (Former SU), 79 North American wheat (NorthAm), 74 South American wheat (SouthAm), 47 Central American wheat (CentAm) and 70 Oceanian wheat (Oceania).

Fig 1. Geographic distribution and population genetics analyses of 968 wheat samples.

 
II. Variation search

WGVD allows users to retrieve SNPs and indels by searching for a specific gene or a genomic region in either of the two versions of the bread wheat genome (IWGSC RefSeq v1.0 and IWGSC RefSeq v2.0). Users can filter SNPs and indels further with "Advanced Search", in which parameters such as minor allele frequency and consequence type can be set; this option lets users narrow the results down to the items of interest. The results are presented in an interactive table and graph. For each SNP or indel, users can obtain details including variant position, alleles, minor allele frequency, variant effect, the allele frequency distribution pattern and the genotypes of all individuals in the 17 worldwide bread wheat groups or the five ploidy wheat groups.

SNPs or indels Search
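
The kind of filtering offered by "Advanced Search" can be pictured with the following sketch. It is not WGVD's server-side code: the record fields (id, alt_count, n_chrom, consequence), the function names and the example variants are hypothetical, and the consequence labels follow the Sequence Ontology terms that snpEff reports.

```python
def minor_allele_frequency(alt_count, n_chrom):
    """Frequency of the rarer allele at a biallelic site."""
    alt_freq = alt_count / n_chrom
    return min(alt_freq, 1.0 - alt_freq)


def advanced_search(variants, min_maf=0.0, consequence=None):
    """Return ids of variants passing the MAF and consequence-type filters."""
    hits = []
    for v in variants:
        maf = minor_allele_frequency(v["alt_count"], v["n_chrom"])
        if maf < min_maf:
            continue  # too rare
        if consequence is not None and v["consequence"] != consequence:
            continue  # wrong consequence type
        hits.append(v["id"])
    return hits


# Hypothetical variant records within one gene.
variants = [
    {"id": "snp_a", "alt_count": 30, "n_chrom": 100, "consequence": "missense_variant"},
    {"id": "snp_b", "alt_count": 2, "n_chrom": 100, "consequence": "missense_variant"},
    {"id": "snp_c", "alt_count": 80, "n_chrom": 100, "consequence": "synonymous_variant"},
]

result = advanced_search(variants, min_maf=0.05, consequence="missense_variant")
print(result)
```

Note that snp_c has an alternate allele frequency of 0.8 but a minor allele frequency of about 0.2, because the minor allele there is the reference allele.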



III. Signature search
 

Users can select a specific gene symbol or genomic region, one of the statistical methods (Pi, FST), and a wheat group to view the selection scores. The results are returned in tabular format. When users click the "show" button in the table, selective signals are displayed in Manhattan plots or common graphics, with the target region or gene highlighted in red.

 
IV. WGVD tools

1. Local Chinese Spring (IWGSC RefSeq v1.0 and IWGSC RefSeq v2.0) genome browser

Users can search with a gene symbol or a genomic region to view SNPs, indels and selective signatures in the global view. Currently, 19 and 6 tracks have been released for IWGSC RefSeq v1.0 and IWGSC RefSeq v2.0, respectively. The "PDF/PS" item under the "View" menu of the navigation bar can be used to generate a high-quality image in PostScript or PDF format. The “Reads coverage” track shows the read coverage depth. Most notably, the genotype patterns of the 93 whole-genome resequencing wheat accessions are shown in the “Genotype patterns” track, with homozygous reference in gray, heterozygous variant in yellow and homozygous variant in green, which allows users to observe haplotype blocks and distinguish different haplotypes.







2. Alignment search tool (BLAST)

We provide the sequence alignment tool ViroBLAST, which finds regions of local similarity between sequences and can be used to infer functional and evolutionary relationships between them.

Project organizers

Yu Jiang

Northwest A&F University, Yangling, Shaanxi, China

Email: yu.jiang@nwafu.edu.cn

Zhensheng Kang

Northwest A&F University, Yangling, Shaanxi, China

Email: kangzs@nwsuaf.edu.cn