Q: How much and what kind of sequencing is being offered?
A: The goal is to sequence 10,000 biodiverse species over the next 5 years. This means high-quality de novo genomes and, to the extent that live tissues are available, transcriptomes as well.
Q: What kind of species will be sequenced?
A: About 6000 of these species will be embryophytes (land plants) and about 4000 will be eukaryotic microbes (both photosynthetic and heterotrophic protists).
Q: How will the data be released and what are the constraints?
A: Once every 3 months we will evaluate the quality of the assemblies and determine which are usable according to several tiers of usability (e.g. scaffolds larger than the average gene, scaffolds larger than the average syntenic block). The sequences and annotations will be made publicly accessible through the CNGB website, and there will be a GigaScience paper for users to cite if they publish using these data. As a reward for collaborating with us, sample providers will be given one month of exclusive access.
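The tiering idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the FAQ does not say which statistic will represent scaffold size (N50 is used below as one common choice), and the tier names, the function names and the two threshold parameters are hypothetical.

```python
from typing import List


def n50(lengths: List[int]) -> int:
    """Return the length L such that scaffolds of length >= L
    together cover at least half of the total assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def usability_tier(lengths: List[int], avg_gene_bp: int, avg_synteny_bp: int) -> str:
    """Assign a hypothetical usability tier from scaffold N50.

    avg_gene_bp and avg_synteny_bp are caller-supplied thresholds;
    the FAQ gives no numbers for them.
    """
    typical = n50(lengths)
    if typical > avg_synteny_bp:
        return "synteny-scale"
    if typical > avg_gene_bp:
        return "gene-scale"
    return "below gene-scale"
```

With such a function, each quarterly evaluation would reduce to computing a tier per assembly and releasing those that reach the chosen minimum.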
Q: Will we tackle the difficult genomes too?
A: All samples will be subjected to a low-pass shotgun (a fixed number of Gb) to estimate genome size, complexity, etc. Those deemed too difficult to sequence will be deferred and/or replaced with a more tractable sample from the same taxon. In later years, remaining gaps in desired coverage will be reassessed in light of whatever technology exists at that time.
Q: If we abandon a sample will the unassembled reads be released?
A: Yes, of course. Some people find creative uses even for unassembled low-quality reads.
Q: Who will contribute samples for sequencing?
A: This is meant to be a community project. We will balance the convenience of getting a lot of our samples from the major botanical gardens with getting a small number of samples from each of many individuals worldwide. For this reason, we will automate the sample proposal/selection/submission process so that we are not inundated with over 10,000 E-mails.
Q: Do suppliers contribute plant materials or DNA/RNA extractions?
A: Honestly, and all else being equal, we would prefer suppliers who can provide high-molecular-weight DNA suitable for making the large-insert libraries necessary for long-range contiguity. Similarly, live tissues suitable for RNA extractions will be preferred. But we also recognize that not every contributor will have the facilities to perform these extractions, and we plan to set up centralized extraction laboratories in Europe, North America, and Asia/China to perform the extractions for those who cannot.
Q: How important is the sample documentation?
A: LIFE OR DEATH. We intend to be ruthless about this. The database will only accept officially approved species names. All samples must be vouchered, and we want a picture, not just a voucher number. The provider must accept legal responsibility for compliance with international protocols regarding shipment of plant materials (e.g. the Nagoya Protocol). No exceptions whatsoever.
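The documentation rules above lend themselves to an automated check at submission time. The following is only a sketch under stated assumptions: the record fields, the `SampleRecord` and `documentation_problems` names, and the idea of passing the approved names in as a set are all hypothetical, since the FAQ does not describe the actual database or submission form.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class SampleRecord:
    # Hypothetical fields; the real submission form is not described in the FAQ.
    species_name: str
    voucher_id: str
    voucher_image: str              # e.g. a file name or link to the picture
    legal_compliance_accepted: bool


def documentation_problems(record: SampleRecord, approved_names: Set[str]) -> List[str]:
    """Return every reason the record would be rejected (empty list means accepted)."""
    problems = []
    if record.species_name not in approved_names:
        problems.append("species name is not on the approved list")
    if not record.voucher_id.strip():
        problems.append("sample is not vouchered")
    if not record.voucher_image.strip():
        problems.append("voucher picture is missing")
    if not record.legal_compliance_accepted:
        problems.append("provider has not accepted legal responsibility")
    return problems
```

Returning all problems at once, rather than stopping at the first, lets a provider fix a submission in a single round.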
Q: What is the selection process, beyond the phylogenetic diversity rule?
A: To a large degree, this must be automated because none of us wants to read 10,000 proposals. We will indicate how many species we want from which taxa, and devise a scoring system that considers the technical feasibility (e.g. genome size, ploidy status) and scientific interest (e.g. convergent evolution, medicinal plants, wild relatives of domesticated crops, plants that thrive under extreme stress).
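One way such a scoring system could work is sketched below. This is not the project's actual scheme: the weights, the 10 Gb size cap, the ploidy penalty, the tag vocabulary and all names (`Proposal`, `feasibility`, `interest`, `score`, `select`) are assumptions chosen only to show how feasibility, scientific interest and per-taxon quotas might be combined.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

# Hypothetical tag vocabulary, loosely following the examples in the answer above.
INTEREST_TAGS = frozenset(
    {"convergent evolution", "medicinal", "crop wild relative", "extreme stress"}
)


@dataclass
class Proposal:
    species: str
    taxon: str
    genome_size_gb: float
    ploidy: int
    interest_tags: FrozenSet[str]


def feasibility(p: Proposal, max_size_gb: float = 10.0) -> float:
    """Score in [0, 1]: smaller genomes and lower ploidy score higher (assumed rule)."""
    size_term = max(0.0, 1.0 - p.genome_size_gb / max_size_gb)
    ploidy_term = 2.0 / max(p.ploidy, 2)
    return size_term * ploidy_term


def interest(p: Proposal) -> float:
    """Fraction of the recognized interest tags that the proposal carries."""
    return len(p.interest_tags & INTEREST_TAGS) / len(INTEREST_TAGS)


def score(p: Proposal, w_feas: float = 0.5, w_int: float = 0.5) -> float:
    """Weighted sum of feasibility and interest; the weights are arbitrary."""
    return w_feas * feasibility(p) + w_int * interest(p)


def select(proposals: List[Proposal], quotas: Dict[str, int]) -> List[Proposal]:
    """Take the best-scoring proposals first until each taxon's quota is filled."""
    remaining = dict(quotas)
    chosen = []
    for p in sorted(proposals, key=score, reverse=True):
        if remaining.get(p.taxon, 0) > 0:
            chosen.append(p)
            remaining[p.taxon] -= 1
    return chosen
```

The per-taxon quota enforces the phylogenetic diversity rule, while the score only ranks candidates within whatever slots a taxon has left.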
Q: So when can people start making suggestions?
A: The sequencing team is ready now, but we must first automate the proposal/selection/submission process. Thank you for your patience and please check back in October.