Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
Department of Genetics, University of Cambridge, Cambridge, UK.
Wellcome Sanger Institute, Cambridge, UK.
Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA.
The Genome Center, University of California Davis, Davis, CA, USA.
Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA.
Leibniz Institute for Zoo and Wildlife Research, Department of Evolutionary Genetics, Berlin, Germany.
Berlin Center for Genomics in Biodiversity Research, Berlin, Germany.
DNAnexus Inc., Mountain View, CA, USA.
Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea.
Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea.
University of Southern California, Los Angeles, CA, USA.
National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, USA.
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK.
Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany.
DRESDEN-concept Genome Center, Dresden, Germany.
Novogene, Durham, NC, USA.
Neurogenetics of Vocal Communication Group, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands.
Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands.
School of Biology, University of St Andrews, St Andrews, UK.
University of Massachusetts Cooperative Fish and Wildlife Research Unit, Amherst, MA, USA.
School of Biological Science, The Environment Institute, University of Adelaide, Adelaide, South Australia, Australia.
Bond Life Sciences Center, University of Missouri, Columbia, MO, USA.
Department of Biology, East Carolina University, Greenville, NC, USA.
UQ Genomics, University of Queensland, Brisbane, Queensland, Australia.
Department of Biological Sciences, Clemson University, Clemson, SC, USA.
The Genetic Rescue Foundation, Wellington, New Zealand.
Kākāpō Recovery, Department of Conservation, Invercargill, New Zealand.
Department of Zoology, University of Otago, Dunedin, New Zealand.
University of Arizona Genetics Core, Tucson, AZ, USA.
Department of Life Sciences, Natural History Museum, London, UK.
School of Natural Sciences, Bangor University, Gwynedd, UK.
Department of Biology, University of Konstanz, Konstanz, Germany.
Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA.
Department of Marine and Environmental Sciences, Northeastern University Marine Science Center, Nahant, MA, USA.
Department of Biology, University of Antwerp, Antwerp, Belgium.
Naturalis Biodiversity Center, Leiden, The Netherlands.
Institute of Biology, Karl-Franzens University of Graz, Graz, Austria.
Florida Museum of Natural History, University of Florida, Gainesville, FL, USA.
Center for Systems Biology, Dresden, Germany.
Zoological Institute, University of Basel, Basel, Switzerland.
Tag.bio, San Francisco, CA, USA.
UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA.
San Diego Zoo Global, Escondido, CA, USA.
Pacific Biosciences, Menlo Park, CA, USA.
Digital BioLogic, Ivanić-Grad, Croatia.
Bionano Genomics, San Diego, CA, USA.
Arima Genomics, San Diego, CA, USA.
Dovetail Genomics, Santa Cruz, CA, USA.
Independent Researcher, Santa Cruz, CA, USA.
CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain.
Universitat Pompeu Fabra, Barcelona, Spain.
Department of Computer Science, University of Maryland College Park, College Park, MD, USA.
School of Computer Science and Technology, Center for Bioinformatics, Harbin Institute of Technology, Harbin, China.
Department of Psychology, Institute for Mind and Biology, University of Chicago, Chicago, IL, USA.
Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA.
Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, USA.
Max Planck Institute for the Physics of Complex Systems, Dresden, Germany.
Monash University Malaysia Genomics Facility, School of Science, Selangor Darul Ehsan, Malaysia.
Tropical Medicine and Biology Multidisciplinary Platform, Monash University Malaysia, Selangor Darul Ehsan, Malaysia.
Qatar Falcon Genome Project, Doha, Qatar.
Department of Biosciences, University of Milan, Milan, Italy.
eGnome, Inc., Seoul, Republic of Korea.
LOEWE Centre for Translational Biodiversity Genomics, Frankfurt, Germany.
Senckenberg Research Institute, Frankfurt, Germany.
Goethe-University, Faculty of Biosciences, Frankfurt, Germany.
BGI-Shenzhen, Shenzhen, China.
Department of Biology, Pennsylvania State University, University Park, PA, USA.
Center for Medical Genomics, Pennsylvania State University, University Park, PA, USA.
Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, USA.
Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA.
Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA.
Hoonygen, Seoul, Republic of Korea.
Department of Migration, Max Planck Institute of Animal Behavior, Radolfzell, Germany.
Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia.
Center for Evolutionary Hologenomics, The GLOBE Institute, University of Copenhagen, Copenhagen, Denmark.
University Museum, NTNU, Trondheim, Norway.
China National Genebank, BGI-Shenzhen, Shenzhen, China.
Villum Center for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China.
Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China.
Institute of Molecular and Cell Biology, A*STAR, Biopolis, Singapore, Singapore.
Centre for Biodiversity, Royal Ontario Museum, Toronto, Ontario, Canada.
Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, Washington, DC, USA.
Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA.
Howard Hughes Medical Institute, Chevy Chase, MD, USA.
The Walter Reed Biosystematics Unit, Museum Support Center MRC-534, Smithsonian Institution, Suitland, MD, USA.
Walter Reed Army Institute of Research, Silver Spring, MD, USA.
Department of Biological Sciences, Earlham Institute, University of East Anglia, Norwich, UK.
Institute of Evolutionary Biology (UPF-CSIC), PRBB, Barcelona, Spain.
Catalan Institution of Research and Advanced Studies (ICREA), Barcelona, Spain.
Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain.
Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain.
School of Biology and Environmental Science, University College Dublin, Dublin, Ireland.
Department of Computer Science, The University of Illinois at Urbana-Champaign, Urbana, IL, USA.
School of Life Science, La Trobe University, Melbourne, Victoria, Australia.
Department of Evolution, Behavior, and Ecology, University of California San Diego, La Jolla, CA, USA.
Laboratory of Genomics Diversity-Center for Computer Technologies, ITMO University, St. Petersburg, Russian Federation.
Guy Harvey Oceanographic Center, Halmos College of Natural Sciences and Oceanography, Nova Southeastern University, Fort Lauderdale, FL, USA.
Department of Evolution and Ecology, University of California Davis, Davis, CA, USA.
John Muir Institute for the Environment, University of California Davis, Davis, CA, USA.
Wellcome Sanger Institute, Cambridge, UK. kj2@sanger.ac.uk.
Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany. gene@mpi-cbg.de.
Center for Systems Biology, Dresden, Germany. gene@mpi-cbg.de.
Faculty of Computer Science, Technical University Dresden, Dresden, Germany. gene@mpi-cbg.de.
Department of Genetics, University of Cambridge, Cambridge, UK. rd109@cam.ac.uk.
Wellcome Sanger Institute, Cambridge, UK. rd109@cam.ac.uk.
Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA. adam.phillippy@nih.gov.
Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA. ejarvis@rockefeller.edu.
Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA. ejarvis@rockefeller.edu.
Howard Hughes Medical Institute, Chevy Chase, MD, USA. ejarvis@rockefeller.edu.
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1-4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help enable a new era of discovery across the life sciences.