GGI-CBG Barcoding NMNH Genera Project

Project Overview

Title: GGI-CBG Barcoding NMNH Genera Project
CBG Units: Collections, Genomics, Informatics
Scope: The GGI-CBG ‘Barcoding NMNH Genera’ Project serves the mutual interests of the Centre for Biodiversity Genomics (CBG) and the Global Genome Initiative (GGI) by making natural history specimens discoverable and accessible to the biodiversity science community. The project achieves this goal by analyzing thousands of novel, authoritatively identified specimens at the Smithsonian National Museum of Natural History (NMNH), and by capturing and sharing their collection data, high-resolution images, DNA barcodes and genomic sample information.
Stage: In Progress (Year 2)
Sequences to date: 5604 specimens; 2897 genera and 3353 species


Project Details

The GGI-CBG Barcoding NMNH Genera Project focuses on acquiring DNA barcodes for arthropod genera that are new to the Global Genome Biodiversity Network (GGBN) and the Barcode of Life Data System (BOLD) by harvesting museum specimens at the Smithsonian National Museum of Natural History (NMNH).

Standard Workflow

Staff from the CBG visit the NMNH three times a year to select arthropod specimens for the GGI-CBG ‘Barcoding NMNH Genera’ Project. Two representative species are selected from genera that are new to GGBN, GenBank and BOLD (whenever possible), following GGI-CBG project guidelines and museum curator specifications. CBG staff carefully record the taxonomy, country of collection, sample ID, and cabinet/drawer location of each specimen. Placeholder labels mark the location of each specimen removed from the collection and ensure that specimens are returned to their exact locations once processing is complete. A report listing species names, sample IDs, and country of collection for all specimens is generated and approved by the GGI project manager and museum curators before the specimens are transported to CBG.

Once transported to CBG, specimens are accessioned and labelled with BOLD and USNM ENT labels prior to digitization. Digitization of specimen data, imaging, and tissue sampling of legs are completed for all arrays following specifications pre-determined by museum curators, and the results are uploaded to BOLD. Under the whole voucher protocol, CBG staff remove specimens from points and place them directly into sampling plates (arrays); voucher recovery is then performed, and NMNH staff re-point the specimens once they are returned to the museum. All necessary precautions are taken to prevent cross-contamination of and/or damage to the specimens during imaging, subsampling, and sequencing.

DNA samples are extracted using the silica-based protocol outlined in Ivanova, deWaard & Hebert (2006; DOI: 10.1111/j.1471-8286.2006.01428.x). DNA samples are then PCR-amplified and sequenced following protocols detailed in Hebert et al. (2013; DOI: 10.1371/journal.pone.0068535) and Prosser et al. (2016; DOI: 10.1111/1755-0998.12474) that target overlapping fragments of the cytochrome c oxidase I (COI) gene. After sequence editing is complete, sequences are submitted to BOLD. BOLD projects are created to store all specimen data, images, sequence data and associated files from each museum harvesting phase. Representatives from the NMNH are added to each BOLD project as administrators.

Figure 1: Museum Workflow for the GGI-CBG Barcoding NMNH Genera Project

All specimen data, images, GenBank accession numbers, and DNA bank data (following the GGBN Data Standard; Droege et al. 2016; DOI: 10.1093/database/baw125) are provided to the GGI project manager upon completion, and formatted for input into the NMNH EMu collection management system. Species authorship of the specimen records is completed by CBG and GGI staff, and sent to the GGI project manager.

DNA extracts are split (20 µL each) between the DNA archives of CBG and NMNH. All specimens are returned to their original locations within the collection during the subsequent visit. All successfully sequenced records are then submitted to GenBank and made public, conforming to the BARCODE Data Standard. USNM voucher information is listed in the “specimen voucher” field of all GenBank records, ensuring the correct linkage with records in the NMNH EMu collections database.

Project Outcomes (September 2019)

In Year 1, three trips to the NMNH were completed in June 2018 (Phase 1), September 2018 (Phase 2) and December 2018 (Phase 3). In total, 4274 insect specimens were borrowed from the NMNH, representing 12 orders, 127 families, 2198 genera and 2556 identified species collected from 118 countries (Figure 2). As of July 2019, 2070 of the 2198 selected genera were new to GGBN, 1948 were new to GenBank and 1188 were new to BOLD. The resulting sequences constitute 939 Barcode Index Numbers (BINs), of which 58% (541 BINs) are unique to this project. Overall sequencing success was 60%.

Figure 2: Collection locations for specimens borrowed from the NMNH in Year 1

Figure 3: GGI-CBG Recovery Rates for Sanger and NGS Sequencing

In Year 2, two of three trips to the NMNH have been completed, in June 2019 (Phase 1) and September 2019 (Phase 2). A third visit is planned for December 2019 (Phase 3). In Phases 1 and 2, 3220 insect specimens were borrowed from the NMNH, representing 3 orders, 72 families, 1791 genera and 1678 identified species collected from over 100 countries. Of the 1791 genera, 1773 are new to GGBN, 1747 are new to GenBank and 1134 are new to BOLD; of the 1678 identified species, 1637 are new to GGBN, 1613 are new to GenBank and 852 are new to BOLD.
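
The counts reported above can be expressed as proportions to compare how novel the selected genera are across the three databases. The following is a minimal sketch in Python, not part of the project's own tooling: the counts are copied from the text, while the variable and function names (`YEAR_1`, `YEAR_2`, `percent`, `novelty_rates`) are illustrative assumptions.

```python
# Genus-level counts copied from the Project Outcomes text.
# All names below are hypothetical and used for illustration only.
YEAR_1 = {
    "genera": 2198,
    "new_genera": {"GGBN": 2070, "GenBank": 1948, "BOLD": 1188},
}
YEAR_2 = {
    "genera": 1791,
    "new_genera": {"GGBN": 1773, "GenBank": 1747, "BOLD": 1134},
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


def novelty_rates(year):
    """Share of the selected genera that were new to each database."""
    return {
        database: percent(count, year["genera"])
        for database, count in year["new_genera"].items()
    }


year_1_rates = novelty_rates(YEAR_1)
year_2_rates = novelty_rates(YEAR_2)

# Year 1 BINs unique to this project: 541 of 939.
unique_bin_share = percent(541, 939)

for label, rates in (("Year 1", year_1_rates), ("Year 2", year_2_rates)):
    for database, rate in rates.items():
        print(f"{label}: {rate}% of genera new to {database}")
print(f"Year 1: {unique_bin_share}% of BINs unique to this project")
```

With these inputs, the unique BIN share evaluates to roughly 57.6%, consistent with the 58% quoted for Year 1. In both years the proportion of genera new to BOLD is noticeably lower than the proportions new to GGBN and GenBank.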

Project Reports



Jeremy deWaard –

Meredith Miller –

Additional Resources:

Global Genome Initiative (GGI)

Global Genome Biodiversity Network (GGBN)

Smithsonian National Museum of Natural History (NMNH)