Supporting Canadian and international researchers in the generation and interpretation of high-quality, publication-ready data

The Analytics unit is responsible for specimen digitization, image analysis, and sequence data validation. The unit also supports researchers in building continuous species maps and in the application of statistical modelling to complex datasets.

The Unit Manages

Data Analysis & Validation

Integrated Workflows & Validation Systems

The team works closely with all other units at the institute to establish workflows and validation schemes that ensure high data quality across the various projects at the CBG. For instance, DNA barcode reference sequences generated on our Pacific Biosciences Sequel I and II platforms are validated by the Analytics team, and the associated records on BOLD are enriched with further information (e.g. images, dry weight). Members of the unit monitor sequence quality and provide feedback to both the Genomics and Informatics units to ensure an optimal QA/QC process.

Unique & Integrated Research Methods

Automated Data Mining & Machine Learning

Several approaches have been introduced to characterize the relationships between particular traits, evolutionary lineages, and variation in environmental conditions across sites. Some newer methods allow trait states and phylogenetic trees to be analyzed in association with spatially variable environmental factors. The quality of such analyses depends on the completeness and accuracy of the sequences in a reference library such as BOLD, but also on the availability of high-quality trait data. The Analytics unit is developing automated data mining tools (crawler and scraper technology) to assemble comprehensive trait datasets.
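As an illustration of the scraping step only, the sketch below pulls trait name/value pairs out of an HTML table using Python's standard library. The page layout, the `TraitTableParser` and `extract_traits` names, and the sample trait values are all hypothetical; this is not the unit's actual tooling, and a real crawler would also need to fetch pages, respect site policies, and normalise units.

```python
from html.parser import HTMLParser


class TraitTableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # completed rows, each a list of cell strings
        self._row = None        # cells of the row currently being read
        self._in_cell = False   # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            if self._row:       # header rows (<th> only) stay empty and are skipped
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


# Hypothetical page fragment; the values are made up for the example.
SAMPLE_HTML = """
<table>
  <tr><th>Trait</th><th>Value</th></tr>
  <tr><td>Body length (mm)</td><td>12.5</td></tr>
  <tr><td>Feeding guild</td><td>herbivore</td></tr>
</table>
"""


def extract_traits(html):
    """Return {trait_name: value} for every two-cell table row."""
    parser = TraitTableParser()
    parser.feed(html)
    return {row[0]: row[1] for row in parser.rows if len(row) == 2}


traits = extract_traits(SAMPLE_HTML)
```

Keeping the parser separate from `extract_traits` means the same row collector could be reused for pages with different column layouts.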

Specimen Digitization

High-Throughput Imaging Protocols

Aside from digital SLR and conventional microscope-camera systems, our imaging team uses three Keyence VHX-7000 Digital Microscopes with fully integrated heads and automatic stages to permit high-resolution (4K) microphotography of individual specimens. Because each microscope's scanning stage can hold a 96-well plate, the system can automatically acquire a high-resolution image of each specimen by controlling stage movements in the X and Y directions. With the current setup, the team can gather about 2 million images per year.
These high-resolution, in-focus digital images of every specimen are ideally suited for the development of machine learning approaches that employ pattern recognition algorithms to automate a high-level taxonomic assignment (order, family) for each specimen.
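The plate-scanning idea can be sketched as a list of X-Y stage targets, one per well. This is a minimal illustration and not the microscope's control interface: the function name `well_positions`, the 9 mm centre-to-centre pitch (the usual spacing for 96-well plates), the origin at well A1, and the serpentine visiting order are all assumptions made for the example.

```python
from string import ascii_uppercase

ROWS, COLS = 8, 12   # standard 96-well layout: rows A-H, columns 1-12
PITCH_MM = 9.0       # assumed centre-to-centre well spacing


def well_positions(rows=ROWS, cols=COLS, pitch=PITCH_MM):
    """Return [(well_name, x_mm, y_mm), ...] relative to well A1.

    Wells are visited in serpentine order (left to right on even rows,
    right to left on odd rows) so the stage never travels a full row
    width between consecutive images.
    """
    positions = []
    for r in range(rows):
        col_order = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        for c in col_order:
            name = f"{ascii_uppercase[r]}{c + 1}"
            positions.append((name, c * pitch, r * pitch))
    return positions


plate = well_positions()
```

Each tuple in `plate` could then be handed to whatever routine moves the stage and triggers image capture, with the well name used to link the image to its specimen record.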

Research Analytics & Visualization

Applying Statistical Models to Massive Datasets

The combination of high-throughput DNA sequencing at the CBG with statistical modelling approaches makes it possible to scale up from data-rich but finite sets of point samples to spatially continuous biodiversity maps. The Analytics unit supports CBG researchers in building continuous species maps and in estimating metrics such as richness and dissimilarity, as well as species abundance or biomass, depending on the sampling and analytical methods used.
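For readers unfamiliar with the metrics named above, the sketch below computes species richness and two common dissimilarity indices (Jaccard and Bray-Curtis) for a pair of sites. The choice of indices, the function names, and the site counts are assumptions for illustration; the source does not state which measures or software the unit uses.

```python
def richness(site):
    """Number of species recorded at a site with a count above zero."""
    return sum(1 for count in site.values() if count > 0)


def jaccard_dissimilarity(a, b):
    """1 - (shared species / all species), based on presence only."""
    present_a = {sp for sp, n in a.items() if n > 0}
    present_b = {sp for sp, n in b.items() if n > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


def bray_curtis(a, b):
    """1 - 2 * (sum of shared minimum counts) / (total counts at both sites)."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum(min(a.get(sp, 0), b.get(sp, 0)) for sp in set(a) | set(b))
    return 1.0 - 2.0 * shared / total


# Hypothetical read or specimen counts per species at two sites.
site_a = {"sp1": 4, "sp2": 0, "sp3": 2}
site_b = {"sp1": 1, "sp3": 2, "sp4": 3}

print(richness(site_a), richness(site_b))
print(jaccard_dissimilarity(site_a, site_b))
print(bray_curtis(site_a, site_b))
```

Jaccard uses presence/absence only, while Bray-Curtis also reflects differences in abundance, which is why the choice between them depends on the sampling and analytical methods.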

Expanding knowledge of life on our planet