Analytics


The Analytics Unit oversees data validation, data quality assurance, research analytics, and specimen digitization for all CBG projects. The team assists CBG researchers, students, and external collaborators with digitization (imaging, weighing, etc.), data availability, statistical analysis, GIS mapping, and ecological data. Its central goal is to help generate high-quality, publication-ready data and to assist with their interpretation.

The team works closely with all other units at the institute to establish workflows and validation schemes that ensure the highest possible data quality for the various projects at the CBG. For instance, DNA barcode reference sequences generated with our Pacific Biosciences Sequel I and II platforms are validated by the Analytics team, and associated records on BOLD are supplemented with further information (e.g. images, dry weight). Members of the unit monitor sequence quality and provide feedback to both the Genomics and Informatics units to ensure an optimal QA/QC process.

Several approaches have been introduced to characterize the relationships among particular traits, evolutionary lineages, and variation in environmental conditions across sites. Some newer methods make it possible to analyze how trait states and phylogenetic trees are associated with spatially variable environmental factors. The quality of such analyses depends on the completeness and accuracy of the sequences included in a reference library such as BOLD, but also on the availability of high-quality trait data. The Analytics unit is developing automated data mining tools (crawler and scraper technology) to assemble comprehensive trait datasets.

The combination of high-throughput DNA sequencing at the CBG with statistical modelling approaches makes it possible to scale up from data-rich but finite sets of point samples to spatially continuous biodiversity maps. The Analytics unit supports CBG researchers in building continuous species maps and estimating metrics such as richness and dissimilarity, as well as species abundance or biomass, depending on the sampling and analytical methods used.
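
As a rough illustration of the metrics mentioned above, the following minimal sketch computes species richness per site and a pairwise dissimilarity between two sites. It is not the unit's actual workflow: the site names, species labels, and function names are hypothetical, and Jaccard dissimilarity is used here only as one common choice of dissimilarity measure for presence/absence data.

```python
# Hypothetical presence records: site name -> set of species detected there.
# Names and data are invented for illustration only.
sites = {
    "site_a": {"sp1", "sp2", "sp3"},
    "site_b": {"sp2", "sp3", "sp4", "sp5"},
}


def richness(species):
    """Species richness: the number of distinct species at a site."""
    return len(species)


def jaccard_dissimilarity(a, b):
    """Jaccard dissimilarity: 1 - |shared species| / |all species at either site|."""
    union = a | b
    if not union:
        # Two empty sites are treated as identical by convention (an assumption).
        return 0.0
    return 1.0 - len(a & b) / len(union)


for name, species in sites.items():
    print(name, "richness:", richness(species))

print("dissimilarity:", jaccard_dissimilarity(sites["site_a"], sites["site_b"]))
```

Abundance- or biomass-based metrics would need counts or weights per species rather than presence sets, so a measure designed for quantitative data would replace the Jaccard index in that case.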