Computational Biology

Genome-Wide Scans for Selection

In the era of genomics, we can probe the millions of sequence variants that have arisen and persisted in the human genome for signatures of its evolution. We have developed computational methods, such as the Long-Range Haplotype (LRH), Cross-Population Extended Haplotype Homozygosity (XP-EHH), and Composite of Multiple Signals (CMS) tests, to detect genetic variants under positive selection. These methods identify variants that arose recently and spread rapidly through populations. They use the gradual breakdown of haplotypes by recombination as a clock for estimating allele ages: a young allele that has quickly risen to high frequency still sits on a long, intact haplotype. We have applied these methods to large datasets of human genetic variation, finding many novel candidates for selection. We are developing methods to refine signals from large candidate regions and localize the underlying selected polymorphism. We have also developed software that makes it possible to detect selection, with these and other methods, in the rapidly expanding empirical data on genetic variation in humans and other species.
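The haplotype clock behind the LRH and XP-EHH tests can be illustrated with extended haplotype homozygosity (EHH). EHH is the probability that two randomly chosen chromosomes are identical over the whole stretch from a core site out to a given marker. The Python sketch below is a minimal, unoptimized illustration, not the lab's software. It assumes phased haplotypes encoded as equal-length strings of alleles, marker positions sorted in ascending order, trapezoid-rule integration on each side of the core until EHH drops below an illustrative cutoff, and hypothetical function names (`ehh`, `integrated_ehh`, `ihh_for_allele`, `xp_ehh_unstandardized`). The published tests include further details, such as genome-wide normalization of XP-EHH scores, that are omitted here.

```python
from collections import Counter
from math import comb, log


def ehh(haplotypes, core, end):
    """Fraction of haplotype pairs identical at every site between core and end (inclusive)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("EHH needs at least two haplotypes")
    lo, hi = min(core, end), max(core, end)
    # Group haplotypes by their allele sequence over the span.
    groups = Counter(tuple(h[lo:hi + 1]) for h in haplotypes)
    # Identical pairs divided by all possible pairs.
    return sum(comb(k, 2) for k in groups.values()) / comb(n, 2)


def integrated_ehh(haplotypes, positions, core, cutoff=0.05):
    """Area under the EHH curve on both sides of the core (trapezoid rule).

    Each side is integrated outward until EHH falls below `cutoff`
    (an illustrative threshold, not a published value) or markers run out.
    """
    total = 0.0
    for step in (-1, 1):  # walk left of the core, then right
        prev_pos = positions[core]
        prev_ehh = ehh(haplotypes, core, core)
        i = core + step
        while 0 <= i < len(positions):
            cur = ehh(haplotypes, core, i)
            total += abs(positions[i] - prev_pos) * (prev_ehh + cur) / 2
            if cur < cutoff:
                break
            prev_pos, prev_ehh = positions[i], cur
            i += step
    return total


def ihh_for_allele(haplotypes, positions, core, allele, cutoff=0.05):
    """Integrated EHH among chromosomes carrying `allele` at the core site."""
    carriers = [h for h in haplotypes if h[core] == allele]
    return integrated_ehh(carriers, positions, core, cutoff)


def xp_ehh_unstandardized(pop_a, pop_b, positions, core, cutoff=0.05):
    """ln(iHH_A / iHH_B): positive when population A has longer shared haplotypes."""
    return log(integrated_ehh(pop_a, positions, core, cutoff)
               / integrated_ehh(pop_b, positions, core, cutoff))
```

A population that has undergone a sweep keeps long shared haplotypes around the core, so its integrated EHH is larger and the log-ratio is positive. For example, four identical haplotypes over markers at positions 0, 10, 20, and 30 give an integrated EHH of 30 around the second marker, the maximum possible for that span.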

The lab continues to refine existing methods and tools, and to develop new ones, for detecting and localizing signals of selection in humans and other organisms. We take advantage of rapidly expanding datasets of genetic variation with larger population samples, increasingly affordable whole-genome sequencing, and new insights into the structure of genetic variation in the genome. We will apply our methods to search for instances of natural selection, using our own data and human data collected by two international efforts: the International Haplotype Map Consortium (1000 individuals genotyped for 1 million polymorphisms) and the 1000 Genomes Consortium (full genome sequences from 1000 individuals).

Visualization Software

To facilitate the interpretation of large datasets, we develop visualization tools for genomic and biomedical data, such as the software package VisuaLyzer. We carry out this work in collaboration with Fathom, Inc.

VisuaLyzer is a computational tool for the visual analysis of epidemiological data. It is designed to give users an intuitive understanding of relationships between variables within large datasets through rapid, dynamic, and intelligent high-dimensional data visualization. Identifying potentially predictive variables within highly multidimensional spaces is deeply unintuitive, and with today's wealth of collected data it has also become computationally difficult. VisuaLyzer is a software analysis system that helps users distill the most significant relationships among disease factors across one or more large datasets. Its interactive, easily adaptable visualizations let users move beyond the canonical static two-dimensional plot and explore up to eight or even ten dimensions simultaneously. In this way, VisuaLyzer aims both to direct and to facilitate rapid, multidimensional, and intuitive analysis of large datasets.

Data Mining

In both genomics and public health, extracting meaningful information from large datasets is critical. Our lab develops computational methods to explore such multivariable datasets and to detect associations among variables. See Publications for more information.