Genome-wide association studies (GWAS) aim to identify genetic variants associated with specific traits or diseases across the human genome (Manolio, 2010). An effective GWAS typically requires dense marker coverage across the genome to capture patterns of genetic variation linked to the phenotype of interest. Genotype imputation, a statistical technique for inferring genotypes at untyped markers, is commonly used to increase marker density in GWAS datasets (Marchini & Howie, 2010). Imputation relies heavily on high-quality haplotype reference panels, which catalog patterns of genetic variation in reference populations (Spencer, Su, Donnelly, & Marchini, 2009). The 1000 Genomes Project (1KGP) generated a comprehensive catalog of human genetic variation using sequencing technologies, providing a powerful resource for building improved haplotype reference panels (Delaneau, Marchini, & Consortium, 2014).

A haplotype is a specific set of alleles at multiple loci, or variants, that lie close together on the same chromosome and tend to be inherited as a single unit (Ziegler, 2011). These variants are often single nucleotide polymorphisms (SNPs). Linkage disequilibrium (LD), the non-random association of alleles at different loci on the same chromosome, is the phenomenon underlying haplotypes (Ziegler, 2011). Because recombination between closely spaced variants is limited over generations, specific combinations of alleles persist in populations as haplotypes. Regions of the genome characterized by strong LD and limited haplotype diversity are often referred to as haplotype blocks.
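
As a rough illustration of how LD is quantified, the sketch below computes three common summary statistics (D, |D'|, and r^2) for a pair of biallelic loci from a list of sampled haplotypes. The two-letter haplotype encoding and the function name ld_stats are assumptions made for this example and do not come from the cited sources.

```python
from collections import Counter


def ld_stats(haplotypes):
    """Return (D, |D'|, r^2) for two biallelic loci.

    `haplotypes` holds one 2-character string per sampled chromosome,
    e.g. "AB", "Ab", "aB", "ab" (toy encoding assumed for this sketch).
    Upper case marks one allele at each locus, lower case the other.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")

    counts = Counter(haplotypes)
    p_ab = counts["AB"] / n                          # haplotype A-B frequency
    p_a = sum(h[0] == "A" for h in haplotypes) / n   # allele A at locus 1
    p_b = sum(h[1] == "B" for h in haplotypes) / n   # allele B at locus 2

    # D is the gap between the observed haplotype frequency and the
    # frequency expected if the two loci were independent.
    d = p_ab - p_a * p_b

    # r^2 normalizes D^2 by the allele-frequency variances at both loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")

    # |D'| rescales |D| by its maximum possible value given the allele
    # frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else float("nan")

    return d, d_prime, r2
```

With ten chromosomes split evenly between "AB" and "ab", the two loci are perfectly correlated and both |D'| and r^2 equal 1. With all four haplotypes equally frequent, D and r^2 are 0.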

Haplotypes contribute significantly to human genetic diversity beyond the variation captured by individual SNPs alone (Manolio, 2010). Different combinations of alleles across linked loci create distinct haplotype patterns within and between populations. The structure and frequency of haplotypes vary across global populations because of their distinct demographic and evolutionary histories (Delaneau, Marchini, & Consortium, 2014). Specific haplotypes may carry functional variants, either coding changes within genes or regulatory variants affecting gene expression, that influence traits or disease susceptibility (Manolio, 2010).

A haplotype reference panel serves as a detailed catalog of common haplotypes observed within one or more reference populations (Marchini & Howie, 2010). These panels are constructed using dense genotype or sequence data from individuals representing the populations of interest (Spencer, Su, Donnelly, & Marchini, 2009). In GWAS, researchers typically genotype study participants using SNP arrays, which capture only a subset of common genomic variation (Spencer, Su, Donnelly, & Marchini, 2009).
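
To make the idea of a "catalog of haplotypes" concrete, here is a minimal sketch that tallies haplotype frequencies per population from already-phased data. The input layout (pairs of a population label and a haplotype string), the labels "POP1" and "POP2", and the function name build_panel_catalog are hypothetical. Real panels store far richer data, typically in formats such as VCF.

```python
from collections import Counter, defaultdict


def build_panel_catalog(samples):
    """Tally haplotype frequencies within each population.

    `samples` is an iterable of (population_label, haplotype) pairs,
    where each haplotype is a string of alleles over the same ordered
    set of sites (toy layout assumed for this sketch).
    Returns {population: {haplotype: frequency}}.
    """
    counts = defaultdict(Counter)
    for population, haplotype in samples:
        counts[population][haplotype] += 1

    catalog = {}
    for population, hap_counts in counts.items():
        total = sum(hap_counts.values())
        catalog[population] = {
            hap: count / total for hap, count in hap_counts.items()
        }
    return catalog


# Hypothetical phased haplotypes over three sites.
example = [
    ("POP1", "ACG"), ("POP1", "ACG"), ("POP1", "ATG"),
    ("POP2", "GTG"),
]
catalog = build_panel_catalog(example)
```

Keeping the tallies separate per population mirrors the point made earlier that haplotype frequencies differ between populations.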

Genotype imputation uses the study participants’ typed SNP data along with the haplotype reference panel to statistically infer genotypes at SNPs not directly measured on the array (Marchini & Howie, 2010). Imputation effectively increases the density of genetic markers analyzed in the GWAS, often from hundreds of thousands to millions of variants (Manolio, 2010). The increased marker density improves the power of GWAS to detect association signals and facilitates fine-mapping efforts to identify potential causal variants within associated regions (Marchini & Howie, 2010).
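
The core intuition can be sketched in a few lines. Widely used imputation tools rely on hidden Markov models over the reference haplotypes. The example below swaps in a much simpler nearest-match rule purely for illustration: it finds the panel haplotypes that agree best with the study haplotype at typed sites and averages their alleles at the untyped sites. The 0/1 allele coding, the use of None for untyped sites, and the function name impute_haplotype are all assumptions of this sketch.

```python
def impute_haplotype(study, panel):
    """Fill untyped sites in `study` from the closest panel haplotypes.

    study: list with 0 or 1 at typed sites and None at untyped sites
    panel: non-empty list of reference haplotypes, each a list of 0/1
           with the same length and site order as `study`
    Returns a list of floats. Typed alleles are kept as observed; each
    untyped site gets the mean allele among the best-matching panel
    haplotypes, which can be read as an expected allele dosage.
    """
    typed = [i for i, allele in enumerate(study) if allele is not None]

    def mismatches(ref):
        # Count disagreements at the sites actually genotyped.
        return sum(ref[i] != study[i] for i in typed)

    best = min(mismatches(ref) for ref in panel)
    closest = [ref for ref in panel if mismatches(ref) == best]

    imputed = []
    for i, allele in enumerate(study):
        if allele is not None:
            imputed.append(float(allele))
        else:
            imputed.append(sum(ref[i] for ref in closest) / len(closest))
    return imputed


# Toy panel of three reference haplotypes over four sites.
panel = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
]
# Sites 0 and 3 are typed on the array; sites 1 and 2 are not.
result = impute_haplotype([1, None, None, 1], panel)
```

In this toy case two panel haplotypes match the typed sites equally well. They agree at site 1 but disagree at site 2, so the imputed value at site 2 is a fractional dosage rather than a hard call.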

The 1KGP sought to create a deep catalog of human genetic variation, including SNPs, indels, and structural variants, by sequencing individuals from diverse global populations (Delaneau, Marchini, & Consortium, 2014). The reference panel was built with statistical methods that integrate low-coverage whole-genome sequencing, high-coverage exome sequencing, and dense SNP array data from 1KGP participants. This integration allowed for the construction of a highly accurate and detailed haplotype reference panel (Delaneau, Marchini, & Consortium, 2014). Compared to previous reference panels primarily based on SNP array data, the 1KGP panel offered several improvements.

These improvements included much better representation and phasing accuracy for lower-frequency and rare variants. The panel also provided enhanced coverage across a wider range of diverse global populations included in the 1KGP. The use of sequence data directly improved the accuracy of estimated haplotypes compared to methods relying solely on array data (Delaneau, Marchini, & Consortium, 2014).

Applying the improved 1KGP haplotype reference panel for imputation significantly enhances GWAS utility. The increased accuracy of imputation, especially for less common variants, allows researchers to test a larger proportion of genomic variation for association with diseases and traits (Marchini & Howie, 2010). The resulting gain in power increases the likelihood of discovering novel genetic associations, especially those driven by lower-frequency variants that might have been poorly imputed using older panels (Manolio, 2010).

Better representation of diverse populations in the 1KGP panel improves imputation quality and facilitates discoveries in non-European ancestry groups, contributing to more equitable genetic research (Delaneau, Marchini, & Consortium, 2014). The higher density of accurately imputed variants resulting from the 1KGP panel also aids in fine-mapping association signals, which helps to narrow down the set of candidate causal variants within a genomic region linked to a disease (Manolio, 2010).

Accurate and comprehensive haplotype reference panels are important tools for modern GWAS, enabling genotype imputation to maximize marker coverage. The 1KGP produced an improved reference panel with better representation of rare variants and diverse populations (Delaneau, Marchini, & Consortium, 2014). Utilizing the 1KGP panel for imputation boosts the power and resolution of GWAS, enhancing our ability to discover novel genetic associations and understand the genetic underpinnings of complex human diseases and traits.

References

Delaneau, O., Marchini, J., & The 1000 Genomes Project Consortium. (2014). Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel. Nature Communications, 5, 3934. https://www.nature.com/articles/ncomms4934.

Manolio, T. (2010). Genomewide association studies and assessment of the risk of disease. New England Journal of Medicine, 363(2), 166–176. https://doi.org/10.1056/NEJMra0905980.

Marchini, J., & Howie, B. (2010). Genotype imputation for genome-wide association studies. Nature Reviews Genetics, 11(7), 499–511. https://doi.org/10.1038/nrg2796.

Spencer, C., Su, Z., Donnelly, P., & Marchini, J. (2009). Designing Genome-Wide Association Studies: Sample Size, Power, Imputation, and the Choice of Genotyping Chip. PLoS Genetics, 5(5), e1000477. https://doi.org/10.1371/journal.pgen.1000477.

Ziegler, A. (2011). A Statistical Approach to Genetic Epidemiology (2nd ed.). Wiley Professional Development (P&T). https://bookshelf.vitalsource.com/books/9783527633661.
