210
Correcting For Unequal Sampling in Principal Component Analysis of Genetic Data
Tuesday, August 19, 2014: 11:30 AM
Bayshore Grand Ballroom A (The Westin Bayshore)
Abstract Text: Principal component analysis (PCA) is one of the most widely used tools for exploring variability in high-dimensional data, and it is used in both population and quantitative genetics. Its popularity has recently increased with the large number of molecular markers now available in datasets worldwide. In genetics, external constraints often lead to uneven sampling of populations, which limits the usefulness of PCA because of its well-known sensitivity to sample size and the resulting bias in two-dimensional projections. Here we evaluated weighted PCA (wPCA) on genetic data as a way to correct the bias caused by uneven sampling. Simulations suggest that wPCA improves the two-dimensional projections of PCA and, in some cases, recovers patterns of population relationships even when the sample size is as low as n=1. We applied this correction to pig data from unevenly sampled populations and recovered a more realistic structure than that inferred with standard PCA alone.
Keywords:
SNP
Population structure
Phylogeography
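
Illustrative sketch: the abstract does not state how the wPCA weights are chosen, so the example below should be read as one plausible variant and not as the authors' method. It assumes each individual is weighted by the inverse of its population's sample size, so that every population contributes the same total weight to the weighted mean and covariance. The function name `weighted_pca` and its parameters are hypothetical, and the genotype matrix is assumed to hold allele counts (0, 1, 2) with individuals in rows and SNPs in columns.

```python
import numpy as np

def weighted_pca(genotypes, populations, n_components=2):
    """Weighted PCA sketch (assumed scheme: weight = 1 / population sample size)."""
    X = np.asarray(genotypes, dtype=float)   # individuals x SNPs
    labels = np.asarray(populations)         # one population label per individual

    # Assumption: each individual gets weight 1 / (size of its population),
    # so every population carries the same total weight after normalization.
    _, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    w = 1.0 / counts[inverse]
    w /= w.sum()

    # Weighted centering and weighted covariance across SNPs.
    mean = w @ X
    Xc = X - mean
    cov = (Xc * w[:, None]).T @ Xc

    # Eigendecomposition of the symmetric covariance; keep the leading axes.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    axes = eigvecs[:, order]

    # Project all individuals onto the weighted principal axes.
    scores = Xc @ axes
    return scores, eigvals[order], w

# Toy usage with made-up data: three populations sampled very unevenly.
rng = np.random.default_rng(0)
toy_genotypes = rng.integers(0, 3, size=(56, 20))
toy_populations = ["A"] * 50 + ["B"] * 1 + ["C"] * 5
toy_scores, toy_variances, toy_weights = weighted_pca(toy_genotypes, toy_populations)
```

Under this assumed scheme, the single individual from population B carries as much total weight as the fifty individuals from population A, which is one simple way to keep a heavily sampled population from dominating the leading axes.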