210
Correcting For Unequal Sampling in Principal Component Analysis of Genetic Data

Tuesday, August 19, 2014: 11:30 AM
Bayshore Grand Ballroom A (The Westin Bayshore)
William O Burgos-Paz, Universitat Autònoma de Barcelona, Bellaterra, Spain
Sebastián E Ramos-Onsins, Centre for Research in Agricultural Genomics, Bellaterra, Spain
Miguel Perez-Enciso, Universitat Autònoma de Barcelona, Bellaterra, Spain
Luca Ferretti, UMR 7138, UPMC and CIRB, College de France, Paris, France
Abstract Text: Principal component analysis (PCA) is one of the most widely used tools for exploring variability in high-dimensional data, and it is routinely applied in population and quantitative genetics. Its popularity has recently increased with the large number of molecular markers now available in datasets worldwide. In genetics, external constraints often lead to uneven sampling of populations, which limits the usefulness of PCA because of its well-known sensitivity to sample size and the bias this introduces in two-dimensional projections. Here we evaluated weighted PCA (wPCA) as a way to correct uneven-sampling bias in genetic data. Simulations suggest that wPCA improves the two-dimensional projections and, in some cases, recovers patterns of population relationships even when sample size is as low as n=1. We applied this correction to pig data from unevenly sampled populations and recovered a more realistic structure than that inferred with standard PCA alone.
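
As an illustration of the general idea only, the following Python sketch shows one common way to weight a PCA so that every population contributes equally regardless of how many individuals were sampled. The abstract does not state the weighting scheme used by the authors; the inverse-sample-size weights, the function name `weighted_pca`, and the simulated genotypes below are all assumptions made for this example.

```python
import numpy as np

def weighted_pca(genotypes, pop_labels, n_components=2):
    """Sketch of a weighted PCA (assumed scheme, not the authors' method).

    Each individual gets weight 1 / (K * n_pop), where K is the number of
    populations and n_pop is the sample size of its population, so every
    population carries the same total weight.
    """
    X = np.asarray(genotypes, dtype=float)
    labels = np.asarray(pop_labels)
    pops, inverse, counts = np.unique(
        labels, return_inverse=True, return_counts=True
    )
    w = 1.0 / (len(pops) * counts[inverse])  # weights sum to 1

    # Centre each marker on its weighted mean.
    Xc = X - w @ X

    # Principal axes of the weighted covariance, obtained from the SVD
    # of the centred data scaled by sqrt(weight) per individual.
    _, _, vt = np.linalg.svd(Xc * np.sqrt(w)[:, None], full_matrices=False)
    axes = vt[:n_components]

    # Project the unscaled centred individuals onto the axes.
    scores = Xc @ axes.T
    return scores, axes, w

# Hypothetical data: three populations sampled very unevenly (50, 5, 1).
rng = np.random.default_rng(0)
sizes = {"A": 50, "B": 5, "C": 1}
n_markers = 200
freqs = {p: rng.uniform(0.1, 0.9, size=n_markers) for p in sizes}
genotypes = np.vstack(
    [rng.binomial(2, freqs[p], size=(n, n_markers)) for p, n in sizes.items()]
)
labels = np.repeat(list(sizes), list(sizes.values()))

scores, axes, weights = weighted_pca(genotypes, labels)
```

With these weights the single individual from population C has the same influence on the principal axes as all 50 individuals from population A combined, which is the kind of rebalancing the abstract describes for populations sampled at n=1.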

Keywords:

SNP

Population structure

Phylogeography