179
Fast Imputation Using Medium- or Low-Coverage Sequence Data

Tuesday, August 19, 2014: 11:30 AM
Bayshore Grand Ballroom D (The Westin Bayshore)
Paul M VanRaden, Animal Improvement Programs Laboratory, USDA-ARS, Beltsville, MD
Chuanyu Sun, National Association of Animal Breeders, Columbia, MO
Abstract Text:

Direct imputation from raw sequence reads can be more accurate than calling genotypes first and then imputing. An efficient strategy chooses the 2 haplotypes most likely to form the genotype and, as each animal’s sequence is processed, updates the posterior allele probabilities within those haplotypes from their prior probabilities. Imputing 1 million loci on 1 chromosome required 20 min and 5 gigabytes of memory using 10 processors for 500 bulls simulated at 8X coverage plus 250 younger bulls that had lower coverage or were genotyped with chips of different densities. Percentages of correct genotypes were 99.2, 97.0, and 94.1 for bulls sequenced at 8X, 4X, and 2X coverage and were 98.1, 96.8, and 91.7 for bulls genotyped with 600K, 60K, and 10K density chips. Imputation from sequence with low coverage or a high error rate was less accurate when genotypes from a high-density chip were not included.
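
The idea of working from reads rather than called genotypes can be illustrated with a minimal Python sketch for a single biallelic locus. It is not the authors' program: the function names, the binomial read model, and the default error rate of 0.01 are assumptions made for illustration only. The sketch combines a prior, built from the probabilities that each of the 2 chosen haplotypes carries the alternate allele, with the likelihood of the observed read counts to give posterior genotype probabilities.

```python
def genotype_likelihoods(n_ref, n_alt, error_rate=0.01):
    """Likelihood of the read counts for genotypes 0, 1, 2 (alternate-allele count).

    Assumed model: reads are independent, and each read reports the wrong
    allele with probability error_rate. The binomial coefficient is omitted
    because it is the same for all three genotypes.
    """
    liks = []
    for g in (0, 1, 2):
        # Probability that a single read shows the alternate allele.
        p_alt = (g / 2) * (1 - error_rate) + (1 - g / 2) * error_rate
        liks.append(p_alt ** n_alt * (1 - p_alt) ** n_ref)
    return liks


def genotype_prior(p1, p2):
    """Prior genotype probabilities from two haplotypes.

    p1 and p2 are the prior probabilities that haplotype 1 and haplotype 2
    carry the alternate allele at this locus (hypothetical inputs).
    """
    return [
        (1 - p1) * (1 - p2),            # genotype 0: neither haplotype is alt
        p1 * (1 - p2) + (1 - p1) * p2,  # genotype 1: exactly one is alt
        p1 * p2,                        # genotype 2: both are alt
    ]


def genotype_posterior(n_ref, n_alt, p1, p2, error_rate=0.01):
    """Posterior genotype probabilities: prior times likelihood, normalized."""
    prior = genotype_prior(p1, p2)
    lik = genotype_likelihoods(n_ref, n_alt, error_rate)
    unnormalized = [pr * li for pr, li in zip(prior, lik)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


if __name__ == "__main__":
    # 8 reads, all alternate: the reads dominate an uninformative prior.
    print(genotype_posterior(n_ref=0, n_alt=8, p1=0.5, p2=0.5))
    # 2 reads, 1 of each allele: little evidence, so the prior matters more.
    print(genotype_posterior(n_ref=1, n_alt=1, p1=0.9, p2=0.1))
```

With no reads at a locus the posterior equals the prior, which is one way to see why low-coverage animals depend more heavily on haplotype information than animals sequenced at 8X.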

Keywords: imputation, sequence, read depth