Selection of sequence variants to improve dairy cattle genomic predictions

Tooker, Melvin

Abstract Text:

Genomic prediction reliabilities improved when selected sequence variants from run 5 (July 2015) of the 1000 Bull Genomes Project were added. High-density (HD) imputed genotypes for 26,970 progeny-tested Holstein bulls were combined with candidate sequence variants located within or near genes for 444 sequenced Holstein animals. Variants with minor allele frequency (MAF) <0.01, incorrect map locations, excess heterozygotes, or low correlations between sequence and HD genotypes for the same variant were removed. Individual genotypes with Beagle probabilities <0.98 and genotypes with Mendelian conflicts between parents and progeny were set to missing. Test 1 included 481,904 candidate sequence SNP: 107,471 exonic SNP, 9,422 splice SNP, 35,242 SNP in the untranslated regions at the beginning and end of genes, and 329,769 SNP upstream or downstream of genes. Test 2 also included 249,966 insertions and deletions (indels). After the sequence variants were merged with 312,614 HD SNP and edited, Test 1 included 762,588 variants and Test 2 included 1,003,453. Imputation quality was assessed by randomly choosing 40 of the 444 sequenced animals as a test set and keeping the other 404 in the reference population. The test animals' sequence genotypes were reduced to the subset in common with the HD genotypes and then imputed back to sequence. The percentage of correctly imputed variants averaged 97.3% across all chromosomes in Test 1 and 97.2% in Test 2. Total time required to prepare, edit, and impute the sequence variants for 27,235 animals was about 5 d using <20 processors. Computing genomic predictions from August 2011 deregressed evaluations of 19,575 bulls for 33 traits required about 3 d with 33 processors. Predictions were validated using 2015 data for 3,983 U.S. bulls whose daughters were first phenotyped after August 2011.
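The per-genotype and per-variant editing rules above can be sketched for a single variant as follows. This is a minimal illustration, not the authors' pipeline. The allele-count coding, the function names (`call_genotype`, `minor_allele_frequency`, `edit_variant`), the restriction of the Mendelian check to opposite homozygotes between one parent and one progeny, and applying the MAF filter after Mendelian editing are all assumptions. The map-location, excess-heterozygote, and sequence-versus-HD correlation filters are not shown.

```python
"""Hypothetical sketch of per-variant genotype editing (not the authors' code)."""
from typing import Dict, List, Optional, Sequence

# Genotypes are allele counts (0, 1, 2); None marks a missing genotype.
MIN_MAF = 0.01   # variants with MAF below this are removed
MIN_PROB = 0.98  # genotype calls less certain than this are set to missing


def call_genotype(probs: Sequence[float]) -> Optional[int]:
    """Most probable allele count from (P(0), P(1), P(2)), or None if below MIN_PROB."""
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= MIN_PROB else None


def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> float:
    """MAF computed from called (non-missing) genotypes only."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1.0 - p)


def edit_variant(
    genotypes: Sequence[Optional[int]],
    parents: Dict[int, List[int]],
) -> Optional[List[Optional[int]]]:
    """Apply Mendelian and MAF edits to one variant.

    genotypes: allele counts indexed by animal.
    parents: maps a progeny index to the indices of its genotyped parents.
    Returns the edited genotypes, or None if the variant is removed.
    """
    # Flag conflicts on the unedited calls so the order of checks does not matter.
    conflicted = set()
    for child, pars in parents.items():
        for par in pars:
            g_par, g_child = genotypes[par], genotypes[child]
            # Opposite homozygotes (0 vs 2) cannot be parent and progeny.
            if g_par is not None and g_child is not None and abs(g_par - g_child) == 2:
                conflicted.update((par, child))
    edited = [None if i in conflicted else g for i, g in enumerate(genotypes)]
    # Drop the whole variant if it is nearly monomorphic after editing.
    if minor_allele_frequency(edited) < MIN_MAF:
        return None
    return edited
```

For example, `edit_variant([0, 2, 1, 1], {1: [0]})` sets the conflicting parent and progeny to missing and keeps the variant, while a variant that is monomorphic across all animals returns `None`.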
Many sequence variants had larger estimated effects than nearby HD markers, but prediction reliability improved by only 0.6 percentage points in Test 1 when sequence SNP were added to HD SNP, and by only 0.4 percentage points over HD SNP alone in Test 2 when both sequence SNP and indels were added. However, selecting the 17,000 candidate SNP with the largest estimated effects and adding them to the 60,671 SNP used in routine evaluations improved reliability by 2.7 percentage points on average across traits (67.4% vs. 64.7%), compared with a parent average reliability of 35.2%. Prediction accuracy can therefore be improved by adding selected sequence SNP to marker sets.
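The selection step described above, which ranks candidate sequence SNP by estimated effect and appends the top ones to the routine marker set, might look like the sketch below. It assumes a single vector of estimated effects (for example, from one trait) and ranks by absolute effect. The abstract does not state how effects were combined across the 33 traits. The function names and variant IDs are hypothetical.

```python
"""Hypothetical sketch of adding top-effect sequence SNP to a routine marker set."""
import heapq
from typing import Dict, Iterable, List


def select_top_variants(effects: Dict[str, float], k: int, exclude: Iterable[str]) -> List[str]:
    """Return the k variant IDs with the largest absolute estimated effect, skipping `exclude`."""
    skip = set(exclude)
    pool = (vid for vid in effects if vid not in skip)
    # heapq.nlargest avoids fully sorting a large candidate list.
    return heapq.nlargest(k, pool, key=lambda vid: abs(effects[vid]))


def build_marker_set(routine: List[str], effects: Dict[str, float], k: int) -> List[str]:
    """Routine SNP followed by the k selected candidates, with no duplicates."""
    return list(routine) + select_top_variants(effects, k, routine)


def reliability_gain(new: float, old: float) -> float:
    """Difference between two reliabilities (in %), in percentage points."""
    return round(new - old, 1)
```

In the setting reported here, `routine` would hold the 60,671 routine SNP IDs and `k` would be 17,000. `reliability_gain(67.4, 64.7)` returns 2.7, matching the reported average improvement.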

Keywords: causative variant, sequence data, genomic evaluation

298