409
Genomic prediction using imputed sequence data in dairy and dual purpose breeds
Technical progress has made it possible to re-sequence individuals within a reasonable time frame and at acceptable cost. However, because sequencing all individuals of a breeding population is still too expensive, usually only the key individuals that contribute most to the genetic variation of the population are chosen for sequencing. All other individuals, genotyped with common single nucleotide polymorphism (SNP) arrays, are then imputed up to all known SNPs and possibly biallelic short insertions or deletions (indels) at sequence level. Several simulation studies have shown that using sequence data for genomic prediction can have a positive effect on the accuracy and the stability of marker effect estimates, especially when using variable selection methods. We tested these hypotheses with two data sets: one with over 6000 Fleckvieh bulls genotyped with 50k or 777k SNP arrays, and one with over 2000 Brown Swiss dairy cattle genotyped with 30k, 50k or 777k SNP arrays. Both were imputed to sequence level, using reference sets of 150 and 123 sequenced individuals, respectively. With the Fleckvieh data set, prediction accuracies for the six traits studied were either no higher or only very slightly higher with imputed sequence data than with SNP array data. This was true for different genomic BLUP models as well as for GBCPP, a fast EM-based variable selection method similar to Bayes Cπ. Attempts to reduce noise by modelling only specific subsets of SNPs (e.g. very accurately imputed SNPs, SNPs from genic regions) generally improved prediction compared to modelling all imputed SNPs. Sequence-based predictions did not appear to be more stable, as prediction ability decreased similarly for both 50k and sequence data when sires and/or grandsires of candidates were removed from the calibration set.
For Brown Swiss, a slight increase in prediction accuracy was found for non-return rate after 56 days in heifers when modelling all imputed SNPs with GBCPP compared to modelling only SNPs from the 50k array. Using prior biological information by modelling only the 50k most significant SNPs obtained from a genome-wide association study did not improve prediction accuracy, but did outperform prediction based on the 50k array. Possible explanations for the limited success of genomic prediction with sequence data are inaccuracies in imputed genotypes, especially for variants with low minor allele frequencies, a lack of models that properly account for the underlying genetic architecture, and incompleteness of genome maps and structural annotation.
Keywords:
Genomic prediction, sequence data