Performance of Genomic Prediction using Haplotypes in New Zealand Dairy Cattle
Genomic prediction has traditionally been performed using models that fit covariates for SNP genotypes; however, fitting covariates for haplotypes instead may improve accuracy, bias, or runtime. Approximately 58,000 Holstein Friesian, Jersey and Kiwi Cross dairy cattle from New Zealand were genotyped on Illumina BovineSNP50 or HD panels. Genotypes at 37,740 SNPs were phased using LinkPHASE and DAGPHASE. Haplotype blocks were assigned based on length: 125kb, 250kb, 500kb, 1Mb or 2Mb, corresponding to, on average, 2, 4, 8, 15 or 30 SNPs per block. Genotyped females with milk fat records were separated into training (n = 23,907) and validation (n = 14,478) sets based on birthdates before or after 1 June 2008. BayesA was run in GenSel fitting either SNP genotype or haplotype allele dosage; low-frequency haplotype alleles were excluded by filtering on their frequency in the training population at thresholds of 1%, 2.5%, 5% or 10%. The SNP model had an accuracy of 0.304, while the most accurate haplotype model, 500kb (1% filter), had an accuracy of 0.308; for both models, the regression coefficient of yield on prediction deviated from unity by 0.052 (the measure of bias used here). The 250kb haplotype model with 1% filter had an accuracy of 0.307 and bias of 0.045. These two haplotype models fit approximately twice as many features as the SNP model and took nearly twice as long to run (24 vs. 13 hours). Using 125kb haplotypes (10% filter) afforded similar performance to the SNP model in terms of number of features, accuracy, bias and computation time. The 1-2Mb haplotypes were too long for this population, with decreased accuracy (0.208-0.300) and increased bias (0.066-0.175) compared to the SNP model. BayesB and BayesN were run using the 250kb (1% filter) and 125kb (10% filter) haplotypes with pi values of 0.80 or 0.95 (and, for BayesN, fitting all haplotypes within a fitted window), but provided no improvement in accuracy or bias compared to BayesA.
The runtime for the BayesB haplotype model (8-10 hours) was approximately half that of the BayesA haplotype model (15-24 hours), bringing it in line with the runtime for the BayesA SNP model (13 hours), while the runtime for the BayesN haplotype model (28-39 hours) was almost double that for the BayesA haplotype model. Fitting fixed-length haplotypes did not provide substantial improvement over fitting SNP genotypes for genomic prediction of milk fat yield in New Zealand dairy cattle.
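To make the covariate construction and the two reported metrics concrete, the following is a minimal Python sketch, not the pipeline used in the study. It assumes phased haplotypes are available as strings of 0/1 alleles per animal, that blocks are formed by integer division of SNP position by the block length, and that accuracy is the Pearson correlation between observed yield and prediction (the abstract does not define accuracy, so this is an assumption). All function names and the data layout are hypothetical, and GenSel input formats are not represented.

```python
from collections import Counter
from statistics import mean


def assign_blocks(positions_bp, block_len_bp):
    """Assign each SNP to a fixed-length block (assumed: position // block length)."""
    return [pos // block_len_bp for pos in positions_bp]


def block_alleles(hap, blocks):
    """Split one phased chromosome copy into {block: allele string}."""
    out = {}
    for allele, block in zip(hap, blocks):
        out[block] = out.get(block, "") + allele
    return out


def haplotype_dosages(phased, blocks, training_ids, min_freq):
    """Build haplotype allele dosage covariates (0, 1 or 2 copies per animal).

    phased: {animal: (hap1, hap2)}, each hap a string of '0'/'1' alleles.
    Haplotype alleles whose frequency in the training animals is below
    min_freq are dropped, mirroring the frequency filter described above.
    """
    per_animal = {
        animal: [block_alleles(hap, blocks) for hap in haps]
        for animal, haps in phased.items()
    }
    # Count each (block, allele string) over both copies of training animals.
    counts = Counter()
    for animal in training_ids:
        for copy in per_animal[animal]:
            counts.update(copy.items())
    n_copies = 2 * len(training_ids)
    features = sorted(k for k, c in counts.items() if c / n_copies >= min_freq)
    dosages = {
        animal: [sum(copy.get(b) == s for copy in copies) for b, s in features]
        for animal, copies in per_animal.items()
    }
    return features, dosages


def accuracy_and_bias(yield_obs, predicted):
    """Return (Pearson correlation, |1 - slope|) for yield regressed on prediction."""
    my, mp = mean(yield_obs), mean(predicted)
    sxy = sum((p - mp) * (y - my) for p, y in zip(predicted, yield_obs))
    sxx = sum((p - mp) ** 2 for p in predicted)
    syy = sum((y - my) ** 2 for y in yield_obs)
    corr = sxy / (sxx * syy) ** 0.5
    slope = sxy / sxx  # regression coefficient of yield on prediction
    return corr, abs(1.0 - slope)
```

In this sketch the feature list is fixed from the training animals only, so validation animals carrying a filtered or unseen haplotype allele simply receive a dosage of zero for every retained allele in that block.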
Keywords:
Dairy, Genomic Prediction, Haplotypes