25
Comparison of a QTL versus marker effects model for genomic prediction with training across families, generations, or breeds
Accurate prediction of the genetic merit of selection candidates from dense marker genotypes requires a large training dataset. In practice, the number of training individuals from the same contemporary group as the selection candidates is usually limited, so genotypic and phenotypic records from other contemporary groups, such as other families, generations, or breeds, are pooled to increase the training size. However, association signals may not be consistent across contemporary groups because of different linkage disequilibrium (LD) patterns in founders and different co-segregation patterns in non-founders. Marker effects models such as GBLUP and BayesC do not account for these heterogeneous association signals, and therefore accuracy of prediction may not improve with increasing training size. In this study, a QTL effects model is developed to model the heterogeneous association patterns across contemporary groups, where a putative QTL is assumed in every centimorgan chromosomal segment. LD is modeled through the conditional gene frequency at the putative QTL given the surrounding marker haplotypes in founders, which is assumed to be the same for all contemporary groups from a breed. Co-segregation is modeled by tracing the inheritance of the putative QTL alleles along with the surrounding marker haplotypes in non-founders. The marker and QTL models were compared using simulated training populations consisting of multiple contemporary groups that spanned families, generations, or breeds, with selection candidates from a single contemporary group that was not included in training. The simulated genome had one 1-Morgan chromosome with 20 QTL and 20,000 markers that were segregating in the founders. Three scenarios were simulated, in which QTL and markers were in strong LD, weak LD, or linkage equilibrium (LE) in the founders, while markers were always in LD with each other.
In the strong LD scenario, BayesC and the QTL model had similar accuracy, which was up to 11.0% higher than that of GBLUP (standard error, s.e., < 0.8% across 16 replicates) as the number of contemporary groups used for training increased. In the weak LD scenario, the QTL model had accuracy up to 12.9% (s.e. < 0.8%) higher than BayesC and up to 17.7% (s.e. < 0.9%) higher than GBLUP. In the scenario with LE between QTL and markers, the accuracy of BayesC and GBLUP diminished with an increasing number of families or breeds, whereas the accuracy of the QTL model persisted and was up to 39.7% higher than BayesC and up to 40.5% higher than GBLUP (s.e. < 1.7% for both). In conclusion, the QTL model had higher accuracy of prediction when the training population consisted of multiple contemporary groups.
Keywords: Genomic prediction, QTL effects model, combined training dataset
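
For readers unfamiliar with the marker effects baseline, the following is a minimal sketch of a ridge-regression marker effects model (often called SNP-BLUP, which gives the same predictions as GBLUP under the usual assumptions) and of prediction accuracy measured as the correlation between predicted and true breeding values. It is not the simulation or the QTL model described in the abstract: the population sizes, the independent sampling of genotypes, the placement of QTL among the markers, the shrinkage value `lam`, and the function names `snp_blup` and `predict` are all illustrative assumptions.

```python
import numpy as np

def snp_blup(M_train, y_train, lam):
    """Ridge-regression marker effects; `lam` is an assumed shrinkage value."""
    mu = y_train.mean()
    p = M_train.mean(axis=0)          # per-marker mean genotype, used for centering
    Z = M_train - p
    n = Z.shape[0]
    # Dual form: alpha = Z' (Z Z' + lam I)^{-1} (y - mu); cheap when n << markers.
    K = Z @ Z.T + lam * np.eye(n)
    alpha = Z.T @ np.linalg.solve(K, y_train - mu)
    return mu, p, alpha

def predict(M, mu, p, alpha):
    """Predicted phenotype/merit for genotypes M using fitted marker effects."""
    return mu + (M - p) @ alpha

# Toy data (hypothetical sizes; genotypes sampled independently, so no real LD).
rng = np.random.default_rng(0)
n_train, n_cand, n_markers, n_qtl = 200, 50, 500, 20
freq = rng.uniform(0.1, 0.9, n_markers)
M = rng.binomial(2, freq, size=(n_train + n_cand, n_markers)).astype(float)

# Assumption: QTL are a random subset of the genotyped markers.
qtl = rng.choice(n_markers, n_qtl, replace=False)
effects = rng.normal(size=n_qtl)
tbv = M[:, qtl] @ effects                               # true breeding values
y = tbv + rng.normal(scale=tbv.std(), size=tbv.shape)   # heritability near 0.5

mu, p, alpha = snp_blup(M[:n_train], y[:n_train], lam=float(n_markers))
pred = predict(M[n_train:], mu, p, alpha)

# Accuracy: correlation between predictions and true breeding values of candidates.
accuracy = np.corrcoef(pred, tbv[n_train:])[0, 1]
print(f"prediction accuracy: {accuracy:.3f}")
```

Because a single set of marker effects `alpha` is fitted to all training records, this kind of model has no way to represent association signals that differ between contemporary groups, which is the limitation the abstract attributes to marker effects models.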