206
Using Random Forests (RF) To Prescreen Candidate Genes: A New Prospective for GWAS

Tuesday, August 19, 2014: 10:30 AM
Bayshore Grand Ballroom A (The Westin Bayshore)
Yutao Li, CSIRO Animal, Food and Health Sciences, Brisbane, Australia
James Kijas, CSIRO Animal, Food and Health Sciences, Brisbane, Australia
John M Henshall, Food Futures Flagship, CSIRO Animal, Food and Health Sciences, Armidale, Australia
Sigrid A Lehnert, CSIRO Food Futures Flagship, Brisbane, Australia
Russell McCulloch, Food Futures Flagship, CSIRO Animal, Food and Health Sciences, Brisbane, Australia
Anthony Reverter-Gomez, Food Futures Flagship, CSIRO Animal, Food and Health Sciences, Brisbane, Australia
Abstract Text: High-throughput genomic data present an enormous challenge to researchers because of the “large P, small N” problem, in which the number of markers (P) far exceeds the number of samples (N). Recently, a machine learning method, Random Forests (RF), has gained popularity for addressing this problem. In this study, we examined the utility of RF in two livestock genome-wide association study (GWAS) datasets: a Spanish sheep pigmentation dataset and a tropical cattle pregnancy status dataset. Comparing the top 10 ranked SNPs identified by RF with those from single-marker GWAS methods showed that: 1) RF confirmed that the most strongly associated SNP (s26449) is the one closest to the sheep pigmentation gene MC1R; 2) five of the top 10 SNPs identified by RF were close to genes previously reported to be linked with reproductive performance in humans or other species. The results indicate that RF can potentially be used in GWAS as an initial screening tool for candidate genes.
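The prescreening idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and it does not use their datasets: the genotypes and phenotypes below are simulated, and every name (`simulate`, `rf_importance`, `CAUSAL`, the tree depth, the number of trees) is a hypothetical choice made for illustration. The forest is written with the Python standard library only so that the example is self-contained; it uses bootstrap samples, a random subset of SNPs at each split, and the mean decrease in Gini impurity as the importance score. In practice an established RF library would normally be used instead.

```python
import random

CAUSAL = 7  # hypothetical index of the one simulated causal SNP


def simulate(n=200, p=50, seed=1):
    """Simulate n animals x p SNPs (coded 0/1/2) and a binary phenotype."""
    rng = random.Random(seed)
    X = [[rng.choice((0, 1, 2)) for _ in range(p)] for _ in range(n)]
    # Phenotype follows the causal SNP; about 5% of labels are random noise.
    y = [int(row[CAUSAL] >= 1) if rng.random() > 0.05 else rng.randint(0, 1)
         for row in X]
    return X, y


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 2.0 * p1 * (1.0 - p1)


def best_split(X, y, rows, features):
    """Return (impurity decrease, SNP index, threshold) of the best split."""
    parent = gini([y[i] for i in rows])
    best = (0.0, None, None)
    for f in features:
        for t in (0, 1):  # genotype <= 0 or genotype <= 1
            left = [y[i] for i in rows if X[i][f] <= t]
            right = [y[i] for i in rows if X[i][f] > t]
            if not left or not right:
                continue
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if parent - child > best[0]:
                best = (parent - child, f, t)
    return best


def grow(X, y, rows, depth, mtry, n_total, importance, rng):
    """Grow one tree recursively, adding each split's weighted Gini decrease."""
    if depth == 0 or len(rows) < 10:
        return
    features = rng.sample(range(len(X[0])), mtry)  # random SNP subset per split
    gain, f, t = best_split(X, y, rows, features)
    if f is None:
        return
    importance[f] += gain * len(rows) / n_total
    grow(X, y, [i for i in rows if X[i][f] <= t], depth - 1, mtry, n_total, importance, rng)
    grow(X, y, [i for i in rows if X[i][f] > t], depth - 1, mtry, n_total, importance, rng)


def rf_importance(X, y, n_trees=100, max_depth=3, seed=0):
    """Mean decrease in Gini impurity per SNP, averaged over all trees."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = max(1, int(p ** 0.5))
    importance = [0.0] * p
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        grow(X, y, rows, max_depth, mtry, n, importance, rng)
    return [v / n_trees for v in importance]


X, y = simulate()
imp = rf_importance(X, y)
# Keep the 10 highest-ranked SNPs as candidates for follow-up analysis.
top10 = sorted(range(len(imp)), key=lambda j: imp[j], reverse=True)[:10]
print("Top 10 SNP indices by RF importance:", top10)
```

Because all SNPs compete jointly within each tree, the ranking is obtained without a separate test per marker. The short list in `top10` would then be compared with single-marker results, as the abstract describes.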

Keywords:

Random Forests

GWAS