31
Statistical methods for eQTL mapping using RNA-seq data

Wednesday, March 18, 2015: 9:15 AM
312-313 (Community Choice Credit Union Convention Center)
Deborah Velez-Irizarry, Department of Animal Science, Michigan State University, East Lansing, MI
Catherine W. Ernst, Department of Animal Science, Michigan State University, East Lansing, MI
Ronald O. Bates, Department of Animal Science, Michigan State University, East Lansing, MI
Pablo D Reeb, Department of Animal Science, Michigan State University, East Lansing, MI
Yeni Liliana Bernal Rubio, Department of Animal Science, Michigan State University, East Lansing, MI
Nancy E. Raney, Department of Animal Science, Michigan State University, East Lansing, MI
Juan P. Steibel, Department of Animal Science, Michigan State University, East Lansing, MI
Abstract Text:

Mapping expression quantitative trait loci (eQTL) provides insight into the regulation of gene expression. When RNA-Seq data are used to map eQTL, proper statistical analysis requires addressing issues such as: (1) shrinking variance component estimates to increase the power of eQTL detection in small samples, and (2) accounting for population structure to avoid spurious associations. The goal of this research is to propose statistical models for eQTL analysis with application to crosses of outbred livestock populations. We used longissimus dorsi muscle RNA-Seq and SNP genotype data for 24 female pigs from the F2 generation of the MSU Duroc x Pietrain population. We compared two analysis models. The first is GBLUP-based GWA, which fits all markers simultaneously, one transcript at a time. Its advantages are that transcripts can be pre-screened by their heritability and that population substructure is accounted for through the genomic relationship matrix. Its disadvantage is that fitting one transcript at a time does not borrow information across genes. The alternative is a differential expression model implemented in the package LIMMA, which fits markers one at a time for all transcripts simultaneously. This shrinks variance components and borrows information across genes, at the price of not being able to model random effects and therefore not easily accounting for population stratification. To overcome this limitation, we propose fitting k=4 principal components (PC) of the relationship matrix as fixed effects, which account for 34% of the variation in the relationship matrix. With GBLUP-based GWA, h2 was not significantly different from 0 for any transcript after correcting for multiple testing; thus, no eQTL were detected. We attribute this to a flat likelihood surface due to the small sample size.
At the other extreme, the LIMMA model detected 33,000 eQTL (q-value < 0.1) when PC were excluded and 4,000 eQTL (q-value < 0.1) when PC were fit as fixed effects. Similarly, the number of putative hotspots detected (markers associated with more than 1,000 transcripts) decreased from 14 to 4 when PC were fit. We hypothesize that the difference is due to an excess of false positives when population structure is ignored. To test this hypothesis, we permuted phenotypes with respect to genotypes and confirmed that including PC reduced the number of false positive eQTL. We conclude that LIMMA had more power for eQTL detection than GBLUP, but that population structure needs to be accounted for using PC.
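
The idea of fitting principal components of the relationship matrix as fixed effects can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the genotypes and phenotype are simulated, the relationship matrix uses a VanRaden-style formula (an assumption, since the abstract does not say how the matrix was built), the function names are hypothetical, and the marker test is an ordinary least-squares t statistic without the variance shrinkage that LIMMA applies.

```python
import numpy as np

def genomic_relationship(M):
    """VanRaden-style G from an (animals x markers) matrix of 0/1/2 allele counts."""
    p = M.mean(axis=0) / 2.0                 # observed allele frequencies
    Z = M - 2.0 * p                          # center each marker column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def top_pcs(G, k):
    """Leading k eigenvectors of G and the share of its variation they explain."""
    vals, vecs = np.linalg.eigh(G)           # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]
    vals = np.clip(vals[order], 0.0, None)   # guard against tiny negative values
    vecs = vecs[:, order]
    return vecs[:, :k], vals[:k].sum() / vals.sum()

def marker_t_stats(y, M, pcs):
    """OLS t statistic per marker, with an intercept and the PCs as fixed covariates.
    Assumes every marker is polymorphic in the sample."""
    n = len(y)
    X0 = np.column_stack([np.ones(n), pcs])
    t = np.empty(M.shape[1])
    for j in range(M.shape[1]):
        X = np.column_stack([X0, M[:, j]])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])
        cov = sigma2 * np.linalg.pinv(X.T @ X)
        t[j] = beta[-1] / np.sqrt(cov[-1, -1])
    return t

# Simulated stand-in data: 24 animals, 200 markers, one transcript.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.3, 0.7, size=200)
M = rng.binomial(2, freqs, size=(24, 200)).astype(float)
y = rng.normal(size=24)

G = genomic_relationship(M)
pcs, share = top_pcs(G, k=4)
t = marker_t_stats(y, M, pcs)

# Permutation check: shuffle the phenotype relative to genotypes (and their PCs).
t_perm = marker_t_stats(rng.permutation(y), M, pcs)
```

Here `share` plays the role of the proportion of relationship-matrix variation captured by the leading components, and comparing `t` with `t_perm` across many shuffles gives an empirical view of how many associations arise by chance.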

Keywords: eQTL, RNA-Seq, pigs