14
Adventures in next generation sequencing of transcriptomes and genomes

Tuesday, March 17, 2015: 9:00 AM
304-305 (Community Choice Credit Union Convention Center)
Jeremy F Taylor, University of Missouri, Columbia, MO
Polyana C Tizioto, University of Missouri, Columbia, MO
Natalia V Grupioni, University of Missouri, Columbia, MO
JaeWoo Kim, University of Missouri, Columbia, MO
Jared E Decker, University of Missouri, Columbia, MO
Robert D Schnabel, University of Missouri, Columbia, MO
Abstract Text: Next generation sequencing (NGS) has become a powerful tool for identifying many classes of variation within the genome and for characterizing gene expression, regulatory RNAs and epigenetic marks such as methylation. The cost of sequencing has decreased to the point that whole genomes can be sequenced to coverage depths of up to 30X for less than $5,000. As a consequence, at least 3,000 bovids have now been sequenced and many more samples have been analyzed for global transcriptome profiling. We have sequenced or traded 25.5 TB of whole genome sequence on 476 animals from dog (125), water buffalo (64), bison (3) and 17 cattle breeds (284). These data are being used to identify Mendelian loci responsible for disease or reproductive failure (early embryonic lethals) and to elucidate the causal variants underlying large-effect QTL. Illumina BovineSNP50 or BovineHD genotypes can be imputed to full sequence variation with remarkable accuracy, and breeds such as American Angus now have more than 60,000 animals genotyped with the BovineSNP50 assay. Imputing these genotypes to full sequence enables the identification of moderate-frequency variants whose homozygotes occur at frequencies much lower than expected under Hardy-Weinberg equilibrium, suggesting that they are likely to be lethal or severely deleterious. GWAS performed on sequence-level data will likely result in the rapid identification of the variants that underlie large-effect QTL, particularly when these variants segregate in multiple breeds and differences in the patterns of linkage disequilibrium across breeds can be used to differentiate between causal and associated variants.
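As a rough illustration of the homozygote-deficit screen described above, the following Python sketch compares the observed number of homozygotes for a variant with the number expected under Hardy-Weinberg equilibrium. It is not the authors' pipeline: the function names, the example genotype counts and the use of a simple binomial tail probability (treating the estimated allele frequency as fixed) are all assumptions made for illustration.

```python
import math


def expected_homozygotes(n_ref_hom, n_het, n_alt_hom):
    """Return (expected alt homozygotes under HWE, alt allele frequency q)."""
    n = n_ref_hom + n_het + n_alt_hom
    q = (n_het + 2 * n_alt_hom) / (2 * n)  # frequency of the alternate allele
    return n * q * q, q


def homozygote_deficit_pvalue(n_ref_hom, n_het, n_alt_hom):
    """Approximate P(X <= observed alt homozygotes), X ~ Binomial(n, q^2).

    Assumption: q is treated as known, so this is a screening statistic
    rather than an exact Hardy-Weinberg test.
    """
    n = n_ref_hom + n_het + n_alt_hom
    _, q = expected_homozygotes(n_ref_hom, n_het, n_alt_hom)
    if q == 0.0 or q == 1.0:
        return 1.0  # monomorphic site: no deficit can be assessed
    p_hom = q * q
    tail = 0.0
    for k in range(n_alt_hom + 1):
        # Work in log space so large sample sizes do not overflow.
        log_term = (
            math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p_hom) + (n - k) * math.log1p(-p_hom)
        )
        tail += math.exp(log_term)
    return min(tail, 1.0)


if __name__ == "__main__":
    # Hypothetical counts: 60,000 genotyped animals, 6,000 carriers,
    # and no homozygotes for the alternate allele.
    expected, q = expected_homozygotes(54000, 6000, 0)
    p_value = homozygote_deficit_pvalue(54000, 6000, 0)
    print(f"q = {q:.3f}, expected homozygotes = {expected:.1f}, p = {p_value:.3g}")
```

A variant at moderate frequency with far fewer homozygotes than expected would be flagged as a candidate lethal or severely deleterious allele for follow-up. In practice, imputation error and population structure would also need to be considered.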
We have also generated 2.3 TB of RNA-Seq data on 153 animals to identify genes that are involved in the immune response to the pathogens responsible for bovine respiratory disease and genes that are differentially expressed among animals that differ in feed efficiency. As a consequence, much of the work within the lab has evolved from wet-lab to computational activities, and the ability to program in Perl or Python is requisite for a student's survival in a modern livestock genomics environment. The utility of NGS data continues to be limited by the inadequacy of the reference assemblies and their annotation, particularly for the locations of regulatory elements including enhancers and repressors; the lack of reference assemblies for indicine or for other important taurine breeds; and the lack of a high-quality transcriptome including isoforms. These are high priorities for all livestock species and USDA NIFA funding must be allocated to quickly resolve these deficiencies.

Keywords: Genome sequencing, Mendelian loci, Quantitative trait loci