417
A review of sequencing and assembly methods that enhance computational use

Saturday, July 23, 2016: 4:20 PM
Grand Ballroom A (Salt Palace Convention Center)
Wesley C. Warren, McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO
Abstract Text:

High-quality genome references have proven necessary for research at many levels of biological investigation, including disease etiology, small-molecule drug screening and interactions, canonical disease pathway manifestation, and many others. To date, very few genomes can be classified as near finished, that is, missing only small regions that are recalcitrant to known molecular biology methods. Ultimately, our goal is to produce contiguous chromosomes de novo at the lowest cost. So far, most published de novo genome assemblies are derived from deep-coverage, Illumina-only sequencing, most often using two popular but independent assembly algorithms, yet all are documented to be inadequate for numerous types of genetic investigation. During this surge of short-read genome assembly, new long-read sequencing technology arrived, albeit at considerable cost, ~6-fold higher than pure Illumina de novo assembly approaches. However, long reads, now averaging ~14 kb in length, have transformed our ability to capture most chromosomes, which compels us to fund these approaches to obtain higher quality. Our lab and others now routinely assemble human genomes with N50 contig lengths of 10 Mb and individual contigs of up to 53 Mb, where a contig is defined as uninterrupted consensus sequence. In our studies, we have seen how an incomplete genome sequence, with missing microchromosome sequence assignments and partial or completely missing gene models in the chicken, hindered studies designed to detect signatures of selection in the poultry industry. In the chicken, despite the use of older long-read sequencing technology (average read length of 8 kb), we observed an increase of ~180 Mb in assembled size, added 1,920 new gene models, and reduced gaps 7-fold among ordered chromosomes.
Given the intense interest in better genome reference models, I will review the generally compartmentalized phases for producing high-quality genome references and provide examples of analysis outcomes.
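
The abstract quotes N50 contig lengths without defining the statistic. Under the common definition (the length L such that contigs of length L or longer together contain at least half of all assembled bases), it could be computed roughly as in the minimal Python sketch below. The function name `n50` and the toy contig lengths are illustrative assumptions, not code or data from the talk.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bases).

    Assumes the common definition: the length of the contig at which
    the running total, taken from longest to shortest, first reaches
    at least half of the total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in bases), not data from the abstract.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))  # prints 70: 80 + 70 = 150, which is >= 290 / 2
```

Because the statistic depends only on the distribution of contig lengths, a higher N50 indicates a more contiguous assembly but says nothing by itself about its correctness.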

Keywords: assembly, genome reference, long reads