
302
Identifying and calling insertions, deletions, and single-base mutations efficiently from sequence data

Thursday, July 21, 2016: 9:30 AM
Grand Ballroom I (Salt Palace Convention Center)
Paul M. VanRaden, Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, Beltsville, MD
Derek M. Bickhart, Animal Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD
Jeffrey R. O'Connell, University of Maryland School of Medicine, Baltimore, MD
Abstract Text:

Whole-genome sequencing studies can identify causative mutations for subsequent use in genomic evaluations, but sequence alignment and variant identification are lengthy and sometimes inaccurate processes. Speed and accuracy of identifying small insertions and deletions (indels) can be improved by calling variants while aligning sequence reads. Previous algorithms separated the alignment and calling steps, whereas the program findmap stores previously known variants in memory, calls alleles for those variants, and identifies potential new variants during alignment. The algorithm uses a string-pattern hash to store the reference genome in a rapidly accessed table. If both ends of a paired-end read do not align fully, the length of a potential indel within the read is calculated from the difference in map location between the 2 partial matches. The algorithm then finds the indel location and checks whether the full read matches after accounting for the indel. Potential variants detected by findmap are checked and edited by the program findvar for consistency across reads. New variants from findvar were compared with those from the Genome Analysis Toolkit (GATK) UnifiedGenotyper and from SamTools after Burrows-Wheeler Aligner (BWA) alignment. Detection accuracy was examined using reads simulated for 10 animals at 10X coverage from cattle reference map UMD3.1, with variants derived from run 5 (July 2015) of the 1000 Bull Genomes Project, which included 38,062,190 SNP, 532,179 insertions, and 1,127,620 deletions. Half of the variants were simulated as heterozygous, one-fourth as homozygous alternate, and one-fourth as homozygous reference. For homozygous alternate variants, findvar found 99.8% of SNP, 79% of insertions, and 67% of deletions; GATK found 99.4, 90, and 89%; and SamTools found 99.8, 12, and 18%, respectively. For heterozygotes, findvar found 99.1, 75, and 62%; GATK found 99.0, 90, and 88%; and SamTools found 98.2, 9, and 11%, respectively.
False positives as percentages of true variants for SNP, insertions, and deletions were 14, 0.4, and 0.3% from findvar; 12, 8.4, and 2.9% from GATK; and 37, 1.3, and 0.4% from SamTools, respectively. Read depth was 85.9 from findmap/findvar, 96.1 from BWA/GATK, and 84.4 from BWA/SamTools. With 10 processors, clock times were 106 h for BWA, 25 h for GATK, 11 h for SamTools, 3 h for findmap, and 1 h for findvar. The new software is freely available, with algorithms 10 to 30 times faster than current strategies for calling known variants and identifying new ones. Accuracy is improved by accounting for DNA variants while aligning sequence data.
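
The indel step described in the abstract (hash lookup of the reference, indel length from the map-location difference of 2 partial matches, then a full-read check) can be illustrated with a small sketch. This is not the findmap source code: the function names `build_index`, `common_prefix_len`, and `call_indel`, the k-mer length `k`, and the use of the first and last k-mers of a single read as the 2 partial matches are all assumptions made for illustration, and the toy reference is invented.

```python
from collections import defaultdict


def build_index(ref, k):
    """Hash every k-mer of the reference to its start positions (hypothetical helper)."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def common_prefix_len(a, b):
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def call_indel(read, ref, index, k):
    """Return (kind, ref_pos, length) for a single indel in the read, else None.

    Assumption: the first and last k-mers of the read act as the 2 partial
    matches; the real program may choose its partial matches differently.
    """
    n = len(read)
    for left in index.get(read[:k], ()):
        for right in index.get(read[-k:], ()):
            # Without an indel, the last k-mer would map n - k bases after the first.
            shift = (right - left) - (n - k)
            if shift == 0:
                continue  # ungapped placement, no indel to call
            ins_len = max(-shift, 0)
            # Extend the left match to locate the breakpoint, leaving room
            # for the inserted bases and the right-hand k-mer.
            p = min(common_prefix_len(read, ref[left:]), n - k - ins_len)
            if p < k:
                continue
            if shift > 0:
                # Read lacks `shift` reference bases: candidate deletion.
                if read[p:] == ref[left + p + shift:left + n + shift]:
                    return ("deletion", left + p, shift)
            else:
                # Read carries `ins_len` extra bases: candidate insertion.
                if read[p + ins_len:] == ref[left + p:left + n - ins_len]:
                    return ("insertion", left + p, ins_len)
    return None


# Toy reference (invented for this example) and reads built from it.
ref = "GATTACAGGCTTCAAGTCCGATAGCTTGACCATGGTACGA"
index = build_index(ref, 6)
read_del = ref[4:16] + ref[19:30]         # 3 reference bases removed at position 16
read_ins = ref[4:16] + "AA" + ref[16:27]  # 2 extra bases added at position 16
print(call_indel(read_del, ref, index, 6))
print(call_indel(read_ins, ref, index, 6))
```

A positive shift between the observed and expected spacing of the two matches implies a deletion of that length, and a negative shift implies an insertion. The final string comparison plays the role of the check that the full read matches after accounting for the indel. Mismatches, sequencing errors, and known-variant lookup are deliberately left out of this sketch.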

Keywords: sequence alignment, variant calling, indel