This is a draft schedule. Presentation dates, times and locations may be subject to change.
188
Ability to Genotype Differing Variants with Arrays Vs. Whole Genome Sequencing
Tuesday, July 11, 2017: 11:00 AM
319 (Baltimore Convention Center)
Whole genome sequencing has identified millions of new variants, but many of the single nucleotide polymorphisms (SNPs), about 35% in our experience, may not produce high quality genotypes from microarrays. Properties of SNPs can help predict which will pass or fail when designing arrays such as the customized version of Illumina’s Bovine LD chip examined here. Genotypes for 26,970 reference bulls were imputed using 440 sequenced Holsteins from run 5 of the 1000 Bull Genomes Project, and the 4,821 SNPs with the largest effects for net merit were selected. When those were added to the Zoetis LD chip (version 5), the success rate was 96% for 3,220 SNPs from the Bovine HD chip, but only 64% for 1,601 new sequence SNPs not previously on any chip. To determine why SNPs failed, a pass/fail (1/0) indicator of sequence SNP conversion success was correlated with 1) Illumina design scores, 2) estimated heritabilities of the genotypes for 3,000 randomly selected bulls, and 3) the base distance that the SNP was inside a repetitive DNA segment as determined by RepeatMasker, using a minimum distance of 0 if outside a repeat and a maximum of 50 bases if inside. The correlations were 0.51 for design scores, 0.14 for estimated heritabilities, and -0.15 for repeat distance. All three were highly significant (P < 0.0001), but repeat distance was less significant (P = 0.04) after fitting design score and heritability in multiple regression. Three other factors (minor allele frequency, SNP position within genes, and the reference/alternate allele combination pattern) were not associated with conversion success. In a reverse test, 58,825 SNPs from the Bovine 50K version 1 chip were matched with 38 million sequence SNPs. Previously 15,772 of the 50K SNPs had been declared not usable, and 11,969 (76%) of those were also either not identified or removed by sequence edits.
However, 3,803 (9%) of the 43,053 currently used SNPs that produce high quality genotypes on the 50K chip were absent from the sequence data, and the absence was not associated with minor allele frequency or allele combination. If the goal is to select the best SNP subset for a chip, design scores could be pre-computed and examined before rather than after estimating SNP effects, allowing selection of other linked SNPs expected to perform better. Eventually targeted sequencing could provide genotypes for important SNPs that fail to convert, because many SNPs from sequence data are difficult to genotype using arrays.
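The two computations behind the conversion analysis, the capped repeat distance and the correlation of a 1/0 pass/fail indicator with a predictor, can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names, the inclusive `(start, end)` repeat intervals, and the choice to measure depth to the nearer repeat boundary are assumptions, and the toy numbers are invented purely to show the calls.

```python
from math import sqrt


def repeat_distance(pos, repeats, cap=50):
    """Bases that a SNP at `pos` lies inside a repeat, capped at `cap`.

    Assumptions (not stated in the abstract): repeats are inclusive
    (start, end) intervals, and depth is counted to the nearer boundary,
    so a SNP on the boundary base itself has depth 1. Returns 0 when the
    SNP is outside every repeat.
    """
    depth = 0
    for start, end in repeats:
        if start <= pos <= end:
            depth = max(depth, min(pos - start, end - pos) + 1)
    return min(depth, cap)


def pearson(x, y):
    """Pearson correlation; with a 1/0 indicator as `x` this is the
    point-biserial correlation between pass/fail and the predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented toy data, for illustration only (not values from the study).
passed = [1, 1, 0, 1, 0, 0]                      # conversion success indicator
design_score = [0.9, 0.8, 0.4, 0.7, 0.5, 0.3]    # hypothetical design scores
snp_positions = [120, 40, 500, 95, 530, 610]     # hypothetical base positions
repeats = [(90, 200), (450, 700)]                # hypothetical repeat intervals

distances = [repeat_distance(p, repeats) for p in snp_positions]
r_design = pearson(passed, design_score)
r_repeat = pearson(passed, distances)
```

The same `pearson` call applies to estimated heritabilities; the multiple regression step mentioned above would need a fitting routine and is not shown here.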