668
Comparison of variant calling methods for whole genome sequencing data in dairy cattle
Accurate identification of SNPs from next-generation sequencing data is crucial for high-quality downstream analysis. Whole genome sequence data of 65 key ancestors of genotyped Swiss dairy populations were available for investigation (24 billion reads, 96.8% mapped to UMD31, 12x coverage). Four publically available variant calling programmes were assessed and different levels of pre-calling handling for each method were tested and compared. SNP concordance was examined with Illumina’s BovineHD Genotyping BeadChip. Depending on variant calling software used, between 16,894,054 and 22,048,382 SNP were identified (multi-sample calling). A total of 14,644,310 SNP were identified by all four variant callers (multi-sample calling). InDel counts ranged from 1,997,791 to 2,857,754; 1,708,649 InDels were identified by all four variant callers. A minimum of pre-calling data handling resulted in the highest non-reference sensitivity and the lowest non-reference discrepancy rate.
Keywords:
dairy cattle
variant calling
next generation sequencing