[Archived] SNP Discovery in C. briggsae (build 3)
A major aim of the C. briggsae genetic map consortium is to develop a reliable set of single nucleotide polymorphisms (SNPs) for the organism. The current
release contains 42,730 SNPs from the "HK104" mapping strain.
Build Numbers for SNP Releases
For clarity and tracking purposes, build numbers have been initiated for releases of C. briggsae SNPs. Two sets of SNPs that were previously released were labeled
retroactively; both used the set of 13,632 HK104 sequence traces. "HK104 build 1" SNPs were identified by extracting substitution polymorphisms from CrossMatch
output with quality scores of at least 30. In the "HK104 build 2" release I added SNPs discovered by older builds of Polyphred and Ssaha-SNP (the latter results
provided by Jim Mullikin). Both of these releases used a supercontig-based reference genome (AF16) from GSC/Wormbase (cb25/agp8) that I modified using some updates
from Sanger (update.041011).
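The build 1 criterion is simple enough to state as a filter. Below is a minimal Python sketch of it. It assumes the CrossMatch discrepancies have already been parsed into records. The `Discrepancy` type, its fields, and the use of "-" to mark indel positions are hypothetical conventions for illustration, not CrossMatch's actual output format. The sketch applies only the two stated rules: keep single-base substitutions, and require a quality score of at least 30.

```python
from dataclasses import dataclass

# Hypothetical record for one read/reference discrepancy, assumed to have been
# parsed already from CrossMatch alignment output (format not shown here).
@dataclass(frozen=True)
class Discrepancy:
    read_id: str
    ref_pos: int
    ref_base: str   # "-" marks an insertion in the read (assumed convention)
    read_base: str  # "-" marks a deletion in the read (assumed convention)
    quality: int    # quality score of the read base

NUCLEOTIDES = set("ACGT")
MIN_QUALITY = 30  # threshold stated for HK104 build 1


def is_substitution(d):
    """True for single-base mismatches; indels and ambiguity codes are excluded."""
    return (
        d.ref_base in NUCLEOTIDES
        and d.read_base in NUCLEOTIDES
        and d.ref_base != d.read_base
    )


def build1_candidates(discrepancies):
    """Keep substitution polymorphisms whose quality is at least MIN_QUALITY."""
    return [d for d in discrepancies if is_substitution(d) and d.quality >= MIN_QUALITY]


# Usage with made-up discrepancies.
calls = [
    Discrepancy("r1", 1200, "A", "G", 35),  # substitution, quality 35: kept
    Discrepancy("r2", 1500, "C", "T", 22),  # quality below 30: dropped
    Discrepancy("r3", 1800, "-", "T", 40),  # insertion, not a substitution: dropped
]
print([d.ref_pos for d in build1_candidates(calls)])  # [1200]
```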
There were no previous releases for VT847 strain SNPs, but the same build number as HK104 will be used to avoid confusion.
Current Release (build 3) from SSAHA-SNP Results
In the current release, build 3, SNP discovery was performed on shotgun sequence traces of strains HK104 and VT847. In this round only the ssahaSNP
program (SSAHA2) was used; it proved robust, efficient, and user-friendly. Unfortunately, Polyphred (v5.04) and Polybayes (v3.0) could not run efficiently
when the entire read set and reference genome sequences were provided as input. The reference genome used for SNP discovery was obtained from Wormbase
(cb25/agp8), which is organized by ultracontig (fingerprint contig).
The flanking sequences for build 3 SNPs were repeat-masked to lower case by RepeatMasker with a customized C. briggsae repeat library.
Note that nearby SNPs have NOT yet been marked in the flanking sequences for this build.
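Because masked bases are lower case, repeat status can be read directly from a SNP's flanks. The Python sketch below shows one way to do that. Its criterion is an assumption, not something these notes define: a SNP is treated as repetitive when any of the `window` bases closest to it on either side is lower case. The function names are hypothetical.

```python
# Minimal sketch: reading RepeatMasker lower-case masking in a SNP's flanks.
# Assumption: a SNP counts as "repetitive" when any of the `window` bases
# closest to it on either side is lower case. This criterion is illustrative
# only; the release notes do not define it.


def in_masked_repeat(left_flank, right_flank, window=1):
    """Return True if a base next to the SNP is masked (lower case)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    nearest = left_flank[-window:] + right_flank[:window]
    return any(base.islower() for base in nearest)


def masked_fraction(flank):
    """Fraction of a flanking sequence that was masked to lower case."""
    if not flank:
        return 0.0
    return sum(base.islower() for base in flank) / len(flank)


# Usage with made-up flanks (the SNP sits between left and right).
print(in_masked_repeat("ACGTTGCAttgca", "GGCATCGATC"))  # True
print(in_masked_repeat("ttgcaACGTA", "CGATCGGATC"))     # False
print(masked_fraction("ttgcaACGTA"))                    # 0.5
```

Since nearby SNPs are not yet marked, the flanks are taken as plain reference sequence here.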
SNP Discovery Results (build 3)
|Sequence traces examined
|Unique SNP loci detected
|SNPs in repetitive regions
|SNPs in homopolymer runs
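The homopolymer category above can be illustrated the same way. This Python sketch measures the run of identical bases touching the SNP position and flags it above a threshold. Two assumptions are not taken from these notes. First, a run is counted straight through the SNP when the bases touching it on both sides are the same. Second, `MIN_RUN = 5` is a hypothetical threshold. Comparison ignores case because the flanks are partly repeat-masked.

```python
# Minimal sketch: flagging SNPs that sit in homopolymer runs.
# Assumptions (not from the release notes): a run is counted across the SNP
# when the bases touching it on both sides are identical, and MIN_RUN = 5 is
# an illustrative threshold.

MIN_RUN = 5


def _edge_run(seq):
    """Return (base, length) of the run of identical bases at the start of seq."""
    seq = seq.upper()  # repeat-masked flanks are partly lower case
    if not seq:
        return "", 0
    return seq[0], len(seq) - len(seq.lstrip(seq[0]))


def homopolymer_length(left_flank, right_flank):
    """Length of the identical-base run touching the SNP (longer side if they differ)."""
    left_base, left_len = _edge_run(left_flank[::-1])  # read leftwards from the SNP
    right_base, right_len = _edge_run(right_flank)
    if left_base and left_base == right_base:
        return left_len + right_len  # the run continues through the SNP
    return max(left_len, right_len)


def in_homopolymer(left_flank, right_flank, min_run=MIN_RUN):
    return homopolymer_length(left_flank, right_flank) >= min_run


print(homopolymer_length("GCTAAA", "AAGTC"))  # 5 (AAA + AA around the SNP)
print(in_homopolymer("ACGTAC", "GTCAGT"))     # False
```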
Processing and Integration of C. briggsae HK104 SNPs
Flanking sequences from the 25,317 HK104 build 2 SNPs
were used to match as many of them as possible to the build 3 set. Despite the differences in methods and
reference genome, 11,762 build 2 SNPs are included in build 3 under their original SNP identifiers ("cbXXXXX"). The remaining SNPs in build 3 were assigned
new SNP IDs starting at cb40000. Next, the full set of HK104 build 3 SNPs was integrated with the recombination map by cross-referencing them
with the C. briggsae Genetic Map (v3.1). This yielded 22,511 SNPs whose ultracontigs are included in the genetic map; the chromosome and genetic distance(s)
for the ultracontig are provided with each of these SNPs. The remaining 20,219 SNPs lie on ultracontigs that have not been genetically mapped and were labeled
with chromosome "CbUn" and a genetic distance of zero.
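The merge and map-integration steps are sketched below in Python. Two rules come from the text above: build 2 identifiers are reused where flanks match, and new IDs count up from cb40000. Unmapped ultracontigs get chromosome "CbUn" and distance 0. The matching method is an assumption. These notes do not say how flanks were compared, and the real merge may have used sequence alignment. Here flanks are matched exactly, ignoring case. An orientation-independent key also matches a SNP whose flanks appear reverse-complemented on the new reference. The `Snp` type, the ultracontig names, and the map entry are hypothetical.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement, upper-cased so repeat masking is ignored."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def flank_key(left, right):
    """Orientation-independent key: the smaller of the forward flank pair
    and its reverse-complement pair (assumed exact-match strategy)."""
    forward = (left.upper(), right.upper())
    reverse = (revcomp(right), revcomp(left))
    return min(forward, reverse)


@dataclass
class Snp:  # hypothetical record for one build 3 SNP
    ultracontig: str
    left: str
    right: str
    snp_id: str = ""
    chromosome: str = ""
    distances: tuple = ()


def assign_ids(build3, build2_ids, first_new=40000):
    """Reuse a build 2 ID when flanks match; otherwise number from cb40000."""
    next_id, used = first_new, set()
    for snp in build3:
        old = build2_ids.get(flank_key(snp.left, snp.right))
        if old is not None and old not in used:
            snp.snp_id = old
            used.add(old)
        else:
            snp.snp_id = f"cb{next_id}"
            next_id += 1


def annotate(build3, genetic_map):
    """Copy chromosome and distance(s) from the SNP's ultracontig, or CbUn / 0."""
    for snp in build3:
        chrom, dists = genetic_map.get(snp.ultracontig, ("CbUn", (0.0,)))
        snp.chromosome, snp.distances = chrom, tuple(dists)


# Usage with made-up data: the first SNP's flanks are the reverse complement
# of a build 2 SNP's flanks, so it keeps that SNP's original ID.
build2_ids = {flank_key("ACGTAC", "GGTTCA"): "cb01234"}
snps = [Snp("ultra_A", "TGAACC", "GTACGT"), Snp("ultra_B", "AATTGC", "GCCTAA")]
assign_ids(snps, build2_ids)
annotate(snps, {"ultra_A": ("II", (12.3,))})
for s in snps:
    print(s.snp_id, s.chromosome, s.distances)
# cb01234 II (12.3,)
# cb40000 CbUn (0.0,)
```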
Download: C. briggsae HK104 SNPs (build 3)