Probabilistic Alignment Leads to Improved Accuracy and Read Coverage for Bisulfite Sequencing Data

dc.contributor.utaustinauthor: Clement, Nathan L. [en_US]
dc.creator: Hong, Changjin [en_US]
dc.creator: Clement, Nathan L. [en_US]
dc.creator: Clement, Spencer [en_US]
dc.creator: Hammoud, Saher Sue [en_US]
dc.creator: Carrell, Douglas T. [en_US]
dc.creator: Cairns, Bradley R. [en_US]
dc.creator: Snell, Quinn [en_US]
dc.creator: Clement, Mark J. [en_US]
dc.creator: Johnson, William Evan [en_US]
dc.date.accessioned: 2016-10-28T19:50:14Z
dc.date.available: 2016-10-28T19:50:14Z
dc.date.issued: 2013-11 [en_US]
dc.description.abstract: DNA methylation has been linked to many important biological phenomena. Researchers have recently begun to sequence bisulfite treated DNA to determine its pattern of methylation. However, sequencing reads from bisulfite-converted DNA can vary significantly from the reference genome because of incomplete bisulfite conversion, genome variation, sequencing errors, and poor quality bases. Therefore, it is often difficult to align reads to the correct locations in the reference genome. Furthermore, bisulfite sequencing experiments have the additional complexity of having to estimate the DNA methylation levels within the sample. Results: Here, we present a highly accurate probabilistic algorithm, which is an extension of the Genomic Next-generation Universal MAPper to accommodate bisulfite sequencing data (GNUMAP-bs), that addresses the computational problems associated with aligning bisulfite sequencing data to a reference genome. GNUMAP-bs integrates uncertainty from read and mapping qualities to help resolve the difference between poor quality bases and the ambiguity inherent in bisulfite conversion. We tested GNUMAP-bs and other commonly-used bisulfite alignment methods using both simulated and real bisulfite reads and found that GNUMAP-bs and other dynamic programming methods were more accurate than the more heuristic methods. Conclusions: The GNUMAP-bs aligner is a highly accurate alignment approach for processing the data from bisulfite sequencing experiments. The GNUMAP-bs algorithm is freely available for download at: http://dna.cs.byu.edu/gnumap. The software runs on multiple threads and multiple processors to increase the alignment speed. [en_US]
dc.description.department: Computer Sciences [en_US]
dc.description.sponsorship: National Institutes of Health (NIH) R01 HG005692 [en_US]
dc.identifier: doi:10.15781/T2VH5CM5X
dc.identifier.citation: Hong, Changjin, Nathan L. Clement, Spencer Clement, Saher Sue Hammoud, Douglas T. Carrell, Bradley R. Cairns, Quinn Snell, Mark J. Clement, and William Evan Johnson. "Probabilistic alignment leads to improved accuracy and read coverage for bisulfite sequencing data." BMC bioinformatics, Vol. 14, No. 1 (Nov., 2013): 1. [en_US]
dc.identifier.doi: 10.1186/1471-2105-14-337 [en_US]
dc.identifier.issn: 1471-2105 [en_US]
dc.identifier.uri: http://hdl.handle.net/2152/43185
dc.language.iso: English [en_US]
dc.relation.ispartof: [en_US]
dc.relation.ispartofserial: BMC Bioinformatics [en_US]
dc.rights: Administrative deposit of works to Texas ScholarWorks: This work's author(s) is or was a University faculty member, student or staff member; this article is already available through open access or the publisher allows a PDF version of the article to be freely posted online. The library makes the deposit as a matter of fair use (for scholarly, educational, and research purposes), and to preserve the work and further secure public access to the works of the University. [en_US]
dc.rights.restriction: Open [en_US]
dc.subject: dna methylation [en_US]
dc.subject: bisulfite sequencing [en_US]
dc.subject: probabilistic alignment [en_US]
dc.subject: parallel [en_US]
dc.subject: processing [en_US]
dc.subject: non-cpg methylation [en_US]
dc.subject: gene [en_US]
dc.subject: arabidopsis [en_US]
dc.subject: generation [en_US]
dc.subject: epigenome [en_US]
dc.subject: cells [en_US]
dc.subject: biochemical research methods [en_US]
dc.subject: biotechnology & applied microbiology [en_US]
dc.subject: mathematical & computational biology [en_US]
dc.title: Probabilistic Alignment Leads to Improved Accuracy and Read Coverage for Bisulfite Sequencing Data [en_US]
dc.type: Article [en_US]

Access full-text files

Original bundle

Name: 2013_11_Hong.pdf
Size: 532.28 KB
Format: Adobe Portable Document Format
