Base Calling for High-Throughput Short-Read Sequencing: Dynamic Programming Solutions

dc.contributor.utaustinauthor: Das, Shreepriya (en_US)
dc.contributor.utaustinauthor: Vikalo, Haris (en_US)
dc.creator: Das, Shreepriya (en_US)
dc.creator: Vikalo, Haris (en_US)
dc.description.abstract: Background: Next-generation DNA sequencing platforms are capable of generating millions of reads in a matter of days at rapidly falling costs. Despite its proliferation and technological improvements, the performance of next-generation sequencing remains adversely affected by imperfections in the underlying biochemical and signal acquisition procedures. To this end, various techniques, including statistical methods, are used to improve the read lengths and accuracy of these systems. Development of high-performing base calling algorithms that are computationally efficient and scalable is an ongoing challenge. Results: We develop model-based statistical methods for fast and accurate base calling in Illumina's next-generation sequencing platforms. In particular, we propose a computationally tractable parametric model which enables a dynamic programming formulation of the base calling problem. Forward-backward and soft-output Viterbi algorithms are developed, and their performance and complexity are investigated and compared with the existing state-of-the-art base calling methods for this platform. A C implementation of our algorithm, named Softy, is available for download. Conclusion: We demonstrate the high accuracy and speed of the proposed methods on reads obtained using Illumina's Genome Analyzer II and HiSeq2000. In addition to performing reliable and fast base calling, the developed algorithms enable incorporation of prior knowledge, which can be utilized for parameter estimation and is potentially beneficial in various downstream applications. (en_US)
dc.description.department: Electrical and Computer Engineering (en_US)
dc.description.sponsorship: National Institutes of Health, grant 1R21HG006171-01 (en_US)
dc.identifier.citation: Das, Shreepriya, and Haris Vikalo. "Base calling for high-throughput short-read sequencing: dynamic programming solutions." BMC Bioinformatics, Vol. 14, No. 1 (Apr. 2013): 129. (en_US)
dc.relation.ispartofserial: BMC Bioinformatics (en_US)
dc.rights: Administrative deposit of works to Texas ScholarWorks: The author(s) of this work is or was a University faculty member, student, or staff member; this article is already available through open access, or the publisher allows a PDF version of the article to be freely posted online. The library makes the deposit as a matter of fair use (for scholarly, educational, and research purposes), and to preserve the work and further secure public access to the works of the University. (en_US)
dc.subject: biochemical research methods (en_US)
dc.subject: biotechnology & applied microbiology (en_US)
dc.subject: mathematical & computational biology (en_US)
dc.title: Base Calling for High-Throughput Short-Read Sequencing: Dynamic Programming Solutions (en_US)
