BM-BC: a Bayesian method of base calling for Solexa sequence data
Base calling is a critical step in the Solexa next-generation sequencing procedure. It compares the position-specific intensity measurements that reflect the signal strength of four possible bases (A, C, G, T) at each genomic position, and outputs estimates of the true sequences for short reads of DNA or RNA. We present a Bayesian method of base calling, BM-BC, for Solexa-GA sequencing data. The Bayesian method builds on a hierarchical model that accounts for three sources of noise in the data, which are known to affect the accuracy of the base calls: fading, phasing, and cross-talk between channels. We show that the new method improves the precision of base calling compared with currently leading methods. Furthermore, the proposed method provides a probability score that measures the confidence of each base call. This probability score can be used to estimate the false discovery rate of the base calling or to rank the precision of the estimated DNA sequences, which in turn can be useful for downstream analysis such as sequence alignment.
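The abstract does not give the BM-BC model equations, so the following is only a toy sketch of how the three noise sources and a per-call probability score can fit together. It is not BM-BC itself. It assumes a hypothetical generative model: a 4x4 cross-talk matrix that leaks each base's dye into neighbouring channels, a phasing fraction of molecules that lag one cycle behind and still report the previous base, geometric fading of brightness per cycle, i.i.d. Gaussian noise, and a uniform prior over A, C, G, T. Under that model, exact posterior probabilities per cycle come from forward-backward recursions. This exact-inference technique is swapped in for the toy model and is not the paper's hierarchical sampler. The FDR estimate, the mean of (1 - confidence) over the accepted calls, is likewise an assumed standard Bayesian estimate. All function names and parameters (`expected_intensity`, `posterior_base_calls`, `estimated_fdr`, `phasing`, `fading`, `sigma`) are hypothetical.

```python
import math

BASES = "ACGT"


def expected_intensity(prev, cur, t, crosstalk, phasing, fading):
    """Hypothetical mean 4-channel intensity at cycle t (0-based).

    crosstalk[c][b] is the signal base b leaks into channel c (cross-talk).
    A fraction `phasing` of molecules lags one cycle and still reports `prev`;
    overall brightness decays as fading**t (fading).
    """
    amp = fading ** t
    mean = []
    for c in range(4):
        if prev is None:  # first cycle: no lagging molecules
            s = crosstalk[c][cur]
        else:
            s = (1 - phasing) * crosstalk[c][cur] + phasing * crosstalk[c][prev]
        mean.append(amp * s)
    return mean


def emission_table(y, t, crosstalk, phasing, fading, sigma):
    """Relative Gaussian likelihoods keyed by (previous base, current base).

    Values are rescaled by the largest one at this cycle to avoid underflow;
    that per-cycle constant cancels when forward/backward vectors are normalized.
    """
    prevs = [None] if t == 0 else range(4)
    logs = {}
    for i in prevs:
        for j in range(4):
            mu = expected_intensity(i, j, t, crosstalk, phasing, fading)
            logs[(i, j)] = -sum((a - b) ** 2 for a, b in zip(y, mu)) / (2 * sigma ** 2)
    top = max(logs.values())
    return {k: math.exp(v - top) for k, v in logs.items()}


def normalize(v):
    s = sum(v)
    return [x / s for x in v]


def posterior_base_calls(reads, crosstalk, phasing, fading, sigma):
    """Forward-backward over bases; returns calls, confidences, posteriors."""
    n = len(reads)
    em = [emission_table(reads[t], t, crosstalk, phasing, fading, sigma) for t in range(n)]
    prior = 0.25  # i.i.d. uniform prior over A, C, G, T
    alpha = [normalize([prior * em[0][(None, j)] for j in range(4)])]
    for t in range(1, n):
        alpha.append(normalize([
            sum(alpha[t - 1][i] * prior * em[t][(i, j)] for i in range(4))
            for j in range(4)]))
    beta = [[1.0] * 4 for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = normalize([
            sum(prior * em[t + 1][(i, j)] * beta[t + 1][j] for j in range(4))
            for i in range(4)])
    posts = [normalize([a * b for a, b in zip(alpha[t], beta[t])]) for t in range(n)]
    calls = "".join(BASES[max(range(4), key=p.__getitem__)] for p in posts)
    conf = [max(p) for p in posts]  # posterior probability of each call
    return calls, conf, posts


def estimated_fdr(conf, threshold):
    """Estimated FDR: mean posterior error probability among accepted calls."""
    kept = [c for c in conf if c >= threshold]
    return sum(1 - c for c in kept) / len(kept) if kept else 0.0


if __name__ == "__main__":
    crosstalk = [[1.0, 0.2, 0.0, 0.0],
                 [0.2, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.3],
                 [0.0, 0.0, 0.3, 1.0]]
    truth = "ACGTTGCA"
    idx = [BASES.index(b) for b in truth]
    reads = [expected_intensity(idx[t - 1] if t else None, idx[t], t, crosstalk, 0.1, 0.98)
             for t in range(len(truth))]
    calls, conf, _ = posterior_base_calls(reads, crosstalk, 0.1, 0.98, 0.1)
    print(calls, estimated_fdr(conf, 0.9))
```

On noise-free toy intensities the calls should reproduce the true read and the FDR estimate should be close to zero. Ranking reads by their confidence values, or lowering the acceptance threshold, trades the number of calls kept against the estimated FDR.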
Yuan Ji is with the Center for Clinical and Research Informatics, Northshore University HealthSystem, Evanston, IL 60091, USA. -- Riten Mitra is with ICES, University of Texas at Austin, Austin, TX 78705, USA. -- Fernando Quintana and Alejandro Jara are with the Department of Statistics, Pontificia Universidad Católica de Chile, Casilla 306, Correo 22, Santiago, Chile. -- Peter Muller is with the Department of Mathematics, The University of Texas at Austin, Austin, TX 78705, USA. -- Ping Liu is with Abbott Molecular Inc., Des Plaines, IL 60018, USA. -- Yue Lu is with the Department of Leukemia, The University of Texas, M. D. Anderson Cancer Center, Houston, TX 77030, USA. -- Shoudan Liang is with the Department of Bioinformatics and Computational Biology, The University of Texas, M. D. Anderson Cancer Center, Houston, TX 77030, USA.