Accuracy of RNA-Seq and its dependence on sequencing depth

dc.creator: Cai, Guoshuai [en]
dc.creator: Li, Hua [en]
dc.creator: Lu, Yue [en]
dc.creator: Huang, Xuelin [en]
dc.creator: Lee, Juhee [en]
dc.creator: Muller, Peter [en]
dc.creator: Ji, Yuan [en]
dc.creator: Liang, Shoudan [en]
dc.date.accessioned: 2014-12-15T17:11:08Z [en]
dc.date.available: 2014-12-15T17:11:08Z [en]
dc.date.issued: 2012-08-24 [en]
dc.description: Guoshuai Cai and Shoudan Liang are with the Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA. -- Hua Li is with the Department of Stem Cell Transplantation and Cellular Therapy, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA. -- Yue Lu is with the Department of Leukemia, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA. -- Xuelin Huang, Juhee Lee, and Yuan Ji are with the Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA. -- Peter Muller is with the Department of Mathematics, The University of Texas at Austin, Austin, Texas 78712, USA. [en]
dc.description.abstract: Background: The cost of DNA sequencing has fallen dramatically in the past decade. As a result, sequencing technologies have been increasingly applied to genomic research. RNA-Seq, which surveys gene expression by DNA sequencing, is becoming a common technique. Because it is not clear how increased sequencing capacity has affected the measurement accuracy of mRNA, we sought to investigate that relationship. Results: We empirically evaluated the accuracy of repeated gene expression measurements using RNA-Seq. We identified library preparation steps prior to DNA sequencing as the main source of error in this process. Studying three datasets, we showed that accuracy does improve with sequencing depth. However, the rate of improvement as a function of the number of sequence reads is generally slower than predicted by the binomial distribution. We therefore used the beta-binomial distribution to model the overdispersion. The overdispersion parameters we introduced depend explicitly on the number of reads, so that the resulting statistical uncertainty is consistent with the empirical observation that measurement accuracy increases with sequencing depth. The overdispersion parameters were determined by maximizing the likelihood. We showed that our modified beta-binomial model had a lower false discovery rate than the binomial or the pure beta-binomial models. Conclusion: We proposed a novel form of overdispersion that guarantees that accuracy improves with sequencing depth. We demonstrated that the new form provides a better fit to the data. [en]
dc.description.catalogingnote: shoudan@mdanderson.org [en]
dc.description.department: Mathematics [en]
dc.description.sponsorship: [en]
dc.identifier.Filename: 1471-2105-13-S13-S5.pdf [en]
dc.identifier.citation: Cai, Guoshuai, Hua Li, Yue Lu, Xuelin Huang, Juhee Lee, Peter Müller, Yuan Ji, and Shoudan Liang. “Accuracy of RNA-Seq and Its Dependence on Sequencing Depth.” BMC Bioinformatics 13, no. Suppl 13 (August 24, 2012): S5. doi:10.1186/1471-2105-13-S13-S5. [en]
dc.identifier.doi: doi:10.1186/1471-2105-13-S13-S5 [en]
dc.identifier.uri: http://hdl.handle.net/2152/27983 [en]
dc.language.iso: English [en]
dc.publisher: BMC Bioinformatics [en]
dc.rights: Administrative deposit of works to UT Digital Repository: This work's author(s) is or was a University faculty member, student or staff member; this article is already available through open access at http://www.biomedcentral.com. The public license is specified as CC-BY: http://creativecommons.org/licenses/by/4.0/. The library makes the deposit as a matter of fair use (for scholarly, educational, and research purposes), and to preserve the work and further secure public access to the works of the University. [en]
dc.subject: RNA-seq [en]
dc.subject: sequencing depth [en]
dc.subject: genomic research [en]
dc.subject: gene expression [en]
dc.title: Accuracy of RNA-Seq and its dependence on sequencing depth [en]
dc.type: Article [en]

Access full-text files

Original bundle

Name: CaiBMCBioinformatics2012.pdf
Size: 1.52 MB
Format: Adobe Portable Document Format