Effective bug detection and localization using information retrieval

dc.contributor.advisor: Perry, Dewayne E.
dc.contributor.committeeMember: Khurshid, Sarfraz
dc.contributor.committeeMember: Julien, Christine
dc.contributor.committeeMember: Gligoric, Milos
dc.contributor.committeeMember: Devanbu, Premkumar
dc.contributor.committeeMember: Lawall, Julia
dc.creator: Saha, Ripon Kumar
dc.creator.orcid: 0000-0001-8333-9656
dc.date.accessioned: 2016-09-06T18:36:01Z
dc.date.available: 2016-09-06T18:36:01Z
dc.date.issued: 2016-05
dc.date.submitted: May 2016
dc.date.updated: 2016-09-06T18:36:01Z
dc.description.abstract: Software bugs pose a fundamental threat to the reliability of software systems, even systems built by the best software engineering (SE) teams using the best SE practices. Detecting bugs early and fixing them quickly are extremely important. However, both are very expensive and challenging, especially at scale. While the sciences of bug detection (e.g., software testing) and localization via static and dynamic program analyses have been explored considerably, text-based Information Retrieval (IR) techniques for bug detection and localization are interesting and promising new approaches to these problems. One advantage of text-based approaches is that they can utilize a great deal of (implicit) semantic information about a program’s functionality from the program text, which is almost impossible to extract using program-analysis-based techniques. This dissertation builds a deeper understanding of current bug triaging and fixing processes via mining software repositories, and introduces new techniques for effective bug detection and localization. The dissertation has three main parts. First, we perform a number of empirical studies to investigate the extent of and reasons for long-lived bugs, their severities, and the time spent in different phases of the bug-fixing process. We demonstrate that many bugs remain unfixed for inordinate periods of time for numerous reasons, including difficulties in detecting, localizing, and fixing them. Second, we demonstrate that developers use very similar program text in source code and its corresponding test cases, which can be utilized to implement powerful test prioritization techniques. We introduce a novel IR-based regression test prioritization technique called REPiR that embodies our insight, and show that REPiR is more efficient than program-analysis-based or dynamic-coverage-based techniques.
Third, we demonstrate that fine-grained program text such as class names, method names, variable names, and comments carries different levels of information, and that this information can be utilized to improve IR-based bug localization. We introduce a structured retrieval technique called BLUiR that embodies our insights and show that BLUiR outperforms the existing state-of-the-art IR-based bug localization approaches. Finally, we further improve BLUiR using natural language processing. We make four contributions in this dissertation. One, we provide empirical evidence that there are considerable numbers of non-trivial bugs in software projects that survive for a long time. We describe the reasons for delays in fixing, the nature of the fixes, and the overall fixing process of these long-lived bugs in great detail. Two, we introduce the notion of IR-based regression test prioritization based on program changes. Three, we introduce the notion of structured retrieval for bug localization. Four, we provide an in-depth analysis of the extent to which natural language processing can play an important role in further improving IR-based bug localization. The central ideas are embodied in a suite of prototype tools. Rigorous empirical evaluation is performed to validate the efficacy of the proposed techniques using datasets containing a variety of real-world Java and C programs.
dc.description.department: Electrical and Computer Engineering
dc.format.mimetype: application/pdf
dc.identifier: doi:10.15781/T2XP6V42Q
dc.identifier.uri: http://hdl.handle.net/2152/40245
dc.language.iso: en
dc.subject: Software testing
dc.subject: Automatic bug localization
dc.title: Effective bug detection and localization using information retrieval
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Electrical and Computer Engineering
thesis.degree.discipline: Electrical and Computer Engineering
thesis.degree.grantor: The University of Texas at Austin
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Access full-text files

Original bundle

Name: SAHA-DISSERTATION-2016.pdf
Size: 1.5 MB
Format: Adobe Portable Document Format
