A mixed approach to spectrum-based fault localization using information theoretic foundations
Fault localization, i.e., locating the faults in code, such as faulty statements or expressions, that are responsible for observed failures, is traditionally a manual, laborious, and tedious task. Recent years have seen much progress in automated techniques for fault localization. A particularly promising approach is to use program execution spectra to analyze passing and failing runs and compute how likely each statement is to be faulty. Techniques based on this approach have so far largely focused on using either statistical analysis or similarity-based measures, which have a natural application in evaluating such runs. However, in spite of some initial success, current techniques are not yet effective enough to localize faults with a high degree of confidence in real applications.
Our thesis is that information-theoretic feature selection can provide a basis for novel techniques that mix coverage of different program elements to improve the effectiveness of fault localization using program spectra. Our basic insight is that each additional failing or passing run can increase the information diversity with respect to the program elements, which can help localize faults in code. For example, the statements with maximum feature diversity information can point to the most suspicious lines of code. This dissertation presents a new fault localization approach that embodies our insight. The approach introduces Bernoulli divergence for feature selection and uses it as the foundation for two novel techniques: (1) mixing branch and statement coverage information; and (2) varying feature granularity from function level to statement level. An experimental evaluation using a suite of subject programs commonly used in evaluating fault localization techniques shows that our approach provides an effective basis for fault localization.
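To make the general idea concrete, the following minimal sketch ranks statements by how much their coverage tells us about test outcomes. It is an illustration under stated assumptions, not the dissertation's method: the abstract does not define Bernoulli divergence, so the sketch substitutes plain information gain (mutual information between a statement's coverage and the pass/fail outcome). The names `entropy`, `information_gain`, and `rank_statements`, as well as the toy spectra, are hypothetical.

```python
from math import log2


def entropy(p):
    """Binary entropy, in bits, of a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def information_gain(covered, failed):
    """Information gain of one coverage column w.r.t. the pass/fail outcome.

    NOTE: a stand-in measure chosen for illustration; it is not the
    Bernoulli divergence named in the abstract.
    """
    n = len(failed)
    base = entropy(sum(failed) / n)
    conditional = 0.0
    for value in (True, False):
        # Outcomes of the runs that did (or did not) cover the statement.
        subset = [f for c, f in zip(covered, failed) if c == value]
        if subset:
            conditional += len(subset) / n * entropy(sum(subset) / len(subset))
    return base - conditional


def rank_statements(spectra, failed):
    """Return statement indices ordered from most to least informative."""
    n_statements = len(spectra[0])
    scores = {
        s: information_gain([row[s] for row in spectra], failed)
        for s in range(n_statements)
    }
    return sorted(scores, key=lambda s: (-scores[s], s))


# Hypothetical spectra: one row per run, one column per statement (1 = executed).
spectra = [
    [1, 1, 0],  # passing run
    [1, 0, 1],  # failing run
    [1, 1, 1],  # failing run
    [1, 1, 0],  # passing run
]
failed = [False, True, True, False]

print(rank_statements(spectra, failed))
```

In this toy example, statement 2 is executed by exactly the failing runs, so it ranks first, while statement 0 is executed by every run and carries no information. Information gain is symmetric, so a statement executed only by passing runs would score equally high; a practical ranking would also need to account for the direction of the association.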