Offline debugging of distributed processes

Chin, Bryan Scott, 1968-

This thesis addresses the problem of debugging a distributed system. We define debugging as the process of diagnosing and correcting errors in a target application. In distributed systems, problems arise from the non-deterministic execution of distributed processes, so concepts cannot be taken directly from debuggers for sequential programs. We categorize debuggers as static, interactive, or post-mortem, according to the time at which they analyze the target application. This thesis focuses on offline debugging, a form of post-mortem debugging. We choose offline debugging because of its automated control, its ability to model and search the entire state space, and its reduced probe effect. Our offline debugging system consists of two components: the monitor and the offline debugger proper. The monitor observes each participating process individually as it executes and creates a trace file for it. The offline debugger arranges these local trace files into a global search space and searches it to detect whether given predicate search expressions held at some point during the execution of the target application. In this implementation, we use four algorithms, each based on a depth-first traversal of the lattice of consistent global states, to detect the predicate search expressions. Finally, we present the C language implementation of our practical offline debugger. We conclude by noting that while the offline debugger is a powerful tool, a truly complete set of development tools should include more than one debugger to address different stages of the software lifecycle.
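
The search step described in the abstract can be illustrated with a minimal sketch in C. It is not the thesis's implementation and is not one of its four algorithms. It assumes that each traced event carries a vector clock, that a global state is a "cut" counting how many events of each process have been included, and that the predicate asks whether two processes could have been in a critical section at the same time. All names here (Trace, Event, in_cs, detect_possibly, both_in_cs, example_trace) are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NPROC 2                              /* processes in the trace (assumed) */
#define MAXEV 8                              /* max events per process (assumed) */
#define NCUTS ((MAXEV + 1) * (MAXEV + 1))    /* (MAXEV + 1)^NPROC for NPROC == 2 */

/* One traced event: its vector clock and the local state after it. */
typedef struct {
    int vc[NPROC];   /* vc[j] = events of process j known to have happened */
    int in_cs;       /* hypothetical local state: 1 while in a critical section */
} Event;

typedef struct {
    int len[NPROC];            /* number of events recorded per process */
    Event ev[NPROC][MAXEV];    /* local traces, as a monitor might record them */
} Trace;

typedef bool (*Predicate)(const Trace *t, const int cut[NPROC]);

/* A cut is consistent if no included event depends on an excluded one. */
static bool consistent(const Trace *t, const int cut[NPROC]) {
    for (int i = 0; i < NPROC; i++) {
        if (cut[i] == 0) continue;
        const Event *e = &t->ev[i][cut[i] - 1];
        for (int j = 0; j < NPROC; j++)
            if (e->vc[j] > cut[j]) return false;
    }
    return true;
}

/* Map a cut to a unique index so visited cuts are not explored twice. */
static int cut_key(const int cut[NPROC]) {
    int key = 0;
    for (int i = 0; i < NPROC; i++) key = key * (MAXEV + 1) + cut[i];
    return key;
}

/* Depth-first traversal: advance one process at a time through consistent cuts. */
static bool dfs(const Trace *t, int cut[NPROC], bool *visited,
                Predicate p, int found[NPROC]) {
    visited[cut_key(cut)] = true;
    if (p(t, cut)) {
        memcpy(found, cut, sizeof(int) * NPROC);
        return true;
    }
    for (int i = 0; i < NPROC; i++) {
        if (cut[i] == t->len[i]) continue;   /* process i has no more events */
        cut[i]++;
        bool hit = !visited[cut_key(cut)] && consistent(t, cut)
                   && dfs(t, cut, visited, p, found);
        cut[i]--;
        if (hit) return true;
    }
    return false;
}

/* Returns true if some consistent global state satisfies p; stores it in found. */
bool detect_possibly(const Trace *t, Predicate p, int found[NPROC]) {
    bool visited[NCUTS] = { false };
    int cut[NPROC] = { 0 };
    for (int i = 0; i < NPROC; i++)
        assert(t->len[i] >= 0 && t->len[i] <= MAXEV);
    return dfs(t, cut, visited, p, found);
}

/* Example predicate: every process is inside its critical section. */
bool both_in_cs(const Trace *t, const int cut[NPROC]) {
    for (int i = 0; i < NPROC; i++)
        if (cut[i] == 0 || !t->ev[i][cut[i] - 1].in_cs) return false;
    return true;
}

/* Two processes that each enter and then leave a critical section. If
 * `ordered`, process 1 enters only after hearing that process 0 has left. */
Trace example_trace(bool ordered) {
    Trace t;
    memset(&t, 0, sizeof t);
    t.len[0] = t.len[1] = 2;
    int seen = ordered ? 2 : 0;              /* events of process 0 known to process 1 */
    t.ev[0][0] = (Event){ {1, 0}, 1 };
    t.ev[0][1] = (Event){ {2, 0}, 0 };
    t.ev[1][0] = (Event){ {seen, 1}, 1 };
    t.ev[1][1] = (Event){ {seen, 2}, 0 };
    return t;
}
```

Calling detect_possibly(&t, both_in_cs, found) on example_trace(false) reports a violating global state, because the two critical sections are causally unrelated and may overlap. On example_trace(true) it reports none, because the only cuts in which both processes are in the critical section are inconsistent. The visited table keeps the traversal from re-exploring a cut reached along a different path through the lattice.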