Verification of sequential and concurrent libraries

Deshmukh, Jyotirmoy Vinay

Verification of sequential and concurrent libraries

Access full-text files

DESHMUKH-DISSERTATION.pdf (1.23 MB)

Date

2010-08

Authors

Deshmukh, Jyotirmoy Vinay

Abstract

The goal of this dissertation is to present new and improved techniques for fully automatic verification of sequential and concurrent software libraries. In most cases, automatic software verification is plagued by undecidability, while in many others it suffers from prohibitively high computational complexity. Model checking -- a highly successful technique used for verifying finite state hardware circuits against logical specifications -- has been less widely adapted for software, as software verification tends to involve reasoning about potentially infinite state-spaces. Two of the biggest culprits responsible for making software model checking hard are heap-allocated data structures and concurrency. In the first part of this dissertation, we study the problem of verifying shape properties of sequential data structure libraries. Such libraries are implemented as collections of methods that manipulate the underlying data structure. Examples of such methods include: methods to insert, delete, and update data values of nodes in linked lists, binary trees, and directed acyclic graphs; methods to reverse linked lists; and methods to rotate balanced trees. Well-written methods are accompanied by documentation that specifies the observational behavior of these methods in terms of pre/post-conditions. A pre-condition [phi] for a method M characterizes the state of a data structure before the method acts on it, and the post-condition [psi] characterizes the state of the data structure after the method has terminated. In a certain sense, we can view the method as a function that operates on an input data structure, producing an output data structure. Examples of such pre/post-conditions include shape properties such as acyclicity, sorted-ness, tree-ness, reachability of particular data values, and reachability of pointer values, and data structure-specific properties such as: "no red node has a red child'', and "there is no node with data value 'a' in the data structure''. Moreover, methods are often expected not to violate certain safety properties such as the absence of dangling pointers, absence of null pointer dereferences, and absence of memory leaks. We often assume such specifications as implicit, and say that a method is incorrect if it violates such specifications. We model data structures as directed graphs, and use the two terms interchangeably. Verifying correctness of methods operating on graphs is an instance of the parameterized verification problem: for every input graph that satisfies [phi], we wish to ensure that the corresponding output graph satisfies [psi]. Control structures such as loops and recursion allow an arbitrary method to simulate a Turing Machine. Hence, the parameterized verification problem for arbitrary methods is undecidable. One of the main contributions of this dissertation is in identifying mathematical conditions on a programming language fragment for which parameterized verification is not only decidable, but also efficient from a complexity perspective. The decidable fragment we consider can be broadly sub-divided into two categories: the class of iterative methods, or methods which use loops as a control flow construct to traverse a data structure, and the class of recursive methods, or methods that use recursion to traverse the data structure. We show that for an iterative method operating on a directed graph, if we are guaranteed that if the number of destructive updates that a method performs is bounded (by a constant, i.e., O(1)), and is guaranteed to terminate, then the correctness of the method can be checked in time polynomial in the size of the method and its specifications. Further, we provide a well-defined syntactic fragment for recursive methods operating on tree-like data structures, which assures that any method in this fragment can be verified in time polynomial in the size of the method and its specifications. Our approach draws on the theory of tree automata, and we show that parameterized correctness can be reduced to emptiness of finite-state, nondeterministic tree automata that operate on infinite trees. We then leverage efficient algorithms for checking the emptiness of such tree automata to obtain a tractable verification framework. Our prototype tool demonstrates the low theoretical complexity of our technique by efficiently verifying common methods that operate on data structures. In the second part of the dissertation, we tackle another obstacle for tractable software verification: concurrency. In particular, we explore application of a static analysis technique based on interprocedural dataflow analysis to predict and document deadlocks in concurrent libraries, and analyze deadlocks in clients that use such libraries. The kind of deadlocks that we focus result from circular dependencies in the acquisition of shared resources (such as locks). Well-written applications that use several locks implicitly assume a certain partial order in which locks are acquired by threads. A cycle in the lock acquisition order is an indicator of a possible deadlock within the application. Methods in object-oriented concurrent libraries often encapsulate internal synchronization details. As a result of information hiding, clients calling the library methods may cause thread safety violations by invoking methods in a manner that violates the partial ordering between lock acquisitions that is implicit within the library. Given a concurrent library, we present a technique for inferring interface contracts that speciy permissible concurrent method calls and patterns of aliasing among method arguments that guarantee deadlock-free execution for the methods in the library. The contracts also help client developers by documenting required assumptions about the library methods. Alternatively, the contracts can be statically enforced in the client code to detect potential deadlocks in the client. Our technique combines static analysis with a symbolic encoding for tracking lock dependencies, allowing us to synthesize contracts using a satisfiability modulo theories (SMT) solver. Additionally, we investigate extensions of our technique to reason about deadlocks in libraries that employ signalling primitives such as wait-notify for cooperative synchronization. We demonstrate its scalability and efficiency with a prototype tool that analyzed over a million lines of code for some widely-used open-source Java libraries in less than 50 minutes. Furthermore, the contracts inferred by our approach have been able to pinpoint real bugs, i.e. deadlocks that have been reported by users of these libraries.