Browsing by Subject "Concurrency"

Now showing 1 - 9 of 9

Asynchronous automatic-signal monitors with multi-object synchronization
(2016-12) Hung, Wei-Lun; Garg, Vijay K. (Vijay Kumar), 1963-; Julien, Christine; Khurshid, Sarfraz; Mittal, Neeraj; Perry, Dewayne E; Pingali, Keshav
One of the fundamental problems in parallel programming is that there is no simple programming paradigm that provides mutual exclusion and synchronization with efficient implementation at the same time. For monitor [Hoa74,Han75] (lock-based) systems, only experienced programmers can develop high-performance fine-grained lock-based implementations. Programmers frequently introduce bugs with traditional monitors. Researchers have proposed transactional memory [HM93,ST95], which provides a simple and elegant mechanism for programmers to atomically execute a set of memory operations so that there is no deadlock in transactional memory systems. However, most of transactional memory systems lack conditional synchronization support [WLS14,LW14]. Hence, writing multi-threaded programs with conditional synchronization is rather difficult. In this dissertation, we develop a parallel programming framework that provides simple constructs for mutual exclusion and synchronization as well as efficient implementation.
Computational process networks : a model and framework for high-throughput signal processing
(2011-05) Allen, Gregory Eugene; Evans, Brian L. (Brian Lawrence), 1965-; Browne, James C.; Chase, Craig M.; John, Lizy K.; Loeffler, Charles M.
Many signal and image processing systems for high-throughput, high-performance applications require concurrent implementations in order to realize desired performance. Developing software for concurrent systems is widely acknowledged to be difficult, with common industry practice leaving the burden of preventing concurrency problems on the programmer. The Kahn Process Network model provides the mathematically provable property of determinism of a program result regardless of the execution order of its processes, including concurrent execution. This model is also natural for describing streams of data samples in a signal processing system, where processes transform streams from one data type to another. However, a Kahn Process Network may require infinite memory to execute. I present the dynamic distributed deadlock detection and resolution (D4R) algorithm, which permits execution of Process Networks in bounded memory if it is possible. It detects local deadlocks in a Process Network, determines whether the deadlock can be resolved and, if so, identifies the process that must take action to resolve the deadlock. I propose the Computational Process Network (CPN) model which is based on the formalisms of Kahn’s PN model, but with enhancements that are designed to make it efficiently implementable. These enhancements include multi-token transactions to reduce execution overhead, multi-channel queues for multi-dimensional synchronous data, zero-copy semantics, and consumer and producer firing thresholds for queues. Firing thresholds enable memoryless computation of sliding window algorithms, which are common in signal processing systems. I show that the Computational Process Network model preserves the formal properties of Process Networks, while reducing the operations required to implement sliding window algorithms on continuous streams of data. I also present a high-throughput software framework that implements the Computational Process Network model using C++, and which maps naturally onto distributed targets. This framework uses POSIX threads, and can exploit parallelism in both multi-core and distributed systems. Finally, I present case studies to exercise this framework and demonstrate its performance and utility. The final case study is a three-dimensional circular convolution sonar beamformer and replica correlator, which demonstrates the high throughput and scalability of a real-time signal processing algorithm using the CPN model and framework.
Fast error detection with coverage guarantees for concurrent software
(2013-05) Coons, Katherine Elizabeth; McKinley, Kathryn S.
Concurrency errors are notoriously difficult to debug because they may occur only under unexpected thread interleavings that are difficult to identify and reproduce. These errors are increasingly important as recent hardware trends compel developers to write more concurrent software and to provide more concurrent abstractions. This thesis presents algorithms that dynamically and systematically explore a program's thread interleavings to manifest concurrency bugs quickly and reproducibly, and to provide precise incremental coverage guarantees. Dynamic concurrency testing tools should provide (1) fast response -- bugs should manifest quickly if they exist, (2) reproducibility -- bugs should be easy to reproduce and (3) coverage -- precise correctness guarantees when no bugs manifest. In practice, most tools provide either fast response or coverage, but not both. These goals conflict because a program's thread interleavings exhibit exponential state- space explosion, which inhibits fast response. Two approaches from prior work alleviate state-space explosion. (1) Partial-order reduction provides full coverage by exploring only one interleaving of independent transitions. (2) Bounded search provides bounded coverage by enumerating only interleavings that do not exceed a bound. Bounded search can additionally provide guarantees for cyclic state spaces for which dynamic partial-order reduction provides no guarantees. Without partial-order reduction, however, bounded search wastes most of its time exploring executions that reorder only independent transitions. Fast response with coverage guarantees requires both approaches, but prior work failed to combine them soundly. We combine bounded search with partial-order reduction and extensively analyze the space of dynamic, bounded partial-order reduction strategies. First, we prioritize with a best-first search and show that heuristics that combine these approaches find bugs quickly. Second, we restrict partial-order reduction to combine approaches while maintaining bounded coverage. We specialize this approach for several bound functions, prove that these algorithms guarantee bounded coverage, and leverage dynamic information to further reduce the state space. Finally, we bound the partial order on a program's transitions, rather than the total order on those transitions, to combine these approaches without sacrificing partial-order reduction. This algorithm provides fast response, incremental coverage guarantees, and reproducibility. We manifest bugs an order of magnitude more quickly than previous approaches and guarantee incremental coverage in minutes or hours rather than weeks, helping developers find and reproduce concurrency errors. This thesis makes bounded stateless model checking for concurrent programs substantially more efficient and practical.
Framework for testing Java concurrency
(2010-12) Heidt, David Patrick; Garg, Vijay K. (Vijay Kumar), 1963-; Krasner, Herb
Concurrent programming has become ubiquitous in the arena of application development, requiring most production quality systems to deal with at least some degree of multi-threaded execution. An increasing level of maturity is developing around the impact of concurrency on the design and testing processes. Much of this knowledge focuses on the functional aspect of the design and execution with success measures typically related to the correctness of a program. However, there exists a gap in the research to date around the process for concurrent performance testing. While many companies acknowledge that performance is a major source of complaints in production environments, performance testing historically receives low priority and is often little more than an extension of the functional testing. Possibly the most widely discussed and understood implementation language today, in terms of multi-threaded programming, is Java. The report outlines a standard framework for concurrent performance testing targeted towards Java based applications. In an effort to vet the framework, we execute a series of practical concurrent testing that address some of the most common aspects of concurrent programming in Java, with a particular focus on the Java Concurrency package. As a result, this report presents a portable, extensible framework that designers can use in evaluating the range of concurrency options available in Java within their particular environment. Additionally, it provides specific insight into the performance of these options in a typical run-time environment. This includes particular attention to the comparison of traditional lock based approach to non-blocking algorithms.
Implicitly distributing pervasively concurrent programs
(2020-12-10) Thywissen, John Adam; Rossbach, Christopher J.; Cook, William Randall; Misra, Jayadev; Peter, Simon; Gligoric, Milos
Distributed programs are often written as a collection of communicating modules. For example, to use Java RMI, programs are divided into objects which can be remotely referenced. Yet, in many cases, it would be desirable to write the program without the program structure being driven by distribution decisions. If distribution is decoupled from program structure, the programming language must allow communication throughout a program’s structure, instead of at a few known points. This situation is simplified if the programming language provides a uniform programming model for local and remote values (location transparency). We present Distributed Orc, which offers location transparency, and distributes program operations automatically in cooperation with the execution environment. By eliminating any special semantics of remote values, Distributed Orc enables programmers to write cohesive distributed programs, rather than programs artificially divided at distribution boundaries. Distributed Orc is derived from the Orc language, a (centralized) concurrent orchestration language.
Operating system transactions
(2010-12) Porter, Donald E.; Witchel, Emmett; Alvisi, Lorenzo; McKinley, Kathryn S.; Shmatikov, Vitaly; Swift, Michael
Applications must be able to synchronize accesses to operating system (OS) resources in order to ensure correctness in the face of concurrency and system failures. This thesis proposes system transactions, with which the programmer specifies atomic updates to heterogeneous system resources and the OS guarantees atomicity, consistency, isolation, and durability (ACID). This thesis provides a model for system transactions as a concurrency control mechanism. System transactions efficiently and cleanly solve long-standing concurrency problems that are difficult to address with other techniques. For example, malicious users can exploit race conditions between distinct system calls in privileged applications, gaining administrative access to a system. Programmers can eliminate these vulnerabilities by eliminating these race conditions with system transactions. Similarly, failed software installations can leave a system unusable. System transactions can roll back an unsuccessful software installation without disturbing concurrent, independent updates to the file system. This thesis describes the design and implementation of TxOS, a variant of Linux 2.6.22 that implements system transactions. The thesis contributes new implementation techniques that yield fast, serializable transactions with strong isolation and fairness between system transactions and non-transactional activity. Using system transactions, programmers can build applications with better performance or stronger correctness guarantees from simpler code. For instance, wrapping an installation of OpenSSH in a system transaction guarantees that a failed installation will be rolled back completely. These atomicity properties are provided by the OS, requiring no modification to the installer itself and adding only 10% performance overhead. The prototype implementation of system transactions also minimizes non-transactional overheads. For instance, a non-transactional compilation of Linux incurs negligible (less than 2%) overhead on TxOS. Finally, this thesis describes a new lock-free linked list algorithm, called OLF, for optimistic, lock-free lists. OLF addresses key limitations of prior algorithms, which sacrifice functionality for performance. Prior lock-free list algorithms can safely insert or delete a single item, but cannot atomically compose multiple operations (e.g., atomically move an item between two lists). OLF provides both arbitrary composition of list operations as well as performance scalability close to previous lock-free list designs. OLF also removes previous requirements for dynamic memory allocation and garbage collection of list nodes, making it suitable for low-level system software, such as the Linux kernel. We replace lists in the Linux kernel's directory cache with OLF lists, which currently requires a coarse-grained lock to ensure invariants across multiple lists. OLF lists in the Linux kernel improve performance of a filesystem metadata microbenchmark by 3x over unmodified Linux at 8 CPUs. The TxOS prototype demonstrates that a mature OS running on commodity hardware can provide system transactions at a reasonable performance cost. As a practical OS abstraction for application developers, system transactions facilitate writing correct application code in the presence of concurrency and system failures. The OLF algorithm demonstrates that application developers can have both the functionality of locks and the performance scalability of a lock-free linked list.
Orchestration and atomicity
(2013-08) Kitchin, David Wilson; Misra, Jayadev; Cook, William Randall
This dissertation presents the concurrent programming language Ora, an extension of the Orc orchestration language with the capability to execute transactions. A new formal definition of transactions is given, in terms of two complementary properties: atomicity and coatomicity. These properties are described in terms of a partial order of events, rather than as properties of a totally ordered program trace. Atomicity and coatomicity are ensured in Ora programs by a novel algorithm for multiversion concurrency control.
Verification of sequential and concurrent libraries
(2010-08) Deshmukh, Jyotirmoy Vinay; Emerson, E. Allen; Aziz, Adnan; Garg, Vijay K.; Chase, Craig M.; Khurshid, Sarfraz; Mok, Aloysius K.
The goal of this dissertation is to present new and improved techniques for fully automatic verification of sequential and concurrent software libraries. In most cases, automatic software verification is plagued by undecidability, while in many others it suffers from prohibitively high computational complexity. Model checking -- a highly successful technique used for verifying finite state hardware circuits against logical specifications -- has been less widely adapted for software, as software verification tends to involve reasoning about potentially infinite state-spaces. Two of the biggest culprits responsible for making software model checking hard are heap-allocated data structures and concurrency. In the first part of this dissertation, we study the problem of verifying shape properties of sequential data structure libraries. Such libraries are implemented as collections of methods that manipulate the underlying data structure. Examples of such methods include: methods to insert, delete, and update data values of nodes in linked lists, binary trees, and directed acyclic graphs; methods to reverse linked lists; and methods to rotate balanced trees. Well-written methods are accompanied by documentation that specifies the observational behavior of these methods in terms of pre/post-conditions. A pre-condition [phi] for a method M characterizes the state of a data structure before the method acts on it, and the post-condition [psi] characterizes the state of the data structure after the method has terminated. In a certain sense, we can view the method as a function that operates on an input data structure, producing an output data structure. Examples of such pre/post-conditions include shape properties such as acyclicity, sorted-ness, tree-ness, reachability of particular data values, and reachability of pointer values, and data structure-specific properties such as: "no red node has a red child'', and "there is no node with data value 'a' in the data structure''. Moreover, methods are often expected not to violate certain safety properties such as the absence of dangling pointers, absence of null pointer dereferences, and absence of memory leaks. We often assume such specifications as implicit, and say that a method is incorrect if it violates such specifications. We model data structures as directed graphs, and use the two terms interchangeably. Verifying correctness of methods operating on graphs is an instance of the parameterized verification problem: for every input graph that satisfies [phi], we wish to ensure that the corresponding output graph satisfies [psi]. Control structures such as loops and recursion allow an arbitrary method to simulate a Turing Machine. Hence, the parameterized verification problem for arbitrary methods is undecidable. One of the main contributions of this dissertation is in identifying mathematical conditions on a programming language fragment for which parameterized verification is not only decidable, but also efficient from a complexity perspective. The decidable fragment we consider can be broadly sub-divided into two categories: the class of iterative methods, or methods which use loops as a control flow construct to traverse a data structure, and the class of recursive methods, or methods that use recursion to traverse the data structure. We show that for an iterative method operating on a directed graph, if we are guaranteed that if the number of destructive updates that a method performs is bounded (by a constant, i.e., O(1)), and is guaranteed to terminate, then the correctness of the method can be checked in time polynomial in the size of the method and its specifications. Further, we provide a well-defined syntactic fragment for recursive methods operating on tree-like data structures, which assures that any method in this fragment can be verified in time polynomial in the size of the method and its specifications. Our approach draws on the theory of tree automata, and we show that parameterized correctness can be reduced to emptiness of finite-state, nondeterministic tree automata that operate on infinite trees. We then leverage efficient algorithms for checking the emptiness of such tree automata to obtain a tractable verification framework. Our prototype tool demonstrates the low theoretical complexity of our technique by efficiently verifying common methods that operate on data structures. In the second part of the dissertation, we tackle another obstacle for tractable software verification: concurrency. In particular, we explore application of a static analysis technique based on interprocedural dataflow analysis to predict and document deadlocks in concurrent libraries, and analyze deadlocks in clients that use such libraries. The kind of deadlocks that we focus result from circular dependencies in the acquisition of shared resources (such as locks). Well-written applications that use several locks implicitly assume a certain partial order in which locks are acquired by threads. A cycle in the lock acquisition order is an indicator of a possible deadlock within the application. Methods in object-oriented concurrent libraries often encapsulate internal synchronization details. As a result of information hiding, clients calling the library methods may cause thread safety violations by invoking methods in a manner that violates the partial ordering between lock acquisitions that is implicit within the library. Given a concurrent library, we present a technique for inferring interface contracts that speciy permissible concurrent method calls and patterns of aliasing among method arguments that guarantee deadlock-free execution for the methods in the library. The contracts also help client developers by documenting required assumptions about the library methods. Alternatively, the contracts can be statically enforced in the client code to detect potential deadlocks in the client. Our technique combines static analysis with a symbolic encoding for tracking lock dependencies, allowing us to synthesize contracts using a satisfiability modulo theories (SMT) solver. Additionally, we investigate extensions of our technique to reason about deadlocks in libraries that employ signalling primitives such as wait-notify for cooperative synchronization. We demonstrate its scalability and efficiency with a prototype tool that analyzed over a million lines of code for some widely-used open-source Java libraries in less than 50 minutes. Furthermore, the contracts inferred by our approach have been able to pinpoint real bugs, i.e. deadlocks that have been reported by users of these libraries.
Weak-memory local reasoning
(2012-12) Wehrman, Ian Anthony; Hunt, Warren A., 1958-; Moore, J Strother, 1947-; Berdine, Josh; Hoare, C. A. R.; Emerson, E. Allen; Fussell, Donald S
Program logics are formal logics designed to facilitate specification and correctness reasoning for software programs. Separation logic, a recent program logic for C-like programs, has found great success in automated verification due in large part to its embodiment of the principle of local reasoning, in which specifications and proofs are restricted to just those resources—variables, shared memory addresses, locks, etc.—used by the program during execution. Existing program logics make the strong assumption that all threads agree on the values of shared memory at all times. But, on modern computer architectures, this assumption is unsound for certain shared-memory concurrent programs: namely, those with races. Typically races are considered to be errors, but some programs, like lock-free concurrent data structures, are necessarily racy. Verification of these difficult programs must take into account the weaker models of memory provided by the architectures on which they execute. This dissertation project seeks to explicate a local reasoning principle for x86-like architectures. The principle is demonstrated with a new program logic for concurrent C-like programs that incorporates ideas from separation logic. The goal of the logic is to allow verification of racy programs like concurrent data structures for which no general-purpose high-level verification techniques exist.