Parallelization of ordered irregular algorithms
Abstract
Parallel computing hardware is ubiquitous, ranging from cell phones with multiple cores to supercomputers with hundreds of thousands of cores. Applications need to take advantage of this parallelism to reduce their execution time as well as to scale to larger input sizes. This has proved to be very challenging because programmers are used to thinking serially, and because current programming notations for writing parallel software are very low-level and hard to use in general. At the same time, new problem domains like social network analysis and data analytics are giving rise to "irregular" applications that have far more complicated patterns of parallelism and locality than applications in more traditional parallel computing fields like computational science. This dissertation focuses on ordered irregular applications, which are applications in which computations must appear to have been performed in an application-specific order. One example considered in this dissertation is the simulation of colliding balls on a billiards table. Collisions in different parts of the table may be performed in parallel as long as, in the end, all collisions appear to have been performed in time order. Ordered algorithms are more complex to parallelize than unordered algorithms, such as some mesh refinement algorithms, which lack a notion of algorithmic order. Existing languages and runtime systems are limited in their ability to express and parallelize ordered applications. To address these problems, we introduce a new abstraction for understanding and exploiting parallelism in ordered applications, called the Kinetic Dependence Graph (KDG). We have implemented a high-level user interface that permits programmers to write ordered algorithms without needing to understand the Kinetic Dependence Graph or its implementation details.
This interface permits programmers to write sequential C++ programs with implicitly parallel loop constructs, using our parallel data structures and runtime libraries. We have also implemented several runtimes that use the KDG to run these applications in parallel. One runtime uses the KDG directly to find and exploit parallelism. The other runtime is based on optimistic (speculative) execution, and it uses the KDG to improve the efficiency of speculation. Our results show a speedup of up to 33× on a 40-core Intel server, performing better than or on par with third-party implementations, most of which have been implemented and optimized by hand using low-level tools and intricate knowledge of algorithms and hardware.