The Parla runtime




Stephens, Sean Edward




Writing high-performance programs for heterogeneous compute nodes poses many challenges, chief among them managing data-level and task-level parallelism correctly across different processing units. Parla is a heterogeneous task-based programming framework that simplifies writing portable multi-device code: programmers express task-level parallelism with simple decorator annotations while still making full use of Python’s rich scientific programming stack.

The underlying runtime system of Parla must efficiently execute a variety of task graphs on complex heterogeneous nodes. This runtime is divided into three phases: mapper, scheduler, and launcher. I present the design of each phase and discuss the motivation behind its design decisions, with particular attention to the efficient handling of GPU tasks. I show that the current runtime’s heuristic-based mapping policies perform comparably to optimal user-specified mappings on a variety of workloads. Lastly, I detail many areas of future work for further improving runtime performance.

