Networking abstractions for GPU programs

Kim, Sangman

Date

2015-08

Abstract

Graphics Processing Units (GPUs) are becoming major general-purpose computing hardware for high-performance parallel computation. Despite their general computation capability and impressive performance, GPUs still lack important operating system (OS) services such as networking, which makes it challenging to build networking services or distributed systems on GPUs. This thesis presents GPUnet, a native GPU networking layer that provides a socket abstraction and high-level networking APIs for GPU programs. GPUnet hides the complicated coordination of processors and network interfaces from GPU applications and streamlines the development of server applications on GPUs. We develop several applications that harness the benefits of GPUnet. Our matrix multiplication server built with GPUnet matches or surpasses the performance of the server without GPUnet while using only 24-43% of the lines of code. We also show the scalability of in-GPU-memory MapReduce (GimMR) applications across multiple GPUs: GimMR's word count and K-means workloads scale to four GPUs with speedups of 2.9-3.5x over one GPU. GPUnet addresses three key challenges: the massive parallelism of GPU programs, the memory copy overhead between CPU memory and GPU memory, and the slow single-threaded performance of GPUs. To better support massively parallel GPU programs, networking API invocations made by multiple threads at the same point in data-parallel code are coalesced. Direct communication between GPUs and network devices reduces the copy overhead. To minimize the time spent in single-threaded operation, control-intensive networking operations are offloaded to the network device.
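The coalescing idea in the abstract can be illustrated with a small host-side model. This is only a sketch under stated assumptions, not GPUnet's implementation or API: the function `coalesce_calls`, the notion of a thread "group", and the tuple layout are all hypothetical names introduced here, and the grouping granularity is an assumption that the abstract does not specify. The sketch shows only the bookkeeping: calls made by many threads at the same point in the code collapse into one request per group.

```python
from collections import defaultdict


def coalesce_calls(calls):
    """Merge per-thread calls that share a group and call site into one request.

    `calls` is an iterable of (group_id, call_site, thread_id) tuples.
    All names here are hypothetical; this models the idea only.
    Returns a dict mapping (group_id, call_site) to the sorted thread ids
    covered by that single coalesced request.
    """
    requests = defaultdict(list)
    for group_id, call_site, thread_id in calls:
        requests[(group_id, call_site)].append(thread_id)
    return {key: sorted(tids) for key, tids in requests.items()}


# Two hypothetical groups of four threads, all reaching the same "send" call site.
calls = [(g, "send", t) for g in range(2) for t in range(4)]
requests = coalesce_calls(calls)
print(len(calls), "per-thread calls ->", len(requests), "coalesced requests")
```

In this model the number of requests grows with the number of groups rather than the number of threads, which is the property that makes a socket-style API plausible for massively parallel code.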