Bottlenecks in big data analytics and AI applications and opportunities for improvement
From shopping to social interaction, the related domains of big data analytics and artificial intelligence (AI) applications affect many aspects of our daily activities. Their success arises in part from their highly parallelizable compute, which allows them to process massive data sets in data centers, serve large numbers of users simultaneously, and perform enormous numbers of simple calculations very quickly. Despite the success and ubiquity of big data analytics and AI, I show that the foundational principle of high performance in these paradigms (abundant and easily exploited parallel computation) has been pushed to the point where the limitations of parallel computing have come to dictate application performance. Using the industry benchmark TPCx-BB, I demonstrate that most of the compute time is spent in code regions unable to fully utilize the available cores. In accordance with Amdahl's law, overall performance is dictated by these less-parallel regions of compute. Similarly, in a data center deployment of an end-to-end AI application, the abundant parallelism of deep neural network (DNN) inference is overshadowed by the non-parallel portions of the application pipeline: pre- and post-processing and inter-server communication. In a study of accelerated AI, I show that at a modest 8x compute speedup, further performance improvement is completely halted by the limited storage bandwidth of just a handful of servers. Even within DNN inference itself, the demand for higher performance is pushing current hardware to its limits, to the point where DNN accuracy must sometimes be sacrificed for latency.
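The Amdahl's law argument above can be made concrete with a minimal sketch. The parallel fractions and core counts below are hypothetical values chosen for illustration; they are not measurements from TPCx-BB or from the AI application studied here.

```python
import math


def amdahl_speedup(parallel_fraction: float, n_units: int) -> float:
    """Overall speedup under Amdahl's law.

    parallel_fraction: share of the original run time that scales with
        the number of parallel units (hypothetical input, 0.0 to 1.0).
    n_units: number of cores (or the acceleration factor) applied to
        the parallel share.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_units)


def speedup_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as n_units grows without bound."""
    serial_fraction = 1.0 - parallel_fraction
    return math.inf if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    # Illustrative (not measured) parallel fractions and unit counts.
    for p in (0.50, 0.90, 0.99):
        for n in (8, 32, 128):
            print(f"p={p:.2f} n={n:4d} speedup={amdahl_speedup(p, n):6.2f} "
                  f"limit={speedup_limit(p):6.2f}")
```

Under these assumptions, adding cores or accelerating the parallel share yields diminishing returns: the speedup can never exceed the reciprocal of the non-parallel share, which is why the less-parallel regions end up dictating overall performance.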
To address the limitations at the boundaries of parallel computing in these domains, I propose a solution targeted to each. In big data analytics, I demonstrate that restricting big data software to a small subset of the available cores on each server can substantially improve performance, and I propose a combined hardware/software solution called core packing that would extend these benefits (up to 20% latency reduction) to a wide range of big data applications. In data center AI applications, I demonstrate how an edge data center, carefully tailored to the specific behavior of accelerated AI applications, can accommodate up to 32x accelerated AI at 15% lower total cost of ownership than a comparable data center that is not tailored to the needs of the application. And within DNN inference, I show that an additional source of parallelism, between adjacent layers in the DNN graph, can be exploited to offer latency reductions of up to 39%.