Fair and high performance shared memory resource management
Chip multiprocessors (CMPs) commonly share a large portion of memory system resources among different cores. Since memory requests from different threads executing on different cores significantly interfere with one another in these shared resources, the design of the shared memory subsystem is crucial for achieving high performance and fairness.
Inter-thread memory system interference has different implications based on the type of workload running on a CMP. In multi-programmed workloads, different applications can experience significantly different slowdowns. If left uncontrolled, large disparities in slowdowns result in low system performance and make system software's priority-based thread scheduling policies ineffective. In a single multi-threaded application, memory system interference between threads of the same application can slow each thread down significantly. Most importantly, the critical path of execution can also be significantly slowed down, resulting in increased application execution time.
This dissertation proposes three mechanisms that address different shortcomings of current shared resource management techniques targeted at multi-programmed workloads, and one mechanism which speeds up a single multi-threaded application by managing main-memory related interference between its different threads.
With multi-programmed workloads, the key idea is that both demand- and prefetch-caused inter-application interference should be taken into account in shared resource management techniques across the entire shared memory system. Our evaluations demonstrate that doing so significantly improves both system performance and fairness compared to the state-of-the-art. When executing a single multi-threaded application on a CMP, the key idea is to take into account the inter-dependence of threads in memory scheduling decisions. Our evaluation shows that doing so significantly reduces the execution time of the multi-threaded application compared to using state-of-the-art memory schedulers designed for multi-programmed workloads.
This dissertation concludes that the performance and fairness of CMPs can be significantly improved by better management of inter-thread interference in the shared memory resources, both for multi-programmed workloads and multi-threaded applications.