Exploiting long-term behavior for improved memory system performance
MetadataShow full item record
Memory latency is a key bottleneck for many programs. Caching and prefetching are two popular hardware mechanisms to alleviate the impact of long memory latencies, but despite decades of research, significant headroom remains. In this thesis, we show how we can significantly improve caching and prefetching by exploiting a long history of the program's behavior. Towards this end, we define new learning goals that fully exploit long-term information, and we propose history representations that make it feasible to track and manipulate long histories. Based on these insights, we advance the state-of-the-art for three important memory system optimizations. For cache replacement, where existing solutions have relied on simplistic heuristics, our solution pursues the new goal of learning from the optimal solution for past references to predict caching decisions for future references. For irregular prefetching, where previous solutions are limited in scope due to their inefficient management of long histories, our goal is to realize the previously unattainable combination of two popular learning techniques, namely address correlation and PC-localization. Finally, for regular prefetching, where recent solutions learn increasingly complex patterns, we leverage long histories to simplify the learning goal and to produce more timely and accurate prefetches. Our results are significant. For cache replacement, our solution reduces misses for memory-intensive SPEC 2006 benchmarks by 17.4% compared to 11.4% for the previous best. For irregular prefetching, our prefetcher obtains 23.1% speedup (vs. 14.1% for the previous best) with 93.7% accuracy, and it comes close to the performance of an idealized prefetcher with no resource constraints. Finally, for regular prefetching, our prefetcher improves performance by 102.3% over a baseline with no prefetching compared to the 90% speedup for the previous state-of-the-art prefetcher; our solution also incurs 10% less traffic than the previous best regular prefetcher.