Evaluating headroom for smart caching policies on GPUs
This report evaluates two distinct methods of improving the performance of GPU memory systems. Over the past semester, our research has focused on applying a state-of-the-art CPU cache replacement policy on GPUs and exploring headroom of preemptively writing back dirty cache lines. Our first goal is to reduce L1 and L2 cache miss rates on GPU by implementing the Hawkeye cache replacement policy. Hawkeye calculates the optimal cache replacement policy on previous cache accesses in order to train its predictor for future caching decisions. While some benchmarks show performance improvements with Hawkeye, a significant amount of our benchmarks are not sensitive to the performance of the cache. From our experiments, we show that Hawkeye, on average, gives an IPC improvement of 3.57% and 0.56% over Least Recently Used (LRU) when applied to the L1 and L2 caches respectively. We also introduce the idea of precleaning, an alternative to write-back or write-through caching that aims to spread out write bandwidth. Committing L2 writes to main memory when memory congestion is low can hide or lower the performance impact of said write. The idea of precleaning shows promise, but evaluating precleaning fully requires more research in GPU access patterns and prediction techniques.