Scalable primary cache memory architectures
For the past decade, microprocessors have improved in overall performance at a rate of approximately 50–60% per year by exploiting rapid increases in clock rate and improvements in instruction throughput. Part of this trend has been the growth of on-chip caches, which in modern processors can be as large as 2 MB. However, as smaller process technologies become prevalent, achieving low average memory access time by simply scaling existing designs becomes more difficult because of process limitations. This research shows that scaling an existing design, either by keeping the latency of various structures constant or by allowing the latency to vary while keeping the capacity constant, degrades instructions per cycle (IPC). The goal of this research is to improve IPC at small feature sizes using a combination of circuit and architectural techniques. This research develops technology-based models to estimate cache access times and uses those models for architectural performance estimation. The performance of a microarchitecture with clustered functional units coupled with a partitioned primary data cache is estimated using the cache access time models. This research evaluates both static and dynamic data mapping on the partitioned primary data cache and shows that dynamic mapping combined with the partitioned cache outperforms both the unified cache and the statically mapped design. In conjunction with the dynamic data mapping, this research proposes and evaluates predictive instruction steering strategies that improve the performance of clustered processor designs. This research shows that a hybrid predictive instruction steering policy coupled with aggressive dynamic mapping of data in a partitioned primary data cache can significantly improve the IPC of a clustered processor, relative to dependence-based steering with a unified data cache.
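To give a rough intuition for why dynamic data mapping can outperform static mapping in a partitioned primary cache, the following toy Python model compares the two policies on a two-cluster, two-bank design. All names, latencies, and the first-touch placement policy here are illustrative assumptions for this sketch, not the dissertation's actual design or simulator.

```python
# Toy model: average access latency for static address-based bank mapping
# versus a simple dynamic (first-touch) mapping in a two-bank partitioned L1.
# Latency values and the trace are assumed, purely for illustration.

LOCAL_HIT = 2    # cycles to reach a cluster's local cache bank (assumed)
REMOTE_HIT = 5   # cycles to reach the other cluster's bank (assumed)

def static_bank(line_addr):
    """Static mapping: a fixed address bit selects the bank."""
    return line_addr & 1

def run_trace(trace, mapping):
    """trace: (cluster, line_addr) pairs. Returns average access latency."""
    home = {}  # dynamic-mapping state: line -> bank it currently lives in
    total = 0
    for cluster, line in trace:
        if mapping == "static":
            bank = static_bank(line)
        else:  # dynamic: first touch places the line in the requester's bank
            bank = home.setdefault(line, cluster)
        total += LOCAL_HIT if bank == cluster else REMOTE_HIT
    return total / len(trace)

# Each cluster repeatedly touches its own working set; the static mapping
# happens to place every one of those lines in the opposite cluster's bank.
trace = [(0, a) for a in (1, 3, 5, 7)] * 8 + [(1, a) for a in (0, 2, 4, 6)] * 8

print(run_trace(trace, "static"))   # -> 5.0 (every access is remote)
print(run_trace(trace, "dynamic"))  # -> 2.0 (first touch makes lines local)
```

The gap in this contrived trace is the worst case for static mapping; the dissertation's point is the same in kind, namely that a mapping which adapts to which cluster actually uses the data keeps more accesses local than one fixed by address bits.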