Performance and complexity tradeoffs in partially inclusive caches
Multi-level inclusive cache hierarchies have historically offered a convenient tradeoff between performance and design complexity. However, as designs add more intermediate levels of cache, the shrinking size disparity between adjacent levels exacerbates the wasteful redundancy inherent in inclusive cache designs. While it remains beneficial for larger, slower caches to act as inclusive caches and snoop filters for the smaller, faster caches nearer the core, those benefits can be undermined by excessive data duplication and frequent back-invalidations when the larger cache is only two to four times the size of the smaller cache. Partial inclusivity is one technique for addressing these issues: a partially inclusive cache reduces data duplication in the hierarchy while still providing performance and robust snoop filtering comparable to a traditional inclusive cache, and it can also reduce the frequency of back-invalidations that strict inclusion requires. We describe two approaches to implementing a partially inclusive mid-level cache and explore the implications of our design decisions for performance, array size, and implementation complexity. We show that the first approach, ThinL2, allows simpler coherence record-keeping but dramatically increases snooping of the first-level caches. We also show that the second approach, WideL2, snoops the first-level caches relatively efficiently but incurs substantially more record-keeping complexity. We then propose ways to mitigate some of the complexity problems associated with WideL2.
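The back-invalidation problem motivating this work can be made concrete with a toy model. The sketch below is illustrative only (it is not the ThinL2 or WideL2 design, and the class and parameter names are hypothetical): it models a fully associative, LRU, strictly inclusive two-level hierarchy in which L1 hits are invisible to L2, so L2's recency information goes stale for lines that stay hot in L1.

```python
from collections import OrderedDict

class ToyInclusiveHierarchy:
    """Toy fully associative, LRU, strictly inclusive two-level hierarchy.

    Because L1 hits never reach L2, L2's LRU state goes stale for lines
    that are hot in L1; when L2 later evicts such a line, strict inclusion
    forces a back-invalidation of the copy still live in L1.
    """

    def __init__(self, l1_lines, l2_lines):
        self.l1 = OrderedDict()            # address -> None, in LRU order
        self.l2 = OrderedDict()
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines
        self.back_invalidations = 0

    def access(self, addr):
        if addr in self.l1:                # L1 hit: L2 never sees it
            self.l1.move_to_end(addr)
            return
        if addr in self.l2:                # L1 miss, L2 hit
            self.l2.move_to_end(addr)
        else:                              # L2 miss: allocate, maybe evict
            if len(self.l2) >= self.l2_lines:
                victim, _ = self.l2.popitem(last=False)
                if victim in self.l1:      # inclusion victim still in L1
                    del self.l1[victim]
                    self.back_invalidations += 1
            self.l2[addr] = None
        if len(self.l1) >= self.l1_lines:  # allocate into L1 as well
            self.l1.popitem(last=False)
        self.l1[addr] = None

# Three lines stay hot in L1 (hitting there, invisible to L2) while a
# stream of cold lines pushes them toward L2's LRU position; a 2x
# capacity ratio between the levels makes this happen quickly.
h = ToyInclusiveHierarchy(l1_lines=4, l2_lines=8)
hot = [0, 1, 2]
for a in hot:
    h.access(a)
for cold in range(100, 120):
    h.access(cold)
    for a in hot:
        h.access(a)
print(h.back_invalidations)                # nonzero: hot L1 lines get yanked
```

With a larger capacity ratio between the levels the same access pattern triggers no back-invalidations at all, which is one way to see why the problem is specific to the narrow two-to-four-times ratios discussed above.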