Cache design for low power and yield enhancement

Access full-text files

Date

2008-08

Authors

Mohammad, Baker Shehadah

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

One of the major limiters to computer systems and systems on chip (SOC) designs is accessing the main memory, which is typically two orders of magnitude slower than the processor. To bridge this gap, modern processors already devote more than half of the on-chip transistors to the last-level cache. Caches have negative impact on area, power, and yield. This research goal is to design caches that operate at lower voltages while enhancing yield. Our strategy is to improve the static noise margin (SNM) and the writability of the conventional six-transistor SRAM cell by reducing the effect of parametric variations on the cell. This is done using a novel circuit that reduces the voltage swing on the word line during read operations and reduces the memory supply voltage during write operations. The proposed circuit increases the SRAM’s SNM and write margin using a single voltage supply that has minimal impacts on chip area, complexity, and timing. A test chip with an 8-kilobyte SRAM block manufactured in 45- nm technology is used to verify the practicality of the contribution and demonstrate the effectiveness of the new circuit’s implementation. Cache organization is one of the most important factors that affect cache design complexity, performance, area, and power. The main architectural choice for caches is whether to implement the tag array using a standard SRAM or using a content addressable memory (CAM). The choice made has far-reaching consequences on several aspects of the cache design, and in particular on power consumption. Our contribution in this area is an in-depth study of the complex tradeoffs of area, timing, power, and design complexity between an SRAM-based tag and a CAM-based one. Our results indicate that an SRAM-based tag design often provides a better overall design point and is superior with respect to energy, especially for interleaved multi-threading processors. Being able to test and screen chips is a key factor in achieving high yield. Most industry standard CAD tools used to analyze fault coverage and generate test vectors require gate level models. However, since caches are typically designed using a transistor-level flow, there is a need for an abstraction step to generate the gate models, which must be equivalent to the actual design (transistor level). The third contribution of the research is a framework to verify that the gate level representation of custom designs is equivalent to the transistor-level design.

Description

text

Keywords

Citation