IC design for reliability
As the feature size of integrated circuits goes down to the nanometer scale, transient and permanent reliability issues are becoming a significant concern for circuit designers. Traditionally, the reliability issues were mostly handled at the device level as a device engineering problem. However, the increasing severity of reliability challenges and higher error rates due to transient upsets favor higher-level design for reliability (DFR). In this work, we develop several methods for DFR at the circuit level. A major source of transient errors is the single event upset (SEU). SEUs are caused by high-energy particles present in the cosmic rays or emitted by radioactive contaminants in the chip packaging materials. When these particles hit a N+/P+ depletion region of an MOS transistor, they may generate a temporary logic fault. Depending on where the MOS transistor is located and what state the circuit is at, an SEU may result in a circuit-level error. We analyze SEUs both in combinational logic and memories (SRAM). For combinational logic circuit, we propose FASER, a Fast Analysis tool of Soft ERror susceptibility for cell-based designs. The efficiency of FASER is achieved through its static and vector-less nature. In order to evaluate the impact of SEU on SRAM, a theory for estimating dynamic noise margins is developed analytically. The results allow predicting the transient error susceptibility of an SRAM cell using a closedform expression. Among the many permanent failure mechanisms that include time-dependent oxide breakdown (TDDB), electro-migration (EM), hot carrier effect (HCE), and negative bias temperature instability (NBTI), NBTI has recently become important. Therefore, the main focus of our work is NBTI. NBTI occurs when the gate of PMOS is negatively biased. The voltage stress across the gate generates interface traps, which degrade the threshold voltage of PMOS. The degraded PMOS may eventually fail to meet timing requirement and cause functional errors. NBTI becomes severe at elevated temperatures. In this dissertation, we propose a NBTI degradation model that takes into account the temperature variation on the chip and gives the accurate estimation of the degraded threshold voltage. In order to account for the degradation of devices, traditional design methods add guard-bands to ensure that the circuit will function properly during its lifetime. However, the worst-case based guard-bands lead to significant penalty in performance. In this dissertation, we propose an effective macromodel-based reliability tracking and management framework, based on a hybrid network of on-chip sensors, consisting of temperature sensors and ring oscillators. The model is concerned specifically with NBTIinduced transistor aging. The key feature of our work, in contrast to the traditional tracking techniques that rely solely on direct measurement of the increase of threshold voltage or circuit delay, is an explicit macromodel which maps operating temperature to circuit degradation (the increase of circuit delay). The macromodel allows for costeffective tracking of reliability using temperature sensors and is also essential for enabling the control loop of the reliability management system. The developed methods improve the over-conservatism of the device-level, worstcase reliability estimation techniques. As the severity of reliability challenges continue to grow with technology scaling, it will become more important for circuit designers/CAD tools to be equipped with the developed methods.