Analyzing and improving MAESTRO's analytical modeling
Deep learning accelerators execute deep learning applications efficiently through hardware customization. However, designing specialized hardware takes considerable time and engineering effort. Design space exploration (DSE) tools automate accelerator design by evaluating candidate designs across the vast space of possible configurations. A core component of a DSE tool is an analytical model, which lets the tool filter out invalid or sub-optimal candidates at a coarse granularity before resorting to synthesis, which is more accurate but time-consuming. The MAESTRO analytical model is used in existing DSE tools because it strikes a good balance between detail and speed.
In this thesis, we improve the MAESTRO analytical model by identifying and fixing three limitations: (1) we account for buffer sizes in the memory energy and area model, (2) we add support for differentiating between a unified buffer and a partitioned buffer, and (3) we add support for exploring bit precision.
Next, we outline future directions for improving the MAESTRO analytical model. First, we perform a component-wise area and power analysis of a commercial accelerator, the Nvidia Deep Learning Accelerator (NVDLA), to gain insights about sub-components not captured by the analytical model. Then, to understand the impact of compute organization, we perform a component-wise area analysis of two compute organizations executing matrix-vector multiplication.