Scale-CNN : a tool for generating scalable high-throughput CNN inference accelerators on FPGAs

Rauch, Daniel Levi


In the past decade, research has shown that CNN inference can be sped up considerably by dedicated hardware accelerators. However, the performance of most existing accelerators is limited because they work on only a single inference at a time and/or rely on slow off-chip memory accesses for hidden layers. These limitations stem from the high memory requirements of CNN inference, which can reach tens of megabits even for small networks with reduction techniques applied. As Moore's law has continued to scale, however, this level of on-chip memory is now attainable. We propose Scale-CNN, a tool for generating multiple Pareto-optimal design points for high-throughput CNN inference accelerators on Xilinx UltraScale+ FPGAs. The Scale-CNN architecture dedicates separate hardware resources to each layer and stores all feature maps and weights on-chip, enabling a high-throughput network pipeline in which each layer works on a different inference simultaneously, with no off-chip memory accesses. Using Scale-CNN, we generate several accelerator IPs for Tiny Darknet on the smallest Virtex UltraScale+ FPGA (XCVU3P) that range from 1.7 to 56.7 inferences per second while utilizing 22% to 66% of FPGA resources.
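To illustrate why the per-layer pipeline described above raises throughput, the following sketch (not from the thesis; layer latencies are made-up numbers) models the two execution styles: a single-inference accelerator finishes one result per sum of all layer latencies, while a layer pipeline with dedicated hardware per layer produces a result every interval of its slowest stage once the pipeline is full.

```python
# Hypothetical throughput model (illustrative only, not the thesis's method).

def sequential_throughput(layer_latencies_s):
    """One inference at a time: a new result every sum(latencies) seconds."""
    return 1.0 / sum(layer_latencies_s)

def pipelined_throughput(layer_latencies_s):
    """Layer pipeline: each layer works on a different inference, so once the
    pipeline is full a result emerges every max(latency) seconds (the slowest
    stage sets the initiation interval)."""
    return 1.0 / max(layer_latencies_s)

# Made-up per-layer latencies for a small 4-layer network, in seconds.
latencies = [0.010, 0.025, 0.015, 0.005]
print(f"sequential: {sequential_throughput(latencies):.1f} inferences/s")
print(f"pipelined:  {pipelined_throughput(latencies):.1f} inferences/s")
```

Under this model the pipeline's throughput depends only on the slowest layer, which is why balancing per-layer resource allocation across Pareto-optimal design points matters.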