Hardware implementation of inference in deep neural networks
Abstract
Deep learning neural network algorithms, including convolutional and recurrent networks, have risen to popularity in recent years. Along with this popularity has come a wide range of implementations that optimize the performance of these algorithms on existing hardware, including GPU architectures and the SIMD capabilities of modern x86 CPUs. Likewise, effort has been put into developing hardware specifically for running these algorithms, focusing either on specific algorithms or on a range of building-block operations common to many deep learning variations. While some of these architectures have large power requirements and are generally designed to run in a datacenter environment, hardware architectures that run most deep learning workloads well while being small, low cost, and/or low power are also important for applications where these are limiting factors. In this work I describe the implementation of both convolutional and recurrent network layer types on one such novel hardware architecture. This ultra-wide SIMD architecture is built around a ring of simple data movement and register units that feed simple arithmetic units, attached accumulator registers, and post-processing units. Unlike many other architecture designs, however, this class of hardware possesses few methods for efficiently rearranging data over even moderate distances in memory, relying instead on shifting data between adjacent or nearby units in the ring. Thus, neural network implementations that take the geometry of the inputs into account as much as possible are needed. I present one such implementation, M³inM²V, and show that it allows such simple hardware architectures to be used efficiently for neural network inference, analyzing its performance both on the described novel architecture and on the very different AVX-512 SIMD architecture.
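The shift-based computation pattern described above can be illustrated with a minimal sketch. The function name `ring_conv1d` and all details here are hypothetical, not the M³inM²V implementation itself: it models a ring of SIMD lanes, one input element per lane, where the only data movement is a rotation to the adjacent lane, and each lane performs a local multiply-accumulate into its own accumulator register.

```python
import numpy as np

def ring_conv1d(x, w):
    """1D convolution computed using only neighbour shifts and
    per-lane multiply-accumulate (hypothetical sketch of the
    ring-of-lanes execution model, not the thesis's implementation).

    Each element of `x` lives in one lane; one kernel tap `w[tap]` is
    broadcast per step, and the input vector is rotated one lane per
    step so every lane eventually sees the k inputs it needs.
    """
    n, k = len(x), len(w)
    acc = np.zeros(n)                    # one accumulator register per lane
    shifted = np.asarray(x, dtype=float)
    for tap in range(k):
        acc += w[tap] * shifted          # lane-local multiply-accumulate
        shifted = np.roll(shifted, -1)   # shift data to the adjacent lane
    return acc[: n - k + 1]              # keep only the valid outputs

x = np.arange(8, dtype=float)
w = np.array([1.0, 0.0, -1.0])
print(ring_conv1d(x, w))  # → [-2. -2. -2. -2. -2. -2.]
```

The point of the sketch is that no gather, scatter, or long-distance permute is ever issued: after k shift-and-accumulate steps the convolution is complete, which is why geometry-aware data layouts matter so much on this class of hardware.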
Furthermore, I show the applicability of recurrent network architectures to a novel domain: the decoding of information encoded in the electrical spiking activity observed from ensembles of neurons. By comparing a classifier's ability to infer different pieces of information from the data, or by comparing classifiers trained using different methods of transforming the observed activity into feature vectors, inferences can be made about what information is encoded in the neural signal, and how. By showing that deep learning classifiers can perform useful classification on such a dataset, possibly with less parameter tuning than other classifiers require, I show that such tools can contribute to increasing scientific understanding of the brain. Furthermore, for future applications of neural signal decoding, such as the control of prosthetic devices, the ability to run the decoding algorithms on relatively low-power hardware would be highly advantageous.
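One common way to transform observed spiking activity into feature vectors, as alluded to above, is to count each neuron's spikes in fixed time bins. The sketch below (function name `bin_spikes` and all parameters are illustrative assumptions; binning is only one of many possible feature transforms) turns per-neuron spike time lists for a single trial into a flat feature vector suitable as classifier input.

```python
import numpy as np

def bin_spikes(spike_times, n_neurons, window, n_bins):
    """Convert per-neuron spike time lists into a flat feature vector
    of spike counts per time bin (illustrative sketch; one of several
    feature transforms a decoder might compare).

    spike_times: list of length n_neurons; each entry is a list of
    spike times (seconds) within [0, window].
    """
    edges = np.linspace(0.0, window, n_bins + 1)   # bin boundaries
    feats = np.zeros((n_neurons, n_bins))
    for n, times in enumerate(spike_times):
        feats[n], _ = np.histogram(times, bins=edges)  # counts per bin
    return feats.reshape(-1)  # shape (n_neurons * n_bins,)

# Two neurons, a 1 s trial window, four 250 ms bins.
trial = [[0.05, 0.10, 0.60], [0.30, 0.80, 0.90]]
print(bin_spikes(trial, n_neurons=2, window=1.0, n_bins=4))
# → [2. 0. 1. 0. 0. 1. 0. 2.]
```

A recurrent decoder would typically consume the per-bin counts as a time sequence rather than the flattened vector, but the binning step itself is the same.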