Demystifying deep network architectures: from theory to applications




Chen, Wuyang


Deep neural networks power much of the success of machine learning and artificial intelligence. Over the past decade, the community has designed architectures of ever deeper layers and more complicated connections. Many works in deep learning theory have tried to understand deep networks from different perspectives and have contributed concrete analyses. However, the gap between deep learning theory and application has grown increasingly large. Specifically, because our understanding of deep networks remains partial, current deep learning theory is insufficient to guide the design of practical neural architectures. Two obstacles stand in the way: 1) designing network architectures is very expensive, and 2) practical network architectures are much more complicated than those studied in theory. Therefore, our core question is: how do we bridge these gaps between deep learning theory and practical neural architecture design? This dissertation centers on this challenge and tries to bridge the two worlds. First, current deep learning theory can inspire architecture design (Chapters 3 and 4). We propose three theory-inspired indicators that correlate strongly with a network's performance and can be measured at initialization, without any gradient descent. Based on these metrics, we propose a training-free neural architecture design algorithm with extremely low computation and time costs. Second, architecture design can in turn inspire deep learning theory (Chapters 5 and 6). By introducing two principled directions of the network's graph topology, we jointly analyze the impact of the architecture on the network's convergence, expressivity, and generalization, and demonstrate a "no free lunch" behavior in ReLU networks. Finally, we discuss a practical industry use case (Chapter 7), in which we design and scale up vision foundation models, again without any training cost.
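To make the idea of a training-free, at-initialization indicator concrete: the abstract does not specify the three indicators, but a minimal sketch of one common proxy from this line of work is to count the distinct ReLU activation patterns a randomly initialized network induces on a batch of random inputs (a crude stand-in for the number of linear regions, which relates to expressivity). The function name, layer widths, and sample count below are illustrative assumptions, not the dissertation's actual metric.

```python
import numpy as np

def activation_pattern_score(widths, n_samples=256, seed=0):
    """Hypothetical training-free proxy: count distinct ReLU on/off
    patterns over random inputs for a randomly initialized MLP.
    No gradient descent is performed; only forward passes at init."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, widths[0]))
    patterns = []
    h = x
    for d_in, d_out in zip(widths[:-1], widths[1:]):
        # He-style random initialization, fixed at "time zero".
        W = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
        pre = h @ W
        patterns.append(pre > 0)      # which ReLUs fire for each sample
        h = np.maximum(pre, 0)
    # Concatenate per-layer patterns; each unique row is one observed
    # activation region. More distinct rows suggests higher expressivity.
    full = np.concatenate(patterns, axis=1)
    return len({row.tobytes() for row in full})

score = activation_pattern_score([16, 32, 32])
```

Because the score needs only forward passes through a freshly initialized network, comparing many candidate architectures this way costs a tiny fraction of even one training run, which is the point of training-free architecture design.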

