Demystifying deep network architectures : from theory to applications

dc.contributor.advisor: Wang, Zhangyang
dc.contributor.committeeMember: Bovik, Alan
dc.contributor.committeeMember: Hanin, Boris
dc.contributor.committeeMember: Kim, Hyeji
dc.contributor.committeeMember: Yu, Zhiding
dc.creator: Chen, Wuyang
dc.date.accessioned: 2023-07-18T00:57:31Z
dc.date.available: 2023-07-18T00:57:31Z
dc.date.created: 2023-05
dc.date.issued: 2023-04-14
dc.date.submitted: May 2023
dc.date.updated: 2023-07-18T00:57:32Z
dc.description.abstract: Deep neural networks power much of the success of machine learning and artificial intelligence. Over the past decade, the community has kept designing architectures with deeper layers and more complicated connections, and many works in deep learning theory have tried to understand deep networks from different perspectives, contributing concrete analyses. However, the gap between deep learning theory and applications keeps growing: because our understanding of deep networks is still partial, current theory is not enough to guide the design of practical neural architectures. Two gaps are mainly responsible: 1) designing network architectures is very expensive, and 2) practical network architectures are far more complicated than those studied in theory. The core question of this dissertation is therefore: how do we bridge these gaps between deep learning theory and practical neural architecture design? First, current deep learning theory can inspire architecture design (Chapters 3 and 4). We propose three theory-inspired indicators that correlate strongly with a network's performance and can be measured at initialization, without any gradient-descent cost (a toy sketch of one such initialization-time metric follows this record). Based on these metrics, we propose a training-free neural architecture design algorithm with extremely low computation and time costs. Second, architecture design can in turn inspire deep learning theory (Chapters 5 and 6). By introducing two principled directions along a network's graph topology, we jointly analyze the impact of the architecture on convergence, expressivity, and generalization, and demonstrate a "no free lunch" behavior in ReLU networks. Finally, we discuss a practical industrial use case (Chapter 7), in which we design and scale up vision foundation models, again without any training cost.
dc.description.department: Electrical and Computer Engineering
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/2152/120500
dc.identifier.uri: http://dx.doi.org/10.26153/tsw/47362
dc.language.iso: en
dc.subject: Deep learning
dc.subject: Deep network architecture
dc.subject: Neural architecture search
dc.subject: Vision transformer
dc.subject: Neural tangent kernel
dc.subject: Linear region
dc.subject: Jacobian
dc.subject: Generalization
dc.title: Demystifying deep network architectures : from theory to applications
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Electrical and Computer Engineering
thesis.degree.discipline: Electrical and Computer Engineering
thesis.degree.grantor: The University of Texas at Austin
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
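
As a concrete illustration of the abstract's training-free indicators, below is a minimal sketch of one plausible initialization-time metric: the condition number of the empirical neural tangent kernel (NTK) of a randomly initialized ReLU network. The record does not specify the dissertation's exact formulations, so the toy network, the probe-batch size, and the choice of NTK condition number as the proxy are all illustrative assumptions, not the author's method.

    import torch
    import torch.nn as nn

    # Toy ReLU network at random initialization; no training is performed.
    net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))

    def empirical_ntk(model, x):
        """Empirical NTK: Theta[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>."""
        grads = []
        for i in range(x.shape[0]):
            model.zero_grad()
            out = model(x[i:i + 1]).sum()
            out.backward()
            # Flatten the per-sample parameter gradient into one row of the Jacobian.
            g = torch.cat([p.grad.flatten() for p in model.parameters()])
            grads.append(g.clone())
        J = torch.stack(grads)   # (batch, num_params) Jacobian w.r.t. parameters
        return J @ J.T           # (batch, batch) kernel matrix

    x = torch.randn(8, 16)                  # small random probe batch (assumed size)
    theta = empirical_ntk(net, x)
    eigs = torch.linalg.eigvalsh(theta)     # symmetric PSD, eigenvalues ascending
    print("NTK condition number at init:", (eigs[-1] / eigs[0]).item())

Because everything here is evaluated at initialization, a metric of this kind can rank candidate architectures without a single step of gradient descent, which is exactly the property the abstract's training-free architecture design algorithm relies on.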


Original bundle:
- CHEN-DISSERTATION-2023.pdf (9.2 MB, Adobe Portable Document Format)

License bundle:
- PROQUEST_LICENSE.txt (4.45 KB, Plain Text)
- LICENSE.txt (1.84 KB, Plain Text)