Sparsity prior in efficient deep learning based solvers and models

dc.contributor.advisor: Wang, Zhangyang
dc.contributor.committeeMember: Marculescu, Radu
dc.contributor.committeeMember: Vikalo, Haris
dc.contributor.committeeMember: Dimakis, Alexandros G.
dc.contributor.committeeMember: Yin, Wotao
dc.creator: Chen, Xiaohan
dc.creator.orcid: 0000-0002-0360-0402
dc.date.accessioned: 2022-11-30T00:14:44Z
dc.date.available: 2022-11-30T00:14:44Z
dc.date.created: 2022-08
dc.date.issued: 2022-09-14
dc.date.submitted: August 2022
dc.date.updated: 2022-11-30T00:14:45Z
dc.description.abstract: Deep learning has been empirically successful in recent years thanks to extremely over-parameterized deep models and data-driven learning from enormous amounts of data. However, deep learning models are limited in efficiency in two senses. First, many deep models are designed in a black-box manner: they are unaware of prior knowledge about the structure of the problems of interest and hence cannot exploit it efficiently. This unawareness can cause redundant parameterization and inferior performance compared to more dedicated methods. Second, extreme over-parameterization is itself inefficient in terms of model storage, memory requirements and computational complexity. This strictly constrains realistic applications of deep learning on mobile devices with limited resources. Moreover, the financial and environmental costs of training such enormous deep models are unreasonably high, which runs directly counter to the call for green AI.
In this work, we strive to address the inefficiency of deep models by introducing sparsity as an important prior for deep learning. Our efforts fall into three sub-directions. In the first direction, we aim to accelerate the solution of a specific class of optimization problems with sparsity constraints. Instead of designing black-box deep learning models, we derive new parameterizations by absorbing insights from the sparse optimization field, which result in compact deep-learning-based solvers with significantly reduced training costs and superior empirical performance. In the second direction, we introduce sparsity into deep neural networks via weight pruning. Pruning reduces redundancy in over-parameterized deep networks by removing superfluous weights, thus naturally reducing model storage and computational costs. We aim to push pruning to the limit by combining it with other compression techniques, yielding extremely efficient deep models that can be deployed and fine-tuned on edge devices. In the third direction, we investigate what sparsity brings to deep networks. Creating sparsity in a deep network significantly changes the landscape of its loss function and thus its behavior during training. We aim to understand what these changes are and how we can use them to train better sparse neural networks.
The main content of this work is summarized below.
Sparsity Prior in Efficient Deep Solvers. We adopt the algorithm unrolling method to transform classic optimization algorithms into feed-forward deep neural networks that can accelerate convergence by over 100x. We also provide theoretical guarantees of linear convergence for the newly developed solvers, which is faster than the convergence rate achievable with classic optimization. Meanwhile, the number of parameters to be trained is reduced from millions to tens, and even to 3 hyperparameters, decreasing the training time from hours to 6 minutes.
Sparsity Prior in Efficient Deep Learning. We investigate compressing deep networks by unifying pruning, quantization and matrix factorization techniques to remove as much redundancy as possible, so that the resulting networks have low inference and/or training costs. The developed methods improve memory/storage efficiency and latency by at least 5x, varying with the data sets and models used.
Sparsity Prior in Sparse Neural Networks. We study the properties and behaviors of sparse deep networks using the lottery ticket hypothesis (LTH) and dynamic sparse training (DST) as tools, and explore their application to efficient training in computer vision, natural language processing and Internet-of-Things (IoT) systems. With our developed sparse neural networks, performance loss is significantly mitigated while far fewer parameters are trained, saving computation costs in general and communication costs specifically for IoT systems.
dc.description.department: Electrical and Computer Engineering
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/2152/116816
dc.identifier.uri: http://dx.doi.org/10.26153/tsw/43711
dc.language.iso: en
dc.subject: Sparsity
dc.subject: Deep learning
dc.subject: Learning to optimize
dc.subject: Sparse neural network
dc.subject: Efficient deep learning
dc.title: Sparsity prior in efficient deep learning based solvers and models
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Electrical and Computer Engineering
thesis.degree.discipline: Electrical and Computer Engineering
thesis.degree.grantor: The University of Texas at Austin
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
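
Illustrative note on algorithm unrolling. The abstract describes turning a classic optimization algorithm into a feed-forward network. As a minimal sketch of that general idea (not the dissertation's own parameterization, which this record does not spell out), the code below unrolls ISTA for the lasso problem so that each iteration becomes one "layer" with its own step size and threshold. In a learned solver those per-layer scalars would be trained from data; here they are simply fixed to classic ISTA values. All names (`unrolled_ista`, `steps`, `thetas`, `lasso_objective`) and the toy problem sizes are assumptions made for illustration.

```python
import random


def soft_threshold(v, theta):
    """Elementwise shrinkage: the proximal operator of theta * ||.||_1."""
    return [max(abs(x) - theta, 0.0) * (1.0 if x > 0 else -1.0) for x in v]


def matvec(M, v):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def unrolled_ista(A, b, steps, thetas):
    """Feed-forward pass: one ISTA iteration per layer.

    Layer k applies x <- soft_threshold(x + steps[k] * A^T (b - A x), thetas[k]).
    The per-layer scalars are the quantities a learned solver would train.
    """
    At = transpose(A)
    x = [0.0] * len(A[0])
    for step, theta in zip(steps, thetas):
        residual = [bi - ri for bi, ri in zip(b, matvec(A, x))]
        grad_step = matvec(At, residual)
        x = soft_threshold([xi + step * gi for xi, gi in zip(x, grad_step)], theta)
    return x


def lasso_objective(A, b, x, lam):
    """0.5 * ||A x - b||^2 + lam * ||x||_1."""
    r = [ri - bi for ri, bi in zip(matvec(A, x), b)]
    return 0.5 * sum(v * v for v in r) + lam * sum(abs(v) for v in x)


# Toy sparse-recovery problem (sizes and values are arbitrary assumptions).
random.seed(0)
m, n = 8, 16
A = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
x_true = [0.0] * n
x_true[2], x_true[11] = 1.5, -2.0
b = matvec(A, x_true)

lam = 0.1
# Conservative step: ||A||_F^2 upper-bounds the largest eigenvalue of A^T A.
step = 1.0 / sum(a * a for row in A for a in row)
num_layers = 20
x_hat = unrolled_ista(A, b, [step] * num_layers, [lam * step] * num_layers)
```

With all layers sharing the same classic step and threshold, the network reproduces plain ISTA exactly; the speedups reported in the abstract come from learning or otherwise choosing these per-layer quantities, which this sketch does not attempt.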

Access full-text files

Original bundle

Name: CHEN-DISSERTATION-2022.pdf
Size: 6.84 MB
Format: Adobe Portable Document Format
