Adaptive and weighted optimization for efficient and robust learning

dc.contributor.advisor: Ward, Rachel, 1983-
dc.contributor.committeeMember: Biros, George
dc.contributor.committeeMember: Bui, Tan
dc.contributor.committeeMember: Kileel, Joseph
dc.contributor.committeeMember: Sarkar, Purnamrita
dc.creator: Xie, Yuege
dc.date.accessioned: 2022-10-27T02:18:52Z
dc.date.available: 2022-10-27T02:18:52Z
dc.date.created: 2022-08
dc.date.issued: 2022-08-12
dc.date.submitted: August 2022
dc.date.updated: 2022-10-27T02:18:53Z
dc.description.abstract: Modern machine learning has made significant breakthroughs for scientific and technological applications and led to paradigm shifts in optimization and generalization theories. Adaptive and weighted optimization have become the workhorses behind today's machine learning applications, but there is still much to learn about why they work in practice and how we can further improve their efficiency and robustness. In this thesis, we first establish the linear convergence of adaptive optimization and then analyze the generalization error of weighted optimization. With these theoretical results, we develop efficient and robust learning algorithms to tackle real-world problems such as model sparsification, image classification, and medical image segmentation. To establish linear convergence guarantees for AdaGrad-Norm, an adaptive gradient descent algorithm, we develop a two-stage analysis framework and show that the convergence is robust to the initial learning rate. Unlike prior work, our analysis does not require knowledge of smoothness or strong convexity parameters. To understand the generalization of weighted trigonometric interpolation, we derive exact expressions for the generalization error of both plain and weighted least squares estimators, and we show how a bias towards smooth interpolants can lead to smaller generalization errors in the overparameterized regime. For efficient sparse model learning, we propose SHRIMP (Sparser Random Feature Model via Iterative Magnitude Pruning) to adaptively fit high-dimensional data with inherent low-dimensional structure. SHRIMP performs better than other sparse feature models at lower computational complexity while enabling feature selection and remaining robust to pruning rates. To further improve the computational efficiency and robustness of AdaGrad-Norm, we propose AdaLoss, an adaptive learning rate schedule that uses only the loss function instead of computing gradient norms. On top of AdaLoss, we enhance data augmentation consistency regularization with an adaptively weighted schedule that uses loss information to handle volumetric medical image segmentation with both sparsely labeled and densely labeled slices. We evaluate our method on CT and MRI scans and demonstrate superior performance over several baselines.
dc.description.department: Computational Science, Engineering, and Mathematics
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/2152/116394
dc.identifier.uri: http://dx.doi.org/10.26153/tsw/43289
dc.language.iso: en
dc.subject: Adaptive optimization
dc.subject: Weighted optimization
dc.subject: Overparameterization
dc.subject: Sparse random features
dc.title: Adaptive and weighted optimization for efficient and robust learning
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Computational Science, Engineering, and Mathematics
thesis.degree.discipline: Computational Science, Engineering, and Mathematics
thesis.degree.grantor: The University of Texas at Austin
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
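
The abstract above describes AdaGrad-Norm, which adapts its step size from accumulated gradient norms, and AdaLoss, which replaces the gradient-norm accumulation with loss values. The following is a minimal Python sketch of that idea on a toy least-squares problem. The AdaGrad-Norm update follows the standard form from the literature; the AdaLoss update shown here is only an assumed reading of the abstract (accumulating the loss value in place of the squared gradient norm), not necessarily the thesis's exact schedule, and the toy problem, step counts, and variable names are illustrative.

import numpy as np

# Toy least-squares problem: f(x) = 0.5 * ||A x - y||^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
y = rng.standard_normal(100)

def loss_and_grad(x):
    r = A @ x - y
    return 0.5 * (r @ r), A.T @ r

def adagrad_norm(x0, eta=1.0, b0=1e-2, steps=500):
    # AdaGrad-Norm: accumulate squared gradient norms into a single scalar
    # b_t and use eta / b_t as the step size; no smoothness constant needed.
    x, b2 = x0.copy(), b0 ** 2
    for _ in range(steps):
        _, g = loss_and_grad(x)
        b2 += g @ g                      # b_{t+1}^2 = b_t^2 + ||g_t||^2
        x -= eta / np.sqrt(b2) * g
    return x

def adaloss(x0, eta=1.0, b0=1e-2, steps=500):
    # AdaLoss (assumed form): accumulate loss values instead of squared
    # gradient norms, so the schedule never computes a gradient norm.
    x, b2 = x0.copy(), b0 ** 2
    for _ in range(steps):
        f, g = loss_and_grad(x)
        b2 += f                          # assumed: b_{t+1}^2 = b_t^2 + f(x_t)
        x -= eta / np.sqrt(b2) * g
    return x

x0 = np.zeros(20)
for name, method in [("AdaGrad-Norm", adagrad_norm), ("AdaLoss", adaloss)]:
    x_final = method(x0)
    print(f"{name:12s} final loss: {loss_and_grad(x_final)[0]:.3e}")

Note that neither schedule requires the smoothness constant in advance: the accumulator simply grows until the effective step size eta / b_t is small enough, which is the intuition behind the robustness to the initial learning rate claimed in the abstract.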

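The abstract also contrasts plain and weighted least squares (min-norm) trigonometric interpolation in the overparameterized regime, where a bias towards smooth interpolants can reduce generalization error. The sketch below compares the two on a toy one-dimensional problem; the weight profile (weights growing linearly with frequency) and the target function are assumptions made purely for illustration, not the weights analyzed in the thesis.

import numpy as np

rng = np.random.default_rng(1)

def trig_features(x, K):
    # Real trigonometric dictionary: constant, then cos(kx), sin(kx) for k = 1..K.
    cols = [np.ones_like(x)]
    for k in range(1, K + 1):
        cols += [np.cos(k * x), np.sin(k * x)]
    return np.stack(cols, axis=1)

def target(x):
    return np.sin(x) + 0.3 * np.cos(2.0 * x)

# Few samples of a smooth target; 2K + 1 features >> n, so the model interpolates.
n, K = 15, 40
x_train = np.sort(rng.uniform(0.0, 2.0 * np.pi, n))
A = trig_features(x_train, K)
y = target(x_train)

# Plain min-norm interpolant: smallest ||c|| among all interpolating coefficients.
c_plain = np.linalg.pinv(A) @ y

# Weighted min-norm interpolant: minimize ||D c|| subject to A c = y, with
# weights growing with frequency (assumed profile) to favour smooth interpolants.
freqs = np.array([0] + [k for k in range(1, K + 1) for _ in (0, 1)])
D = 1.0 + freqs
c_weighted = (np.linalg.pinv(A / D) @ y) / D

# Compare generalization error on a dense grid.
x_test = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)
A_test = trig_features(x_test, K)
for name, c in [("plain", c_plain), ("weighted", c_weighted)]:
    mse = np.mean((A_test @ c - target(x_test)) ** 2)
    print(f"{name:8s} min-norm interpolant, test MSE: {mse:.4f}")

Finally, SHRIMP is described as fitting sparser random feature models via iterative magnitude pruning. A rough sketch of such a loop is given below, under the assumption that each round prunes the smallest-magnitude coefficients of a ridge-regression fit and refits on the surviving features; the pruning rate, ridge regularizer, and random Fourier features are illustrative choices rather than the thesis's exact configuration.

import numpy as np

rng = np.random.default_rng(2)

# Toy data with low-dimensional structure: y depends on only 2 of 10 inputs.
X = rng.uniform(-1.0, 1.0, size=(300, 10))
y = np.sin(2.0 * X[:, 0]) + 0.5 * np.cos(3.0 * X[:, 3])

# Random Fourier features phi(x) = cos(x W + b).
n_features = 512
W = rng.standard_normal((10, n_features))
b = rng.uniform(0.0, 2.0 * np.pi, n_features)
Phi = np.cos(X @ W + b)

def ridge_fit(Phi_sub, y, lam=1e-3):
    # Ridge regression restricted to the currently active feature subset.
    p = Phi_sub.shape[1]
    return np.linalg.solve(Phi_sub.T @ Phi_sub + lam * np.eye(p), Phi_sub.T @ y)

active = np.arange(n_features)
coef = ridge_fit(Phi[:, active], y)
prune_rate = 0.5            # fraction of surviving features dropped each round
for _ in range(6):
    # Keep the largest-magnitude coefficients, drop the rest, then refit.
    keep = np.argsort(np.abs(coef))[int(prune_rate * len(coef)):]
    active = active[keep]
    coef = ridge_fit(Phi[:, active], y)
    mse = np.mean((Phi[:, active] @ coef - y) ** 2)
    print(f"{len(active):4d} features kept, train MSE: {mse:.4f}")
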
Access full-text files

Original bundle (1 file):
XIE-DISSERTATION-2022.pdf (22.38 MB, Adobe Portable Document Format)

License bundle (2 files):
PROQUEST_LICENSE.txt (4.45 KB, Plain Text)
LICENSE.txt (1.84 KB, Plain Text)