Robust machine learning against unseen distributional shifts
Abstract
The performance decay experienced by deep neural networks (DNNs) when confronted with distributional shifts poses a significant challenge for their deployment in real-world scenarios. The limited out-of-distribution (OOD) generalization ability of DNNs is a critical bottleneck, especially given the vast diversity and unpredictable changes inherent in real-world test data distributions. To tackle this challenge, robust training has emerged as a highly effective strategy for enhancing the OOD generalization ability of DNNs, enabling them to excel in complex and unpredictable real-world environments. This dissertation presents my research on robust training along three fundamental axes: robust data augmentation, robust model architecture design, and robust learning algorithms.

Within the realm of robust data augmentation, my research goes beyond traditional augmentation methods: I explore techniques that jointly generate diverse and hard training samples. Such a data augmentation strategy enhances the robustness of DNNs, equipping them to generalize effectively even in the face of unforeseen changes in the data distribution.

In terms of robust model architecture design, I investigate approaches that give DNNs inherent resilience to distributional shifts. Specifically, I design more robust normalization layers for deep neural networks, and utilize model ensembles to increase the models' capacity to encode diverse training samples and generalize to unseen test cases.

Moreover, I investigate advanced training algorithms that explicitly emphasize robustness. These include adversarial training paradigms that expose and mitigate vulnerabilities, novel loss functions such as contrastive losses that prioritize performance on challenging or out-of-distribution samples, and robust regularization terms that prevent models from overfitting to non-robust features.

Most importantly, I show that these robust learning strategies are not mutually independent. Rather, they must be adapted to one another to achieve the best performance gains.
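The abstract does not detail the exact augmentation method. As a minimal PyTorch sketch of one common way to jointly generate "diverse" and "hard" samples (not the dissertation's specific technique), the snippet below draws diversity from random augmentations and hardness from a one-step adversarial (FGSM-style) perturbation of the augmented input. The function name, transform choices, and step size `eps` are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F
import torchvision.transforms as T

# Diversity: a random photometric/geometric augmentation pipeline.
# Note: on a batched tensor, torchvision samples one set of random
# parameters shared across the batch, which is acceptable for a sketch.
random_augment = T.Compose([
    T.RandomResizedCrop(32, scale=(0.6, 1.0)),
    T.RandomHorizontalFlip(),
    T.ColorJitter(0.4, 0.4, 0.4),
])

def diverse_and_hard(model, x, y, eps=2.0 / 255):
    """Randomly augment x (diverse view), then take one loss-ascending
    gradient step on the augmented input (hard view). Illustrative only."""
    x_aug = random_augment(x).clone().requires_grad_(True)
    loss = F.cross_entropy(model(x_aug), y)
    grad, = torch.autograd.grad(loss, x_aug)
    return (x_aug + eps * grad.sign()).clamp(0, 1).detach()
```

Training on the returned samples (in place of, or alongside, the clean batch) is what makes the augmentation "robust" in the sense used above.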
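As an illustration of a robustness-oriented normalization layer (again, not the dissertation's specific design), the sketch below blends batch statistics with instance statistics under a learned mixing weight; instance statistics are often less sensitive to style and distribution shifts. `MixedNorm` and its parameters are hypothetical names.

```python
import torch
import torch.nn as nn

class MixedNorm(nn.Module):
    """Learnable blend of batch statistics (train-distribution specific)
    and instance statistics (more shift-invariant). Illustrative sketch."""
    def __init__(self, num_channels):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_channels, affine=False)
        self.inorm = nn.InstanceNorm2d(num_channels, affine=False)
        self.alpha = nn.Parameter(torch.tensor(0.0))  # mixing logit
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1))

    def forward(self, x):
        a = torch.sigmoid(self.alpha)           # keep the blend in (0, 1)
        x = a * self.bn(x) + (1 - a) * self.inorm(x)
        return self.gamma * x + self.beta       # shared affine transform
```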
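The model ensembles mentioned above typically reduce, at inference time, to averaging member predictions; a short sketch under the assumption of independently trained members:

```python
import torch
import torch.nn.functional as F

def ensemble_predict(models, x):
    """Deep-ensemble inference: average the softmax outputs of
    independently trained members. Diversity among members is what
    tends to help under distributional shift."""
    with torch.no_grad():
        probs = torch.stack([F.softmax(m(x), dim=1) for m in models])
    return probs.mean(dim=0)
```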
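The adversarial training paradigm referenced above is, in its standard form, a min-max procedure: an inner maximization crafts worst-case perturbations (here, L-infinity PGD), and the outer minimization fits the model to them. The hyperparameters `eps`, `alpha`, and `steps` below are common defaults, not values from the dissertation.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """L-inf PGD: iteratively ascend the loss, projecting back into the eps-ball."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """One min-max step: fit the model on worst-case perturbations of the batch."""
    model.eval()                      # keep BN statistics fixed while attacking
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```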
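The abstract mentions contrastive losses without specifying a variant; a standard NT-Xent (SimCLR-style) formulation over two augmented views is sketched below as a representative example.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss over two augmented views.
    z1, z2: (N, D) embeddings of the same N samples under two augmentations;
    each sample's other view is its positive, all other samples are negatives."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D), unit norm
    sim = z @ z.t() / temperature                        # cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float('-inf'))                # drop self-similarity
    # Positive pairs: row i matches row i + n, and vice versa.
    targets = torch.cat([torch.arange(n, 2 * n),
                         torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)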
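Finally, one widely used form of robust regularization (illustrative; the dissertation's exact terms are not specified in the abstract) penalizes disagreement between predictions on clean and augmented views via a Jensen-Shannon consistency term:

```python
import torch
import torch.nn.functional as F

def consistency_regularizer(model, x_clean, x_aug, lam=1.0):
    """Jensen-Shannon consistency: discourage the model from relying on
    non-robust features that flip predictions under benign augmentation."""
    p = F.softmax(model(x_clean), dim=1)
    q = F.softmax(model(x_aug), dim=1)
    m = (0.5 * (p + q)).clamp(1e-7, 1.0).log()           # log of the mixture
    js = 0.5 * (F.kl_div(m, p, reduction='batchmean')    # KL(p || m)
                + F.kl_div(m, q, reduction='batchmean')) # KL(q || m)
    return lam * js
```

Adding such a term to the task loss is one concrete way the augmentation, architecture, and algorithm components above can be adapted to one another rather than applied in isolation.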