Browsing by Subject "Self-supervised learning"
Now showing 1 - 5 of 5
Item: Additive logistic mechanism for privacy-preserving self-supervised learning (2022-05)
Yang, Yunhao; Topcu, Ufuk

Self-supervised learning algorithms are vulnerable to privacy attacks, especially in the stage in which a pre-trained model is fine-tuned for the actual task. We focus on self-supervised learning applied to neural networks and design a post-training privacy-protection algorithm for such networks. We introduce a differential privacy mechanism, named the additive logistic mechanism, which adds noise sampled from a logistic distribution to the fine-tuned layer weights of the networks. A unique feature of the protection algorithm is that it allows post-training adjustment of the privacy parameters and alleviates the need for retraining. We apply membership inference attacks to both unprotected and protected models to quantify the trade-off between the model's privacy and performance. We prove that the post-training protection algorithm is differentially private and empirically show that this protection can achieve a low differential privacy loss of epsilon < 1 while keeping the performance loss below 5%.
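A minimal sketch of the mechanism described above, assuming NumPy and an illustrative `logistic_perturb` helper: noise drawn from a logistic distribution is added to the fine-tuned weights after training. Calibrating the noise scale to a target privacy budget epsilon, which is the substance of the paper's analysis, is omitted here.

```python
import numpy as np

def logistic_perturb(weights, scale, rng=None):
    """Add zero-mean logistic noise to a layer's weights (post-training).

    `scale` is a stand-in for the noise level; in the paper it would be
    calibrated from the target privacy budget epsilon and the weight
    sensitivity, a step omitted in this sketch.
    """
    rng = rng or np.random.default_rng()
    return weights + rng.logistic(loc=0.0, scale=scale, size=weights.shape)

# Protect only the fine-tuned head; the pre-trained backbone is untouched.
head_weights = np.random.randn(256, 10)   # hypothetical classifier head
protected = logistic_perturb(head_weights, scale=0.05)
```

Because the perturbation is applied post-training, re-running it with a different `scale` adjusts the privacy level without retraining, which is the adjustment property the abstract highlights.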
Item: Aligning robot navigation behaviors with human intentions and preferences (2024-05)
Karnan, Haresh; Stone, Peter; Deshpande, Ashish D.; Alambeigi, Farshid; Wang, Junmin; Warnell, Garrett; Dragan, Anca; Biswas, Joydeep

Recent advances in the field of machine learning have led to new ways for mobile robots to acquire advanced navigational capabilities (Bojarski et al., 2016; Kahn et al., 2018; Kendall et al., 2019; Pan et al., 2018; Silver et al., 2010). However, these learning-based methods raise the possibility that learned navigation behaviors may not align with the intentions and preferences of people, a problem known as value misalignment. To mitigate this danger, this dissertation aims to answer the question: how can we use machine learning methods to align the navigational behaviors of autonomous mobile robots with human intentions and preferences? First, this dissertation introduces a new approach to learning navigation behaviors by imitating human-provided demonstrations of the intended navigation task. This contribution allows mobile robots to acquire autonomous visual navigation capabilities by imitating human demonstrations, using a novel objective function that encourages the agent to align with the human's navigation objective and penalizes misalignment. Second, this dissertation introduces two algorithms that enhance terrain-aware off-road navigation for mobile robots by learning visual terrain awareness in a self-supervised manner. This contribution enables mobile robots to obey a human operator's preference for navigating over different terrains in urban outdoor environments and to extrapolate these preferences to visually novel terrains by leveraging multi-modal representations. Finally, in the context of robot navigation in human-occupied environments, this dissertation introduces a dataset and an algorithm for socially compliant robot navigation in both indoor and outdoor environments. In summary, the contributions in this dissertation take a significant step towards addressing the value alignment problem in autonomous navigation, enabling mobile robots to navigate autonomously with objectives that align with the intentions and preferences of humans.

Item: Learning variable frame rate and unsupervised video quality assessment (2022-08-14)
Madhusudanarao, Pavan C.; Bovik, Alan C.; Birkbeck, Neil; Ghosh, Joydeep; Caramanis, Constantine; Geisler, Wilson

High frame rate (HFR) videos are becoming increasingly common with the tremendous popularity of live, high-action streaming content such as sports. To optimize trade-offs between bandwidth requirements and video quality via frame rate adaptation, it is imperative to understand the intricate relationship between frame rate and perceptual video quality. To advance progress in this direction, we designed a new subjective resource, called the LIVE-YouTube-HFR (LIVE-YT-HFR) dataset, comprising 480 videos at 6 different frame rates, obtained from 16 diverse contents. To understand the combined effects of compression and frame rate adjustment, we also processed videos at 5 compression levels at each frame rate. To obtain subjective labels, we conducted a human study that yielded 19,000 quality ratings. Further, we devise an objective VQA model called Space-Time GeneRalizEd Entropic Difference (GREED), which analyzes the statistics of spatial and temporal band-pass video coefficients. A generalized Gaussian distribution (GGD) is used to model the band-pass responses, while entropy variations between reference and distorted videos under the GGD model are used to capture the quality variations arising from frame rate changes. The entropic differences are calculated across multiple temporal and spatial subbands and merged using a learned regressor. We show through extensive experiments that GREED achieves state-of-the-art performance on the LIVE-YT-HFR database when compared with existing VQA models. Perceptual image and video quality assessment (IQA/VQA) is an integral component of many social media and streaming platforms. We consider the problem of learning perceptually relevant quality representations in a self-supervised manner. Distortion type identification and degradation level determination are employed as an auxiliary task to train a deep model on unlabeled data. The model is trained using a contrastive loss; we refer to this training framework and the resulting models as the CONTRastive Image QUality Evaluator (CONTRIQUE) for images and the CONtrastive VIdeo Quality EstimaTor (CONVIQT) for videos. During testing, the weights of the trained model are frozen, and a linear regressor maps the learned features to quality scores in a no-reference (NR) setting. We conduct comprehensive evaluations of the proposed models on multiple IQA and VQA databases by analyzing the correlations between model predictions and ground-truth quality ratings, and achieve competitive performance compared to state-of-the-art NR models. The learned representations are highly robust and generalize well across images and videos afflicted by either synthetic or authentic distortions. Our results suggest that powerful quality representations with perceptual relevance can be obtained without large labeled subjective image/video quality datasets.
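A heavily simplified sketch of the entropic-difference idea behind GREED, assuming grayscale frame stacks of equal shape and NumPy only. It substitutes a plain Gaussian entropy for the paper's GGD fit and omits the multi-scale spatial subbands and the learned regressor; all function names are illustrative.

```python
import numpy as np

def temporal_bandpass(frames):
    """Crude temporal band-pass: differences of consecutive frames."""
    return frames[1:] - frames[:-1]

def block_entropies(coeffs, block=8, eps=1e-6):
    """Per-block entropy proxy of band-pass coefficients.

    GREED fits a generalized Gaussian to the coefficients; a plain
    Gaussian entropy, 0.5 * log(2*pi*e*var), is used here only to keep
    the sketch short.
    """
    t, h, w = coeffs.shape
    hb, wb = h // block, w // block
    x = coeffs[:, : hb * block, : wb * block]
    x = x.reshape(t, hb, block, wb, block).transpose(0, 1, 3, 2, 4)
    var = x.reshape(t, hb, wb, -1).var(axis=-1)
    return 0.5 * np.log(2 * np.pi * np.e * (var + eps))

def entropic_difference(ref_frames, dis_frames):
    """Mean absolute entropy difference between reference and distorted."""
    ent_ref = block_entropies(temporal_bandpass(ref_frames.astype(float)))
    ent_dis = block_entropies(temporal_bandpass(dis_frames.astype(float)))
    return np.abs(ent_dis - ent_ref).mean()
```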
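For the CONTRIQUE/CONVIQT part, the following PyTorch sketch shows one plausible form of a contrastive loss in which distortion type and level serve as the auxiliary labels; the exact loss and batching in the published models may differ. After pre-training with such a loss, the encoder would be frozen and a linear regressor fit from its features to quality scores, matching the NR evaluation protocol described above.

```python
import torch
import torch.nn.functional as F

def distortion_contrastive_loss(embeddings, distortion_labels, tau=0.1):
    """Supervised-contrastive loss with distortion classes as labels.

    Samples sharing a (distortion type, level) label are treated as
    positives and pulled together; all other pairs are pushed apart.
    """
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / tau                       # pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))
    pos = distortion_labels.unsqueeze(0).eq(distortion_labels.unsqueeze(1))
    pos = pos & ~self_mask                      # positives, excluding self
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    # average negative log-likelihood of positives per anchor
    per_anchor = -log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)
    return per_anchor[pos.any(1)].mean()        # anchors with >=1 positive
```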
Item: Phoneme segmentation using self-supervised speech models (2023-08)
Strgar, Luke Vincent; Harwath, David

We apply transfer learning to the task of phoneme segmentation and demonstrate the utility of representations learned in self-supervised pre-training for the task. Our model extends transformer-style encoders with strategically placed convolutions that manipulate features learned in pre-training. Using the TIMIT and Buckeye corpora, we train and test the model in both supervised and unsupervised settings. The unsupervised setting is handled by constructing a noisy label set from the predictions of a separate model that was itself trained without supervision. Results indicate that our model surpasses previous state-of-the-art performance in both settings and on both datasets. Finally, following observations made during published code review and attempts to reproduce past segmentation results, we find a need to disambiguate the definition and implementation of widely used evaluation metrics. We resolve this ambiguity by delineating two distinct evaluation schemes and describing their nuances (a sketch of how such schemes can diverge appears at the end of this listing). We provide a publicly available implementation of our work on GitHub.

Item: Unsupervised fine-tuning data selection for ASR using self-supervised speech models (2022-12-01)
Goudy, Reem A.; Harwath, David

Self-supervised learning (SSL) can leverage unlabeled data to boost the performance of automatic speech recognition (ASR) models when only a small amount of transcribed speech is available. However, this raises the question of which subset of the available unlabeled data should be selected for transcription. Our work investigates different unsupervised data selection techniques for fine-tuning the HuBERT model under a limited transcription budget. We investigate the impact of speaker diversity, gender bias, and topic diversity on downstream ASR performance. We also devise two novel techniques for unsupervised data selection, pre-training-loss-based data selection and the perplexity of byte-pair-encoded clustered units (PBPE), and we show how these techniques compare to pure data selection. Finally, we analyze how the inherent characteristics of the selected fine-tuning subsets correlate with the resultant word error rate (WER). We demonstrate the importance of token diversity, speaker diversity, and topic diversity in achieving the best performance in terms of WER.
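A minimal sketch of budget-constrained selection as described in the item above, assuming per-utterance durations and some unsupervised ranking signal such as pre-training loss or PBPE perplexity (neither of which is computed here); the names and the greedy strategy are illustrative.

```python
import numpy as np

def select_for_transcription(utterance_ids, scores, durations_sec, budget_hours):
    """Pick an unlabeled subset to transcribe under a time budget.

    `scores` is any unsupervised ranking signal (higher = preferred);
    utterances are taken greedily until the budget is exhausted.
    """
    order = np.argsort(scores)[::-1]          # highest score first
    chosen, total = [], 0.0
    for i in order:
        if total + durations_sec[i] > budget_hours * 3600:
            continue                          # skip items that overflow the budget
        chosen.append(utterance_ids[i])
        total += durations_sec[i]
    return chosen
```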
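Finally, as referenced in the phoneme segmentation item above, here is a hypothetical illustration of how two reasonable implementations of tolerance-based boundary matching can disagree; the two schemes delineated in the thesis may differ in their details. Precision and recall would then be hits/len(hyp) and hits/len(ref), respectively.

```python
def boundary_hits_lenient(ref, hyp, tol=0.02):
    """Scheme A: count every hypothesis boundary within `tol` seconds of
    some reference boundary. Several hypotheses can match one reference,
    which inflates the scores."""
    return sum(any(abs(h - r) <= tol for r in ref) for h in hyp)

def boundary_hits_one_to_one(ref, hyp, tol=0.02):
    """Scheme B: greedy one-to-one matching; each reference boundary can
    be claimed by at most one hypothesis boundary."""
    used, hits = set(), 0
    for h in hyp:
        best = min(
            (r for r in ref if r not in used and abs(h - r) <= tol),
            key=lambda r: abs(h - r),
            default=None,
        )
        if best is not None:
            used.add(best)
            hits += 1
    return hits

ref = [0.10, 0.30, 0.55]                   # boundary times in seconds
hyp = [0.11, 0.12, 0.56]
print(boundary_hits_lenient(ref, hyp))     # 3: both 0.11 and 0.12 hit 0.10
print(boundary_hits_one_to_one(ref, hyp))  # 2: only one of them may claim it
```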