Optimizing visual grounding of latent representations of speech from distant language groups

dc.contributor.advisor: Harwath, David
dc.creator: Crabtree, Christopher Edwin
dc.date.accessioned: 2022-11-08T23:58:10Z
dc.date.available: 2022-11-08T23:58:10Z
dc.date.created: 2021-12
dc.date.issued: 2021-12-03
dc.date.submitted: December 2021
dc.date.updated: 2022-11-08T23:58:11Z
dc.description.abstract: Recent years have seen increasing research interest in using multi-modal grounding techniques to bolster classic natural language processing (NLP) and automated speech recognition (ASR) tasks. Previous work by Harwath et al. [5] demonstrated that visual grounding approximately doubled their model's bilingual utterance retrieval performance, and that image retrieval was similarly improved substantially by adding an alignment objective between languages. However, there is still much we don't know about the exact mechanism by which grounding is used in modern neural network systems. In this work, we extend the line of research pioneered by Harwath et al. by empirically exploring several contrastive learning frameworks and objectives designed to align input from different modalities (i.e., visual and speech input). Our experiments indicate potential avenues for improvement over the current best-performing loss objective through analysis of our top two performing loss functions. We also find that in our trilingual setting, cross-lingual learning objectives can be removed to both improve image retrieval performance and reduce hyperparameter complexity.
dc.description.department: Computer Science
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/2152/116594
dc.identifier.uri: http://dx.doi.org/10.26153/tsw/43489
dc.subject: Image retrieval
dc.subject: Loss functions
dc.title: Optimizing visual grounding of latent representations of speech from distant language groups
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Computer Sciences
thesis.degree.discipline: Computer Science
thesis.degree.grantor: The University of Texas at Austin
thesis.degree.level: Masters
thesis.degree.name: Master of Science in Computer Sciences

Access full-text files

Original bundle

Name:
CRABTREE-THESIS-2021.pdf
Size:
305 KB
Format:
Adobe Portable Document Format

License bundle

Name:
PROQUEST_LICENSE.txt
Size:
4.46 KB
Format:
Plain Text
Description:
Name:
LICENSE.txt
Size:
1.85 KB
Format:
Plain Text
Description: