Geometry-aware multi-task learning for binaural audio generation from video

dc.contributor.advisor: Grauman, Kristen Lorraine, 1979-
dc.creator: Garg, Rishabh
dc.date.accessioned: 2021-09-07T19:18:05Z
dc.date.available: 2021-09-07T19:18:05Z
dc.date.created: 2021-05
dc.date.issued: 2021-05-07
dc.date.submitted: May 2021
dc.date.updated: 2021-09-07T19:18:05Z
dc.description.abstract: Human audio perception is inherently spatial, and videos with binaural audio simulate this spatial experience by delivering a different sound to each ear. However, videos are typically recorded with mono audio and hence generally do not offer the rich experience of binaural audio. We propose an audio spatialization method that uses the visual information in videos to convert mono audio to binaural audio. We leverage the spatial and geometric information about the audio that is present in the video's visuals to guide the learning process. We learn these geometry-aware visual features in a multi-task manner to generate rich binaural audio. We also generate a large video dataset with binaural audio in photorealistic environments to better understand and evaluate the task. Through extensive evaluation on two datasets, we demonstrate that our method generates better binaural audio by learning more spatially coherent visual features.
dc.description.department: Computer Science
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/2152/87490
dc.identifier.uri: http://dx.doi.org/10.26153/tsw/14434
dc.subject: Audio-visual
dc.subject: Binaural audio
dc.subject: Computer vision
dc.subject: Machine learning
dc.title: Geometry-aware multi-task learning for binaural audio generation from video
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Computer Sciences
thesis.degree.discipline: Computer Science
thesis.degree.grantor: The University of Texas at Austin
thesis.degree.level: Masters
thesis.degree.name: Master of Science in Computer Sciences

Access full-text files

Original bundle

Name: GARG-THESIS-2021.pdf
Size: 4.39 MB
Format: Adobe Portable Document Format
