Acoustic field prediction for continuous audio-visual navigation with interaction-free learning

Date

2023-12

Authors

Ramos Chen, Jordi

Abstract

This thesis explores audio-visual navigation. The primary focus is AudioGoal navigation, in which an agent identifies and moves toward an audio source in an unmapped environment, a capability crucial for applications such as search and rescue and object retrieval. The research also addresses real-world scenarios, such as locating a sounding alarm or responding to human voices. Traditional approaches in this domain often rely on reinforcement learning, which makes them difficult to interpret, modify, and transfer to real-world settings. To address these challenges, we propose a novel hierarchical approach that decomposes the complex end-to-end navigation task into more manageable, step-by-step supervised learning tasks. Central to this approach are an acoustic field dataset and an acoustic field prediction model, which enable an agent to predict the acoustic field, that is, the sound intensity around it. This prediction guides hierarchical waypoint navigation and significantly improves performance over baseline models in simulated AudioGoal tasks. Our approach uses SoundSpaces 2.0, a simulation platform that supports audio-visual navigation with real-time audio rendering and closely replicates the acoustic qualities of real-world environments. The key contributions are twofold: a definition of the acoustic field, together with an associated dataset, that clarifies and simplifies the navigation process; and a hierarchical navigation strategy that leverages acoustic field predictions to surpass current baselines. While direct application to real-world robots falls outside the scope of this thesis, preliminary efforts indicate promising directions for future research on training and testing AudioGoal navigation on real robots.
