Neural speech tracking in quiet and noisy listening environments using naturalistic stimuli
In noisy situations, speech may be masked by conflicting acoustics, including background noise from the environment or other competing talkers. The process of attending to one stream of sounds while ignoring background noise is referred to as the “cocktail party problem,” but its physiological basis remains poorly understood. In this study, we used electroencephalography (EEG) to measure neural responses to a continuous, controlled clean speech stimulus and to speech in naturalistic noise in 17 participants with typical hearing. We employed linear encoding models to assess the degree of neural tracking of specific speech features. These models predict the EEG response over time from specific acoustic or linguistic features of the speech stimulus. The aims of this project were the following: 1) assess the fidelity of neural tracking of speech features using a highly uncontrolled and naturalistic speech-in-noise stimulus alongside a clean speech condition; 2) characterize neural responses to acoustic features, such as the speech envelope and pitch, and to linguistic features, such as phonological features, in both the speech-in-noise and clean speech stimuli; and 3) use a cross-prediction analysis to predict neural responses to the speech-in-noise condition from a model trained on the clean speech condition, and vice versa. The first two analyses seek to identify which speech features drive brain responses measured at the scalp; the purpose of the third is to test whether the predictions of our encoding models generalize across stimulus types. Our results demonstrated that model performance was more robust for phonological features than for the acoustic envelope in the clean speech condition, whereas combining acoustic and phonological features improved the modeling of neural tracking in the noisy condition.
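A lagged linear encoding model of the kind described above (often called a temporal response function, or TRF) can be sketched with ridge regression. The following minimal NumPy example uses simulated data; the function names, lag count, and regularization value are illustrative assumptions, not the study's actual pipeline:

```python
import numpy as np

def lagged_design(stimulus, n_lags):
    """Build a time-lagged design matrix: column l holds the stimulus
    feature delayed by l samples (zero-padded at the start)."""
    n = len(stimulus)
    X = np.zeros((n, n_lags))
    for lag in range(n_lags):
        X[lag:, lag] = stimulus[:n - lag]
    return X

def fit_trf(stimulus, eeg, n_lags, alpha=1.0):
    """Ridge-regularized linear encoding model:
    eeg(t) ≈ sum over lags l of w[l] * stimulus(t - l)."""
    X = lagged_design(stimulus, n_lags)
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_lags), X.T @ eeg)

def predict_eeg(stimulus, w):
    """Predict the EEG time series from the stimulus feature and fitted weights."""
    return lagged_design(stimulus, len(w)) @ w
```

In practice the stimulus feature would be, for example, the speech envelope or a matrix of phonological features, and model quality is typically scored as the correlation between predicted and recorded EEG on held-out data.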
Our ability to predict neural activity in response to speech was higher when the speech occurred without background noise. Finally, we predicted responses to the clean speech stimulus from models trained on the noisy speech stimulus, and vice versa. These results have implications for identifying which speech features could be used to build a brain-machine interface or a cognitive hearing aid that identifies and separates speech from noise.
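The cross-prediction scheme, training an encoding model on one listening condition and scoring it on the other, can be sketched as follows. The data here are simulated, with a shared response kernel and a hypothetical higher noise level in the noisy condition; every name and value is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N_LAGS, N = 16, 5000  # assumed lag count and number of samples

def lagged(x, n_lags):
    """Time-lagged design matrix (column l = stimulus delayed by l samples)."""
    X = np.zeros((len(x), n_lags))
    for lag in range(n_lags):
        X[lag:, lag] = x[:len(x) - lag]
    return X

def fit(x, y, n_lags=N_LAGS, alpha=1.0):
    """Ridge-regularized linear encoding model (TRF-style)."""
    X = lagged(x, n_lags)
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_lags), X.T @ y)

def pearson_r(a, b):
    """Prediction accuracy as Pearson correlation."""
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Simulate two conditions sharing one underlying response kernel,
# with more additive noise in the "noisy" condition.
kernel = np.hanning(N_LAGS)
stim_clean = rng.standard_normal(N)
stim_noisy = rng.standard_normal(N)
eeg_clean = np.convolve(stim_clean, kernel)[:N] + 0.5 * rng.standard_normal(N)
eeg_noisy = np.convolve(stim_noisy, kernel)[:N] + 2.0 * rng.standard_normal(N)

# Cross-prediction: train on one condition, evaluate on the other.
w_clean = fit(stim_clean, eeg_clean)
w_noisy = fit(stim_noisy, eeg_noisy)
r_clean_to_noisy = pearson_r(lagged(stim_noisy, N_LAGS) @ w_clean, eeg_noisy)
r_noisy_to_clean = pearson_r(lagged(stim_clean, N_LAGS) @ w_noisy, eeg_clean)
```

If the neural encoding of speech features is similar across conditions, both cross-prediction accuracies should remain well above chance, with the score on the cleaner test data limited mainly by that condition's noise level.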