Deep neural networks for sound classification reveal representations of natural sounds from intracerebral responses in human auditory cortex
Poster B98 in Poster Session B, Tuesday, October 24, 3:30 - 5:15 pm CEST, Espace Vieux-Port
Kyle Rupp¹, Taylor Abel¹; ¹University of Pittsburgh
The human auditory system is extremely adept at parsing auditory scenes into individual components, analyzing their constituent acoustic features, and sorting those components into sound categories. Many studies have attempted to explain the underlying neural representations that facilitate this transformation from a continuous and variable acoustic signal to category-level endpoints. However, this work has largely focused on human-defined stimulus features or, in some cases, on matrix decomposition techniques that explain neural responses to a set of sounds as a weighted sum across a latent low-dimensional stimulus space. Both approaches fall short of capturing the rich, complex stimulus transformations that must exist to solve the sound categorization problem. Meanwhile, recent machine learning advances have produced deep neural network (DNN) models that solve this exact problem, with relatively few constraints on the specific stimulus transformations the models can use. Assuming that such a model has naturally identified an optimal set of stimulus features for categorizing sounds, and that its stimulus representations grow increasingly complex and abstract with layer depth, we can view it as a data-driven feature extractor whose representations span the range from low-level acoustics to abstract category-level descriptions. Guided by this framework, we built encoding models that predict neural responses in auditory cortex from layer activations within a sound categorization DNN, which we refer to as DNN-derived encoding models. Neural data were recorded via stereoelectroencephalography (sEEG) in 16 patient-participants while they listened to a set of 165 two-second clips of natural sounds from categories including speech, non-speech vocalizations, music, and environmental sounds. We were able to predict neural responses with state-of-the-art accuracy; furthermore, the best predictions came from shallower DNN layers for supratemporal plane (STP) channels and from deeper layers for channels in the superior temporal gyrus and superior temporal sulcus (STG/S). DNN-derived encoding models consistently outperformed spectrotemporal receptive field models, suggesting that all channels, including those in posteromedial Heschl’s gyrus, encoded representations more complex than simple spectrotemporal tuning. Furthermore, a measure of category encoding strength for human vocalizations (determined in a separate analysis) was strongly positively correlated across channels with the best-predicting DNN layer, demonstrating that channels traditionally described as voice category-selective were most closely associated with deep DNN layers. We then used the DNN-derived encoding models to estimate integration windows by identifying the shortest stimulus inputs that did not appreciably change the predicted neural responses; a clear anatomical segregation emerged, with integration windows of ~85-185 ms for STP channels and ~245-335 ms for STG/S channels. These results further elucidate the functional properties of subregions of auditory cortex: STP encodes acoustic properties (albeit with greater complexity than spectrotemporal tuning) at short timescales, while STG/S integrates over longer timescales to encode higher-order stimulus transformations more akin to voice category selectivity.
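As an illustration only (the abstract does not specify the model-fitting details), the minimal sketch below shows one common way DNN-derived encoding models of this kind are built: per-channel ridge regressions from a DNN layer's time-averaged activations to each channel's response amplitude per sound, with the best-predicting layer chosen per channel by cross-validated correlation. The ridge regression, the time-averaging, and all names and placeholder data (`layer_activations`, `neural_responses`) are assumptions for the sketch, not the authors' pipeline.

```python
# Hypothetical sketch of a DNN-derived encoding model: for each sEEG channel,
# ridge-regress the neural response onto activations from each DNN layer and
# keep the layer that best predicts held-out responses.
# Shapes, variable names, and the random placeholder data are illustrative only.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

n_sounds = 165          # 2-s natural sound clips (as in the abstract)
n_channels = 300        # sEEG channels pooled across participants (placeholder)

# layer_activations[layer] : (n_sounds, n_units) feature matrix, e.g. the
# time-averaged unit activations of one DNN layer per sound clip.
# neural_responses : (n_sounds, n_channels) response amplitude per clip/channel.
rng = np.random.default_rng(0)
layer_activations = {f"layer{i}": rng.standard_normal((n_sounds, 64 * (i + 1)))
                     for i in range(6)}
neural_responses = rng.standard_normal((n_sounds, n_channels))

def cv_prediction(X, y, n_splits=10):
    """Cross-validated predictions of y from X with a ridge encoding model."""
    pred = np.zeros_like(y)
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(X):
        model = RidgeCV(alphas=np.logspace(-2, 5, 15))
        model.fit(X[train], y[train])
        pred[test] = model.predict(X[test])
    return pred

# Prediction accuracy (Pearson r) per channel and per layer.
accuracy = {}
for name, X in layer_activations.items():
    pred = cv_prediction(X, neural_responses)
    r = [np.corrcoef(pred[:, ch], neural_responses[:, ch])[0, 1]
         for ch in range(n_channels)]
    accuracy[name] = np.array(r)

# "Best layer" per channel: the quantity the abstract relates to anatomy
# (STP vs. STG/S) and to voice category selectivity.
layers = list(layer_activations.keys())
best_layer = np.argmax(np.stack([accuracy[l] for l in layers]), axis=0)
for ch in range(3):
    print(f"channel {ch}: best layer = {layers[best_layer[ch]]}, "
          f"r = {accuracy[layers[best_layer[ch]]][ch]:.2f}")
```

The integration-window analysis described above could be approximated in the same framework by recomputing layer activations from progressively truncated stimulus excerpts and finding the shortest excerpt whose predicted response stays within a tolerance of the full-stimulus prediction; that step is omitted here.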
Topic Areas: Computational Approaches