
iEEG pattern similarity in the superior temporal cortex predicts differences across individual words in noisy audiovisual speech perception

Poster C83 in Poster Session C, Wednesday, October 25, 10:15 am - 12:00 pm CEST, Espace Vieux-Port

John Magnotti1, Yue Zhang1, Xiang Zhang1, Aayushi Sangani1, Zhengjia Wang1, Michael Beauchamp1; 1University of Pennsylvania

Humans have the unique ability to decode the rapid stream of language elements that constitutes speech. Although auditory noise in the environment interferes with speech perception, perceivers can partially compensate using visual information from the face of the talker. Individual words vary greatly in the visual information they contain. While previous research has shown that the superior temporal cortex is a locus for audiovisual integration and speech perception, the neural basis for perceptual differences across words remains poorly understood. Participants (N = 15) were patients with intractable epilepsy undergoing clinical monitoring in the epilepsy monitoring unit. We measured intracranial activity from stereotactic EEG (sEEG) electrodes in the superior temporal cortex. Words were presented in four formats: clear auditory-only (Ac), clear audiovisual (AcV), noisy auditory-only (An), and noisy audiovisual (AnV). The noise was pink noise at a signal-to-noise ratio (SNR) of -8 dB. Each patient was presented with 110 unique words with a balanced content of different phonemes and visemes. The words were counterbalanced so that, across participants, each word was presented in every format. Participants repeated back each word after presentation; responses were recorded and scored. We identified 140 electrodes in the superior temporal cortex that showed a significant (p < 0.001, Bonferroni-corrected) response to clear auditory words. We measured the percent increase in 70-150 Hz broadband high-frequency activity (BHA) in a window from 0 ms to 1000 ms after auditory onset, relative to a baseline window from -1000 ms to 0 ms before auditory onset. The BHA timecourse was sampled every 10 ms to produce 100 timepoints for each word/condition pair. The data from each electrode were z-normalized and then averaged across all electrodes to produce a single timecourse for each word/condition pair. The timepoint-by-timepoint response to each individual word was then correlated across stimulus formats (e.g., the response to a word presented in the AnV format was correlated with the response to the same word presented in the AcV format). As expected, seeing the face of the talker was beneficial (mean % correct of 31% for noisy auditory-only words vs. 65% for AnV words). There was high variability across words in the visual benefit (mean improvement of 34% ± 29% SD). Word-level differences in audiovisual improvement were predicted by neural pattern similarity: when the response to a noisy audiovisual word was more similar to the response to the clear version of that word, perceptual intelligibility was high (r = 0.43, p = 10^-6). Using iEEG to measure neural activity in the superior temporal cortex, we found that the neural pattern similarity between clear vs. noisy audiovisual words reliably predicted the degree of audiovisual benefit for that word, presumably as a result of the neural integration of the viseme and phoneme content of each word. Enhancing our understanding of the neural substrates of noisy speech perception may help in the design and testing of speech perception aids and other speech-assistive technologies.
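As a rough illustration of the pattern-similarity analysis described in the abstract (not the authors' actual code), the Python sketch below uses synthetic data in place of the real BHA recordings. The variable names (`bha`, `behavioral_benefit`), array shapes, and helper functions are assumptions for illustration only: each electrode's timecourse is z-normalized, averaged across electrodes, correlated across formats for the same word, and the resulting word-level similarity is related to the word-level audiovisual benefit.

```python
import numpy as np
from scipy.stats import pearsonr, zscore

# Synthetic placeholder data (hypothetical shapes): 110 words, 140 electrodes,
# 100 timepoints (0-1000 ms after auditory onset, sampled every 10 ms).
rng = np.random.default_rng(0)
n_words, n_elec, n_time = 110, 140, 100
bha = {c: rng.standard_normal((n_words, n_elec, n_time)) for c in ("AcV", "AnV")}
behavioral_benefit = rng.random(n_words)  # e.g., AnV accuracy minus An accuracy per word


def word_timecourses(x):
    """z-normalize each word x electrode timecourse, then average across electrodes
    to yield one (n_words, n_timepoints) timecourse per word/condition."""
    z = zscore(x, axis=-1)
    return z.mean(axis=1)


def pattern_similarity(bha, cond_a="AnV", cond_b="AcV"):
    """Correlate the timepoint-by-timepoint response to each word across two formats."""
    a = word_timecourses(bha[cond_a])
    b = word_timecourses(bha[cond_b])
    return np.array([pearsonr(a[w], b[w])[0] for w in range(a.shape[0])])


# Relate neural pattern similarity (noisy vs. clear audiovisual) to the
# word-level audiovisual benefit in behavior.
similarity = pattern_similarity(bha)
r, p = pearsonr(similarity, behavioral_benefit)
print(f"similarity vs. AV benefit: r = {r:.2f}, p = {p:.2g}")
```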

Topic Areas: Speech Perception, Multisensory or Sensorimotor Integration
