Intelligibility of Audiovisual Speech Drives Multivoxel Response Patterns in Human Superior Temporal Cortex for Words and Sentences

Poster C73 in Poster Session C, Friday, October 7, 10:15 am - 12:00 pm EDT, Millennium Hall

Yue Zhang1, John Magnotti1, Johannes Rennig2, Michael Beauchamp1; 1University of Pennsylvania, 2University of Tübingen

Regions of the human posterior superior temporal gyrus and sulcus (pSTG/S) respond to the visual mouth movements that constitute visual speech and to the auditory vocalizations that constitute auditory speech. Neural responses in pSTG/S may underlie the perceptual benefit of visual speech for the comprehension of noisy auditory speech. We examined this possibility through the lens of multivoxel pattern responses in pSTG/S. BOLD fMRI data were collected from 37 participants. Stimuli consisted of sentences or single words presented in five formats: clear auditory speech paired with a video of a talking face (AcV); noisy auditory speech with a face video (AnV); clear auditory-only (Ac); noisy auditory-only (An); and visual-only (V). Following the presentation of each item, participants rated its intelligibility with a button press. Noisy speech was often rated as intelligible, but only if it was paired with a face video (mean of 76% with a face video vs. 45% without). For these conditions, the fMRI data were sorted post hoc into intelligible and unintelligible trials. In each hemisphere, a region of interest (ROI) in pSTG/S was localized, and the mean percent signal change of each voxel in the ROI to each condition was calculated. The mean percent signal change across conditions was then calculated for each voxel and subtracted from the response to each individual condition to increase the dynamic range of the fMRI pattern correlation. Within each hemisphere, the fMRI pattern similarity between each pair of conditions was calculated by correlating the mean-centered percent signal change across all voxels in the ROI, and the pairwise correlations were averaged across hemispheres. The patterns evoked in pSTG/S by physically similar noisy audiovisual speech differed depending on intelligibility: the response pattern to intelligible AnV speech was more similar to that evoked by AcV speech (mean r = 0.38), whereas the response pattern to unintelligible AnV speech was less similar to that of AcV speech (mean r = -0.09). The pairwise correlations were Fisher z-transformed and entered into a linear mixed-effects model. There were main effects of intelligibility (p = 10^-15) and stimulus type, with a stronger intelligibility effect for words than for sentences (p = 10^-6), and no significant interaction. To visualize the pairwise correlations, multidimensional scaling (MDS) was applied to the average correlation matrix for sentences and for words; the MDS solutions for sentences and words were qualitatively similar. Plotting the ranks of the pairwise correlations for words against those for sentences showed a significant positive correlation (r = 0.68, p = .0008). Seeing the face of the talker significantly improves the perception of noisy speech. Across two independent experiments using single words or sentences, we found that noisy but intelligible audiovisual speech evoked brain activation patterns in pSTG/S similar to those evoked by clear audiovisual speech. The successful integration of visual and auditory speech produces a characteristic neural signature in pSTG/S, highlighting the importance of this region in generating the perceptual benefit of visual speech.
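
For readers who want a concrete picture of the pattern-similarity computation, the sketch below illustrates the main steps described above (voxel-wise mean-centering, pairwise correlation across voxels, averaging across hemispheres, Fisher z-transformation, and MDS) on simulated data. The condition labels, array shapes, and library choices are illustrative assumptions, not the authors' actual analysis code.

    import numpy as np
    from sklearn.manifold import MDS

    # Hypothetical condition labels; the noisy audiovisual (AnV) trials are
    # split post hoc by rated intelligibility, as in the abstract.
    conditions = ["AcV", "AnV-intel", "AnV-unintel", "Ac", "An", "V"]

    def pattern_similarity(psc):
        """psc: (n_voxels, n_conditions) percent signal change for one ROI.
        Returns an (n_conditions, n_conditions) correlation matrix computed
        on mean-centered responses (the voxel-wise mean across conditions is
        removed to increase the dynamic range of the pattern correlation)."""
        centered = psc - psc.mean(axis=1, keepdims=True)
        return np.corrcoef(centered.T)  # correlate conditions across voxels

    def fisher_z(r):
        """Fisher z-transform applied before the linear mixed-effects model."""
        return np.arctanh(np.clip(r, -0.999999, 0.999999))

    # Simulated percent-signal-change matrices for left and right pSTG/S ROIs
    rng = np.random.default_rng(0)
    left = rng.normal(size=(200, len(conditions)))   # 200 voxels x 6 conditions
    right = rng.normal(size=(180, len(conditions)))

    # Pairwise correlations computed within each hemisphere, then averaged
    avg_corr = (pattern_similarity(left) + pattern_similarity(right)) / 2
    z = fisher_z(avg_corr)  # values of this kind would enter the mixed model

    # Multidimensional scaling on the average correlation matrix, converted
    # to a dissimilarity matrix, to visualize condition similarity in 2D
    dissimilarity = 1 - avg_corr
    np.fill_diagonal(dissimilarity, 0)
    coords = MDS(n_components=2, dissimilarity="precomputed",
                 random_state=0).fit_transform(dissimilarity)
    for name, (x, y) in zip(conditions, coords):
        print(f"{name}: ({x:.2f}, {y:.2f})")

In this sketch the same procedure would be run separately for the word and sentence experiments, and the resulting correlation matrices compared, as in the rank-correlation analysis reported above.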

Topic Areas: Perception: Speech Perception and Audiovisual Integration, Speech Perception