Multisensory enhancement of cortical speech tracking by co-speech gesture kinematics
Poster B113 in Poster Session B, Tuesday, October 24, 3:30 - 5:15 pm CEST, Espace Vieux-Port
Jacob Momsen1, Seana Coulson2; 1SDSU/UCSD JDP in Language and Communicative Disorders, 2UC San Diego
The McGurk effect indicates that visual processing of articulatory information affects how speech sounds are heard, suggesting that the processing of spoken language is fundamentally multimodal. However, the observable activity that accompanies speech also involves coordinated movements across the entire body. Compared to the way articulatory activity can influence speech perception, the relationship between co-speech gesture and speech acoustics is less transparent. The present study aims to establish whether biological motion in co-speech gestures influences neural signatures of speech tracking. Specifically, we test whether continuous speech is processed independently of information in co-speech gestures, or whether biological motion information in co-speech gestures can enhance the processing of continuous speech. EEG was recorded from 13 English-speaking adults while they observed clips of unscripted discourse containing either the original audio and visual content (Congruent), recombined audiovisual content from unrelated discourse segments (Incongruent), visual content without sound (Video Only), or audio paired with a still frame of the speaker (Audio Only). Decoder models were trained on the EEG across conditions to predict the broadband envelope of the speech signal. A leave-one-out cross-validation procedure yielded a decoding score, viz. the Pearson correlation coefficient between the predicted and actual speech envelope, for each clip in each of the four conditions. Linear mixed-effects regression models with random intercepts for subject and item were used to predict decoding scores from experimental condition. This analysis revealed that the neural representation of the speech envelope was more precise when speech was paired with congruent gestures than in a listening-only condition (β = 0.02, SE = 0.007, p < 0.05). In contrast, the performance of decoders trained on incongruent speech-gesture pairings did not differ from that of decoders trained in the Audio Only condition. As expected, speech reconstruction was also poorer when gestures were presented in silence (β = -0.05, SE = 0.007, p < 0.001). To assess whether congruent gestures lead to multisensory enhancement of speech tracking, reconstruction performance was compared between additive decoder models trained on both unimodal conditions and decoders trained on trials in the Congruent condition. This analysis revealed that the additive model trained on Audio Only and Video Only trials performed worse than decoder models trained on congruent speech-gesture pairings (β = -0.03, SE = 0.007, p < 0.001). This result indicates that the presence of congruent gestures led to a non-linear enhancement of speech envelope tracking relative to unimodal processing of speech and co-speech gesture information alone. Extant work on audiovisual speech processing indicates that visual articulatory information from the face enhances the cortical representation of continuous speech via non-linear multisensory enhancement of speech encoding. Our results point to an analogous super-additive enhancement of the cortical representation of speech when speech is paired with congruent co-speech gestures. This suggests that visual information about the talker's movements affects the fidelity of speech tracking in the auditory cortex. The temporal coherence between co-speech gesture kinematics and continuous speech may result in the two signals being perceptually bound into a multisensory representation that allows visuospatial information conveyed by biological motion to influence the perceptual uptake of speech.
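For readers who want to see the shape of the analysis, the sketch below illustrates an envelope-reconstruction (backward decoding) pipeline of the kind described above: a ridge-regularized linear decoder maps time-lagged EEG onto the broadband speech envelope, and a leave-one-clip-out cross-validation loop scores each held-out clip with a Pearson correlation. The lag window, ridge penalty, and all variable names are illustrative assumptions rather than the authors' actual parameters; in the study, the same scoring scheme would be applied per condition, including an additive decoder trained jointly on Audio Only and Video Only trials for the super-additivity comparison.

```python
# Minimal sketch of the envelope-reconstruction (backward decoding) analysis.
# Lag window, ridge penalty, and toy data sizes are illustrative guesses,
# not the authors' actual parameters.
import numpy as np
from scipy.stats import pearsonr


def lag_matrix(eeg, max_lag):
    """Stack time-lagged copies of the EEG (time x channels) so the decoder
    can map a short window of neural activity onto each envelope sample."""
    n_times, n_chan = eeg.shape
    lagged = np.zeros((n_times, n_chan * (max_lag + 1)))
    for lag in range(max_lag + 1):
        lagged[lag:, lag * n_chan:(lag + 1) * n_chan] = eeg[:n_times - lag]
    return lagged


def fit_decoder(X, y, ridge=1e3):
    """Ridge-regularised least squares: weights mapping lagged EEG -> envelope."""
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ y)


def leave_one_clip_out(eeg_clips, env_clips, max_lag=16, ridge=1e3):
    """Train on all clips but one, reconstruct the held-out clip's envelope,
    and score it with the Pearson correlation (the 'decoding score')."""
    scores = []
    for held_out in range(len(eeg_clips)):
        X_train = np.vstack([lag_matrix(e, max_lag)
                             for i, e in enumerate(eeg_clips) if i != held_out])
        y_train = np.concatenate([env for i, env in enumerate(env_clips)
                                  if i != held_out])
        w = fit_decoder(X_train, y_train, ridge)
        predicted = lag_matrix(eeg_clips[held_out], max_lag) @ w
        scores.append(pearsonr(predicted, env_clips[held_out])[0])
    return np.array(scores)


# Toy usage: random arrays stand in for preprocessed EEG (time x channels) and
# broadband speech envelopes, one pair per discourse clip within a condition.
rng = np.random.default_rng(0)
eeg_clips = [rng.standard_normal((1000, 32)) for _ in range(10)]
env_clips = [rng.standard_normal(1000) for _ in range(10)]
print(leave_one_clip_out(eeg_clips, env_clips).round(3))
```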
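The group-level comparison could then be run as a linear mixed-effects regression on the decoding scores with random intercepts for subject and item, as in the hedged statsmodels sketch below on toy data. The column names, condition labels, reference level, and effect sizes are assumptions, and the crossed random effects are expressed in statsmodels' variance-component form rather than whatever software the authors actually used.

```python
# Hedged sketch: decoding scores modelled with condition as a fixed effect and
# random intercepts for subject and item (crossed), on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
conditions = ["AudioOnly", "Congruent", "Incongruent", "VideoOnly"]
subj_re = rng.normal(0, 0.01, size=13)   # assumed subject random intercepts
item_re = rng.normal(0, 0.01, size=10)   # assumed item random intercepts

rows = []
for s in range(13):
    for i in range(10):
        for c in conditions:
            rows.append({"subject": s, "item": i, "condition": c,
                         "score": 0.05 + subj_re[s] + item_re[i]
                                  + (0.02 if c == "Congruent" else 0.0)
                                  + rng.normal(0, 0.02)})
df = pd.DataFrame(rows)

# Treat the whole dataset as one group so subject and item both enter as
# crossed variance components (statsmodels' idiom for crossed random effects).
df["group"] = 1
vc = {"subject": "0 + C(subject)", "item": "0 + C(item)"}

# Audio Only as the reference level, mirroring the contrasts reported above.
model = smf.mixedlm("score ~ C(condition, Treatment(reference='AudioOnly'))",
                    data=df, groups="group", vc_formula=vc)
print(model.fit().summary())
```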
Topic Areas: Signed Language and Gesture, Multisensory or Sensorimotor Integration