Segmenting words from continuous speech in human temporal cortex

Poster C89 in Poster Session C, Wednesday, October 25, 10:15 am - 12:00 pm CEST, Espace Vieux-Port

Yizhen Zhang¹, Laura Gwilliams¹, Ilina Bhaya-Grossman¹,², Matthew Leonard¹, Edward Chang¹; ¹University of California, San Francisco; ²University of California, Berkeley

Understanding spoken language requires extracting individual words from a continuous acoustic speech signal. Unlike in written text, word boundaries in spoken language are difficult to detect because words are often not separated by silence and acoustic cues are unreliable. Listeners may therefore instead draw on their experience with speech, segmenting words using multiple sources of learned knowledge. An outstanding question is how the brain extracts words from the speech stream; specifically, it is unknown which brain areas encode word boundaries and whether those representations are encoded independently of, or jointly with, lexical information. Addressing this requires neural recordings with high spatial and temporal resolution to dissect the local cortical computations that are selective to specific acoustic, phonetic, and lexical properties. Here, we recorded high-density electrocorticography (ECoG) responses while participants passively listened to spoken narratives, and investigated the process by which the brain segments words in natural speech. We first explored whether neural populations are sensitive to word boundaries in single trials. We found that neural populations throughout the lateral temporal cortex had evoked responses time-locked to word boundaries. Specific electrodes exhibited complex, multi-phasic evoked responses, consisting of one to three distinct response peaks around each word boundary. We used partial correlation to show that both acoustic cues and word-level features modulated the word boundary response. Specifically, we observed a sequence of feature encoding around word boundaries: encoding of envelope cues occurred immediately after word onset, followed by sensitivity to lexical frequency, and finally to the duration of the whole word. With regard to spatial localization, acoustic-phonetic features were encoded primarily in the middle superior temporal gyrus (STG), while word-level features were encoded in the middle STG as well as in the surrounding anterior and posterior STG. A widely distributed STG neural population jointly encoded multiple levels of features; this population also exhibited superior word segmentation performance compared to electrodes that exclusively encoded either acoustic-phonetic or word-level features. Together, these findings suggest that the human STG is sensitive to word boundaries, with acoustic (envelope) cues and lexical features (frequency and duration) jointly contributing to the word segmentation process. The core middle STG appears to encode the acoustic-phonetic input, whereas lexical encoding occurs both in the middle STG and in surrounding cortical regions. These results support a new model of distributed, integrative processing in the STG during spoken word processing.
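
The abstract does not include analysis code; as a rough illustration of the first analysis step, a minimal sketch of extracting word-boundary-aligned high-gamma epochs might look like the following (Python/NumPy; the array layout, sampling rate, and word-onset annotations are assumptions for illustration, not details from the study).

```python
import numpy as np

def epoch_high_gamma(hg, word_onsets, sfreq, tmin=-0.2, tmax=0.6):
    """Extract per-word epochs of high-gamma activity around word onsets.

    hg          : (n_electrodes, n_samples) high-gamma analytic amplitude
    word_onsets : word-boundary times in seconds (hypothetical annotation,
                  e.g., from a forced alignment of the narrative)
    sfreq       : sampling rate in Hz
    Returns     : (n_words, n_electrodes, n_times) epoch array
    """
    start = int(tmin * sfreq)
    stop = int(tmax * sfreq)
    epochs = []
    for t in word_onsets:
        center = int(round(t * sfreq))
        if center + start < 0 or center + stop > hg.shape[1]:
            continue  # skip words too close to the recording edges
        epochs.append(hg[:, center + start:center + stop])
    return np.stack(epochs)

# Averaging across words gives the word-boundary evoked response
# per electrode:
#   epochs = epoch_high_gamma(hg, word_onsets, sfreq=100.0)
#   evoked = epochs.mean(axis=0)
```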
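
Similarly, the partial-correlation analysis described above can be sketched generically as correlating the residuals of the neural response and a feature of interest after regressing out the remaining features. This is a standard implementation, not the authors' code, and all variable names are hypothetical.

```python
import numpy as np
from scipy import stats

def partial_corr(y, x, confounds):
    """Partial correlation between y and x, controlling for confounds.

    y         : (n_words,) neural response at one electrode and latency
    x         : (n_words,) feature of interest (e.g., log lexical frequency)
    confounds : (n_words, k) features to control for (e.g., envelope
                amplitude, word duration)
    """
    # Design matrix with an intercept column.
    Z = np.column_stack([np.ones(len(y)), confounds])
    # Residualize both variables on the confounds.
    y_res = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    x_res = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    # Pearson correlation of the residuals is the partial correlation.
    r, p = stats.pearsonr(x_res, y_res)
    return r, p
```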
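
Finally, the abstract does not specify how "word segmentation performance" was quantified; one generic way to operationalize it is cross-validated boundary detection from high-gamma features, sketched below (scikit-learn; the logistic-regression classifier, ROC-AUC metric, and feature construction are all assumptions).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def segmentation_score(hg_windows, is_boundary):
    """Score how well a set of electrodes separates boundary from
    non-boundary time windows (a generic stand-in for the abstract's
    'word segmentation performance').

    hg_windows  : (n_windows, n_features) high-gamma features per window
    is_boundary : (n_windows,) 1 if the window contains a word onset
    """
    clf = LogisticRegression(max_iter=1000)
    # Mean cross-validated ROC AUC as a boundary-detection score.
    return cross_val_score(clf, hg_windows, is_boundary,
                           cv=5, scoring="roc_auc").mean()
```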

Topic Areas: Speech Perception
