Slide Session A
Thursday, October 24, 3:30 - 4:30 pm, Great Hall 1
Chair: Daniela Sammler, Max Planck Institute for Empirical Aesthetics, Frankfurt/Main
Talk 1: Neural encoding of word forms from continuous speech in human cortex
Yizhen Zhang1, Matthew Leonard1, Laura Gwilliams2, Ilina Bhaya-Grossman1,3, Edward Chang1; 1University of California San Francisco, 2Stanford University, 3University of California Berkeley
When listening to speech, our ears receive a continuous stream of sound vibrations, yet our brains perceive a sequence of discrete and meaningful linguistic units: words. Words serve as a crucial link between sound and meaning; however, it remains largely unknown how the speech cortex extracts and represents individual words from natural continuous speech. Here, we recorded high-density electrocorticography (ECoG) responses while participants passively listened to spoken narratives. Our findings revealed the neural encoding of word forms as a distributed and dynamic representation in the human speech cortex. First, we examined neural activity aligned to word boundaries. We found that neural populations throughout the superior temporal gyrus (STG) showed time-locked evoked responses. These responses were distinct from responses to other boundaries, such as syllable boundaries, which are cued by similar acoustic features (i.e., changes in the amplitude envelope). The word-evoked response was multi-phasic, consisting of 1-3 distinct states around each word boundary. To address the functional significance of each phase of the evoked response, we used partial correlation to test the extent to which key speech and language features (e.g., acoustic-phonetic, prosodic, lexical) are encoded at each moment. We found that each part of the response encodes distinct information, with a characteristic temporal sequence: immediately before the boundary, information about the duration of the previous word modulates neural activity, followed by a peak just after the boundary that encodes acoustic-phonetic and prosodic (e.g., vowel length) cues, and finally a peak in the middle of the word that encodes lexical properties like word frequency. This sequence of feature encoding, anchored at word boundaries in continuous speech, is highly robust to the large variation in word length, which is enabled by a population code for relative time.
Finally, word extraction and representation spatially overlap with the acoustic-phonetic feature encoding in the mid-STG, with additional neural populations encoding word-level information extending in both anterior and posterior directions along the gyrus. Together, these results suggest that extracting words from continuous speech relies on a distributed neural code, which jointly represents temporal context and speech content to integrate phonological and lexical features within a rapid cycle.
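The partial-correlation analysis mentioned in the Talk 1 abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are simulated, and the names `partial_corr`, `word_duration`, `word_frequency`, and `neural` are hypothetical. The idea shown is generic: regress the covariate out of both variables and correlate the residuals, so that a feature that only appears to be encoded because it covaries with another feature drops toward zero.

```python
import numpy as np

def partial_corr(x, y, covariates):
    """Correlation between x and y after regressing the covariates out of both."""
    # Design matrix with an intercept column plus the covariate(s).
    Z = np.column_stack([np.ones(len(x)), covariates])
    res_x = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    res_y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])

# Simulated data (assumption): one value per word.
rng = np.random.default_rng(0)
n_words = 2000
word_duration = rng.normal(size=n_words)
# Word frequency is made to covary with duration (frequent words tend to be short).
word_frequency = -0.7 * word_duration + rng.normal(scale=0.7, size=n_words)
# The simulated neural response depends on duration only.
neural = word_duration + rng.normal(scale=0.5, size=n_words)

raw_r = float(np.corrcoef(neural, word_frequency)[0, 1])
partial_r = partial_corr(neural, word_frequency, word_duration)
print(f"raw r = {raw_r:.2f}, partial r = {partial_r:.2f}")
```

In this toy case the raw correlation with word frequency is sizeable, while the partial correlation is close to zero, because the simulated response carries no frequency information beyond what duration explains.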
Talk 2: Cortical representation of reading comprehension in English
Xue Gong1, Cong Du1, Catherine Chen1, Christine Tseng1, Frederic Theunissen1, Jack Gallant1, Fatma Deniz2; 1UC Berkeley, 2Technische Universität Berlin
Reading comprehension, a specialized form of object recognition, requires extracting meaning from written texts through a set of dynamic and intermediate representations. Previous research indicates that the visual word form area (VWFA) in the left ventral occipitotemporal cortex is crucial for the invariant recognition of written words (Cohen et al., 2000; Dehaene et al., 2010). However, the specifics of how the brain processes reading comprehension remain less understood. To investigate the intermediate representations of reading throughout the human cerebral cortex, we used functional MRI and voxelwise encoding models. We collected BOLD activity while nine participants read over two hours of engaging natural narrative stories. Firstly, we built a voxelwise encoding model using eleven feature spaces based on language statistics (e.g., word rate) and on visual, phonemic, orthographic, semantic and syntactic information of the story stimuli. Secondly, because visual, orthographic and semantic features can be highly correlated in reading, we used variance-partitioning analysis to determine how much of the variance was uniquely predicted by each feature space and by the combination of these features (Gong et al., 2023; de Heer et al., 2017; Lescroart et al., 2015). Lastly, we examined the tuning of visual and orthographic properties in the cerebral cortex by projecting the high-dimensional model weights onto a low-dimensional and interpretable space using principal component analysis. Our results provide three lines of evidence for representations of reading comprehension. First, visual, orthographic and semantic feature spaces are the three major dimensions in representing reading-related information in the cerebral cortex. In particular, the human cerebral cortex does not represent features of reading comprehension in clearly segregated and distinct areas. Instead, various cortical areas simultaneously represent a combination of these features.
Second, as the cerebral cortex extracts meaning from written words, voxels in early to high-level visual areas gradually become tuned to lower spatial frequencies and higher temporal frequencies. These results suggest increasing sensitivity to temporal variation and progressive integration across spatial areas during visual processing. Third, we identified the posterior inferior temporal gyrus (pITG) and posterior fusiform gyrus (pFG) as representing orthographic information. The first principal dimension of the orthographic model weights separates linear letters (i, j, l) from curved letters (a, c, s). The second principal dimension separates letters with intersections (x, y, w) from letters without intersections (o, c, s). This result indicates that the human cerebral cortex represents English letters orthographically according to their curvature and intersections.
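The variance-partitioning step described in the Talk 2 abstract can be sketched for the simplest case of two correlated feature spaces and one simulated voxel. This is an illustration under stated assumptions, not the authors' eleven-space analysis: the data are synthetic, ridge regression with a fixed penalty stands in for whatever regularization was actually used, and the helper names `ridge_fit` and `heldout_r2` are hypothetical. The set arithmetic shown is the standard one: the unique contribution of a feature space is the joint model's held-out R² minus the R² of the model that omits that space.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression weights (no intercept; inputs are zero-mean)."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def heldout_r2(X_train, y_train, X_test, y_test, alpha=1.0):
    """Fraction of held-out variance explained by a ridge model."""
    weights = ridge_fit(X_train, y_train, alpha)
    residual = y_test - X_test @ weights
    return 1.0 - np.sum(residual ** 2) / np.sum((y_test - y_test.mean()) ** 2)

# Simulated stimulus features (assumption): two correlated feature spaces.
rng = np.random.default_rng(0)
n_samples, n_dims = 1500, 3
orthographic = rng.normal(size=(n_samples, n_dims))
semantic = 0.8 * orthographic + 0.6 * rng.normal(size=(n_samples, n_dims))
# The simulated voxel is driven by the semantic features only.
voxel = semantic.sum(axis=1) + rng.normal(size=n_samples)

train, test = slice(0, 1000), slice(1000, None)
joint = np.hstack([orthographic, semantic])

r2_orth = heldout_r2(orthographic[train], voxel[train], orthographic[test], voxel[test])
r2_sem = heldout_r2(semantic[train], voxel[train], semantic[test], voxel[test])
r2_joint = heldout_r2(joint[train], voxel[train], joint[test], voxel[test])

unique_orth = r2_joint - r2_sem          # variance only orthography explains
unique_sem = r2_joint - r2_orth          # variance only semantics explains
shared = r2_orth + r2_sem - r2_joint     # variance either space could explain
print(f"unique orth = {unique_orth:.2f}, unique sem = {unique_sem:.2f}, shared = {shared:.2f}")
```

Although the orthographic model alone predicts this voxel reasonably well, its unique contribution is near zero, which is the kind of confound the partitioning is meant to expose.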
Talk 3: White Matter Networks Differentially Mediate Language Cognition by Semantic Demand and Improved Responses after TMS Non-Invasive Brain Stimulation
Shreya Parchure1, John Medaglia2, Denise Harvey1, Apoorva Kelkar2, Dani Bassett1, Roy Hamilton1; 1University of Pennsylvania, 2Drexel University
Semantic cognition, the understanding of the context and meaning of words, is critical for language production, and its impairment in aphasia patients profoundly disrupts daily living. Understanding the neural mechanisms of semantic processing is key for effective brain stimulation treatments. While individual brain regions underlying language are well studied, little is known about how their interactions (i.e., white matter tract networks) contribute to speech generation. White matter networks mediate the distribution of potential treatments like transcranial magnetic stimulation (TMS), a non-invasive form of neuromodulation that temporarily depresses cortical excitability and influences behavior. Theories of the anatomical basis of semantic cognition implicate various tracts, ranging from those connecting the left inferior frontal gyrus (LIFG), through the dual-stream model in the peri-sylvian fissure, to a broad language network comprising nearly the entire brain. To elucidate the anatomical basis of semantic cognition, we used network neuroscience models of these theories paired with experiments using focal manipulation by TMS during tasks with varying semantic demands. N=31 English-speaking healthy adults received either active or sham rTMS over LIFG. Before and after stimulation, response times (RTs) were collected for two spoken word completion tasks: Verb generation and Sentence completion. Each subject underwent structural MRI, and networks were created from the white matter streamlines connecting cortical brain regions. Sub-networks corresponding to theorized language models were constructed for each subject: 1. Left inferior fronto-occipital fasciculus (IFOF), 2. Inferior longitudinal fasciculus (ILF), 3. Uncinate fasciculus (UF), and 4. Peri-sylvian language network (LangNet). The connectivity of each sub-network, measured as network strength, was used as a predictor in linear mixed-effects regression with log(RT) as the behavioral outcome measure, before and after cTBS.
At baseline for Verb generation, IFOF (p<0.001) and ILF (p=0.015) were significant predictors of RTs. For Sentence completion, by contrast, UF (p=0.004) and LangNet (p<0.001) significantly predicted RTs. After rTMS, there was a significant decrease in Sentence completion RTs for active stimulation recipients (p=0.017, pairwise t-test) but not for sham recipients (p=0.59). For Verb generation, there was no significant difference from baseline RTs; IFOF (p=0.007) and ILF (p<0.001) remained significant mediators even after rTMS. Additionally, for Sentence completion, there was a change in white matter predictors: IFOF network strength was newly predictive (p<0.001) of the faster post-stimulation RTs, while UF and LangNet were no longer predictive (p>0.05). The double dissociation between the networks predicting Verb generation (IFOF and ILF) and Sentence completion (UF and LangNet) at baseline suggests differential recruitment of white matter tracts according to the semantic demands of the language task. The results also corroborate fMRI studies on the role of IFOF in semantic cognition and syntactic comprehension, and on the role of IFOF and ILF, but not UF, in Verb generation. These methods are generalizable to many cognitive processes organized in brain networks that contribute to complex human behavior. Further, focal stimulation over LIFG (nearest the UF tract) produced faster RTs only for the task that was associated with UF. The change in its significant predictors post-rTMS (from UF to IFOF) suggests that Sentence completion RT changes may be mediated by disruption of UF and recruitment of the alternate tract IFOF. This work produces novel insight into the neural organization of semantic cognition and language dynamics after neuromodulation.
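The Talk 3 abstract uses "network strength" of each sub-network as a regression predictor but does not define it. The sketch below assumes one common definition, the sum of edge weights among the sub-network's regions in a symmetric streamline-count matrix; the authors may have used a different normalization. The matrix values, the region indices, and the function name `subnetwork_strength` are all hypothetical.

```python
import numpy as np

def subnetwork_strength(adjacency, node_indices):
    """Sum of edge weights among the given nodes, counting each undirected edge once."""
    sub = adjacency[np.ix_(node_indices, node_indices)]
    # The upper triangle (excluding the diagonal) holds each undirected edge exactly once.
    return float(np.triu(sub, k=1).sum())

# Toy symmetric connectivity matrix for five cortical regions (made-up streamline counts).
adjacency = np.array([
    [0, 4, 0, 1, 0],
    [4, 0, 2, 0, 0],
    [0, 2, 0, 3, 5],
    [1, 0, 3, 0, 0],
    [0, 0, 5, 0, 0],
], dtype=float)

# Hypothetical set of regions assigned to one tract-based sub-network.
tract_nodes = [0, 1, 2]
tract_strength = subnetwork_strength(adjacency, tract_nodes)
whole_strength = subnetwork_strength(adjacency, list(range(adjacency.shape[0])))
print(tract_strength, whole_strength)
```

One such value per subject and sub-network would then enter the mixed-effects regression as a predictor of log(RT).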
Talk 4: Age-Appropriate Large Language Models and EEG Encoding Models Reveal Contextual Lexical Processing across the First Five Years of Life
Katharina Menn1, Claudia Männel1,2, Florian Scharf3, Hanna Woloszyn4, Benjamin Gagl4, Lars Meyer1,5; 1Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany, 2Charité - Universitätsmedizin Berlin, Germany, 3Universität Kassel, Germany, 4Universität zu Köln, Germany, 5University Hospital Münster, Germany
Language acquisition entails rapid learning. Segmentation and comprehension of single words emerge within the first year of life (Bergelson & Swingley, 2012; Jusczyk et al., 1999). While there is also evidence that contextual processing emerges within the second year of life (Friedrich & Friederici, 2005), the earliest stages of contextual lexical processing remain elusive because of infants’ limited attention and restricted response capacities. Here, we combine age-appropriate Large Language Models (LLMs; Schepens et al., 2023) with EEG encoding models (temporal response functions, TRFs; Crosse et al., 2015) to reveal contextual lexical processing across the first five years of life, analyzing a quasi-longitudinal developmental dataset of naturalistic speech processing. Our sample consists of n = 51 children (31 female) aged between 3 months and 5 years, with age distributed uniformly across the age range. Each child was assessed twice within a 3-month time window. During each session, we recorded children’s EEG while they heard translation-equivalent stories in their native language (German) and an unfamiliar language (French). Children’s electrophysiological responses to the individual words in the story were quantified using EEG encoding models. First, we used TRFs to capture word onset responses as a temporal search space for native lexical processing. Electrophysiological responses to words increase with age (t = 2.34, p = .021) in the native language (German), but not in the unfamiliar baseline (French), in three distinct time windows (100–300/300–450/450–600 ms). To estimate whether word onset responses are related to lexical processing, we employed LLMs (GPT-3.5) to generate age-appropriate text corpora and to estimate lexical frequencies and contextual lexical predictabilities of the words in our stimuli specifically for German-learning children (Schepens et al., 2023).
Mixed-effects modeling demonstrates that all EEG word responses also contain variance that indicates contextual lexical processing: Amplitudes between 100–300 ms increase with word frequency (t = 3.14, p < .001), an effect attenuated by age (t = –2.22, p = .026). Amplitudes between 300–450 ms decrease with word frequency (t = –2.26, p = .024). Between 450–600 ms, amplitudes are modulated by age-appropriate lexical predictability (t = –4.88, p < .001). Our findings reveal early electrophysiological sensitivity not only to individual words, but to words in context. While our preliminary findings cannot dissociate form- and content-level processing (i.e., the lexical and semantic levels), they suggest that age-appropriate LLMs may be a critical looking glass for studying the early emergence of the mental lexicon and contextual lexical processing with non-invasive electrophysiological recordings.
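The EEG encoding models (TRFs) used in the Talk 4 abstract can be illustrated with a minimal sketch of the underlying idea: ridge regression from a time-lagged word-onset regressor to a single EEG channel. This is a toy reconstruction on simulated data, not the authors' analysis or the toolbox they cite; the function names `lagged_design` and `fit_trf`, the kernel shape, the onset rate, and the penalty value are all assumptions, and time is in arbitrary sample units.

```python
import numpy as np

def lagged_design(stimulus, n_lags):
    """Design matrix whose column k holds the stimulus delayed by k samples."""
    n_samples = len(stimulus)
    design = np.zeros((n_samples, n_lags))
    for lag in range(n_lags):
        design[lag:, lag] = stimulus[:n_samples - lag]
    return design

def fit_trf(stimulus, eeg, n_lags, alpha=1.0):
    """Ridge estimate of the response function over lags 0..n_lags-1."""
    design = lagged_design(stimulus, n_lags)
    return np.linalg.solve(design.T @ design + alpha * np.eye(n_lags), design.T @ eeg)

# Simulated data (assumption): sparse word onsets and one noisy EEG channel.
rng = np.random.default_rng(0)
n_samples, n_lags = 5000, 30
word_onsets = (rng.random(n_samples) < 0.05).astype(float)
true_kernel = np.sin(np.linspace(0, 2 * np.pi, n_lags)) * np.hanning(n_lags)
eeg = np.convolve(word_onsets, true_kernel)[:n_samples]
eeg = eeg + rng.normal(scale=0.5, size=n_samples)

estimated_trf = fit_trf(word_onsets, eeg, n_lags)
recovery = float(np.corrcoef(estimated_trf, true_kernel)[0, 1])
print(f"correlation between estimated and true response: {recovery:.2f}")
```

Replacing the binary onset regressor with onsets scaled by word frequency or predictability gives the kind of lexical regressor whose effect on response amplitude the abstract then tests.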