Presentation


Language model predictability and word frequency effects on fixation-related N400s during Chinese natural article reading

Poster E120 in Poster Session E, Thursday, October 26, 10:15 am - 12:00 pm CEST, Espace Vieux-Port

Pei-Chun Chao1, Jou-An Chung2,3, Chia-Ju Chou4, Jie-Li Tsai5,6, Chia-Ying Lee1,2,3,6; 1Academia Sinica, Taipei, Taiwan, 2National Central University, Taoyuan, Taiwan, 3Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan, 4Cardinal Tien Hospital, Taipei, Taiwan, 5National Chengchi University, Taipei, Taiwan, 6Research Center for Mind, Brain, and Learning, National Chengchi University, Taipei, Taiwan

Word predictability and word frequency effects on the N400 have been used to index top-down contextual prediction and bottom-up lexical retrieval, respectively, during reading comprehension. Previous fixation-related potential (FRP) studies have shown that the two factors interact in their effects on N400s during natural sentence and article reading: N400 amplitude decreased with increasing predictability only for low-frequency words. These findings suggest that a highly predictive context benefits word identification. Word predictability is commonly quantified using cloze probability (CP), the proportion of respondents who offer a particular word as the continuation of a preceding context. However, the subjective cloze procedure is laborious, and CPs generalize poorly to new materials. To expedite testing and validate CPs across contexts, recent studies have trained language models (LMs) to derive word probabilities automatically from natural language corpora. LM-based CPs can explain more than half of the variance in human CPs. Therefore, this study examined whether LM predictions account for N400 responses in natural reading as well as human predictions do, using simultaneous recording of eye movements and fixation-related potentials (EMFRP). EMFRP data were collected from 47 participants while they read 2,504 words from 22 articles in traditional Chinese from the Academia Sinica Article Corpus. Word predictability was estimated in two ways: from a cloze test completed by 32 readers and from the BERT language model. Word frequency was computed as log-transformed occurrences per million in the Academia Sinica Corpus of Contemporary Taiwan Mandarin (ASCCTM). We applied linear mixed-effects models (LMMs) to examine the effects of word frequency and of human and BERT-based CPs on N400s. The LMMs were fitted to single-trial FRPs for two-character content words, using the mean N400 amplitude in the 375-475 ms window.
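The two word-level predictors described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the abstract does not give the logarithm base (base 10 is assumed here), and the example counts are invented for demonstration.

```python
import math
from collections import Counter


def cloze_probability(responses, target):
    """Proportion of cloze responses that match the target word."""
    if not responses:
        raise ValueError("at least one cloze response is required")
    return Counter(responses)[target] / len(responses)


def log_frequency_per_million(count, corpus_size):
    """Log-transformed occurrences per million tokens.

    Base 10 is an assumption; the abstract only says "log-transformed".
    """
    if count <= 0 or corpus_size <= 0:
        raise ValueError("count and corpus_size must be positive")
    per_million = count * 1_000_000 / corpus_size
    return math.log10(per_million)


# Invented example: 8 of 32 readers produce the target word.
example_responses = ["target"] * 8 + ["other"] * 24
example_cp = cloze_probability(example_responses, "target")

# Invented example: a word seen 100 times in a 1,000,000-token corpus.
example_log_freq = log_frequency_per_million(100, 1_000_000)
```

A BERT-based estimate would replace the response counts with the model's probability for the target word in its context; that step depends on the specific model and masking scheme, which the abstract does not describe.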
The LMMs included participants, article numbers, and word items as random effects, and word stroke count, launch site, position, frequency, predictability, and the frequency-predictability interaction as fixed effects. A Pearson correlation showed that BERT-based CPs were significantly and positively related to human CPs (r = 0.63, p < 0.0001). Moreover, the facilitative effects of BERT-based CPs on N400s, which showed a centro-parietal distribution, were comparable to those of human CPs. Specifically, the LMMs for both human and BERT-based CPs showed a significant predictability-frequency interaction (Human: t = -2.907; BERT: t = -2.322), a significant word predictability effect (Human: t = 4.195; BERT: t = 3.216), and no significant effect of word frequency (Human: t = 1.057; BERT: t = 0.457). The N400 reduction with increasing human and BERT-based CP was restricted to low-frequency words; that is, low-frequency words benefited greatly from contextual cues. These findings suggest that BERT-estimated probabilities are a good substitute for human-generated CPs, given their strong correlation and the analogous brain responses they predict.
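The model structure described above can be written out as a single equation. This is a sketch under assumptions the abstract does not confirm: random intercepts only (no random slopes), Gaussian residuals, and hypothetical symbol names, with i indexing participants, j articles, and k word items.

```latex
\[
\begin{aligned}
\mathrm{N400}_{ijk} ={}& \beta_0
  + \beta_1\,\mathrm{Stroke}_{k}
  + \beta_2\,\mathrm{Launch}_{ijk}
  + \beta_3\,\mathrm{Position}_{k} \\
&+ \beta_4\,\mathrm{Freq}_{k}
  + \beta_5\,\mathrm{Pred}_{k}
  + \beta_6\,(\mathrm{Freq}_{k} \times \mathrm{Pred}_{k}) \\
&+ u_i + v_j + w_k + \varepsilon_{ijk}
\end{aligned}
\]
```

Here u_i, v_j, and w_k stand for the participant, article, and word-item random effects. Pred_k is either the human or the BERT-based CP, with one model fitted per predictability measure, and the reported interaction corresponds to the coefficient on the product term.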

Topic Areas: Reading
