
Reanalysis of Five fMRI language datasets using Representational Similarity Analysis

Poster Session A - Sandbox Series, Thursday, October 24, 10:00 - 11:30 am, Great Hall 3 and 4
This poster is part of the Sandbox Series.

James Fodor1; 1The University of Melbourne

Recent years have seen rapid progress in the development of distributed semantic representations, in which the meaning of a word, sentence, or passage is encoded as a vector of numbers in an underlying semantic space. Neuroimaging studies using naturalistic linguistic stimuli have begun to investigate the extent to which such models reflect the representation of semantic information in the brain. The resulting patterns of BOLD activation are typically compared to the corresponding distributional semantics embeddings using an encoding paradigm, in which a regression model is trained for each voxel to predict its activity from the semantic embeddings. Such studies have found that distributed semantics models can predict BOLD activity across a range of brain regions, with embeddings extracted from state-of-the-art transformer models significantly outperforming older word embedding models. However, several methodological challenges hinder the interpretation of these findings. First, training a separate regression model for each voxel can lead to overfitting, and does not readily enable direct comparison of the representational spaces of brain and model. Second, paradigms using naturalistic narrative stimuli have not always adequately controlled for confounds arising from the temporal autocorrelation of the BOLD signal. Third, methodological heterogeneity reduces comparability between studies, making it difficult to determine how robust these findings are to different types of language stimuli.

In line with the philosophy of ‘scan once, analyse many’ (Madan, 2021), here we aim to mitigate some of these limitations by applying a consistent pipeline to reanalyse publicly available neuroimaging datasets. We reanalyse five distinct datasets with a total of 74 participants, covering a range of stimuli including written sentences (Pereira et al., 2018; Anderson et al., 2017), written narratives (Wehbe et al., 2014), and audio narratives (Y. Zhang et al., 2020; Bhattasali et al., 2020). To control for the autocorrelation of the BOLD signal, we segment narrative stimuli into individual sentences, which enables fitting a general linear model to the BOLD data in the same way as studies using discrete sentences. To avoid the limitations of voxel-wise encoding models, we utilise representational similarity analysis (RSA) to compare the similarity structure of brain representations with the similarity structure of the distributed representations from computational models.

Applying our uniform pipeline to all five datasets, we find small but robust correlations between brain and model RSA matrices, of between 2 and 10 percent, regardless of stimulus modality. We also find that transformer models typically outperform simpler averaged word embeddings, though the magnitude of this effect is smaller than reported in several previous studies, potentially because this effect has been inflated by autocorrelation of the BOLD signal. Conversely, we find little systematic pattern in which transformer model performs best, raising questions about the robustness of such differences reported in previous studies. We conclude by highlighting the limitations of existing datasets for the purpose of model comparison, and by suggesting how future work can more effectively evaluate what distributional semantics models can tell us about semantic representation in the brain.
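The RSA comparison described above can be sketched in a few lines: build a representational dissimilarity matrix (RDM) from the brain patterns and another from the model embeddings, then rank-correlate their upper triangles. This is a minimal illustrative sketch, not the authors' actual pipeline; all array names, shapes, and the choice of correlation distance with Spearman correlation are assumptions for the example.

```python
# Minimal RSA sketch with simulated data. Shapes and variable names are
# hypothetical; real analyses would use preprocessed BOLD patterns per
# sentence and embeddings from a language model.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_sentences = 50

# One brain activation pattern and one model embedding per sentence.
brain_patterns = rng.normal(size=(n_sentences, 1000))   # e.g. voxels in an ROI
model_embeddings = rng.normal(size=(n_sentences, 768))  # e.g. transformer vectors

# RDMs as condensed upper triangles, using correlation distance
# (1 - Pearson r) between each pair of sentence representations.
brain_rdm = pdist(brain_patterns, metric="correlation")
model_rdm = pdist(model_embeddings, metric="correlation")

# RSA score: rank correlation between the two dissimilarity structures.
rho, p = spearmanr(brain_rdm, model_rdm)
print(f"brain-model RSA correlation: {rho:.3f}")
```

Comparing RDMs rather than fitting per-voxel regressions is what lets the method compare the two representational geometries directly without training thousands of separate models.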

Topic Areas: Computational Approaches, Syntax and Combinatorial Semantics
