Presentation
A Comprehensive Analysis of the Neural Fits of Sentence Embedding Model Classes
Poster B97 in Poster Session B, Tuesday, October 24, 3:30 - 5:15 pm CEST, Espace Vieux-Port
Helena Balabin1, Antonietta Gabriella Liuzzi1, Jingyuan Sun2, Patrick Dupont1, Sien Moens2, Rik Vandenberghe1; 1Laboratory for Cognitive Neurology, KU Leuven, 2Language Intelligence & Information Retrieval Lab, KU Leuven
In the past few years, neural fits based on associations between brain activity patterns and pre-trained language models have been increasingly used to validate hypotheses about language processing. However, it remains unclear which intrinsic properties of language processing these neural fits reflect. Here, we examine to what degree neural fits differ across brain networks, language models and neural fit approaches, namely neural encoding and Representational Similarity Analysis (RSA). We employ parallel sentence and functional magnetic resonance imaging (fMRI) data from Pereira et al. (2018), comprising short paragraphs about 96 different concepts. Based on four model classes representing linguistic hypotheses about sentence processing, we perform a comprehensive analysis of their fits to four different brain networks. Specifically, we focus on the language, task-positive and vision networks as well as the default mode network (DMN), all of which are predefined in the open access dataset. We apply a total of 12 sentence embedding models belonging to four different classes: masked language modeling, pragmatic coherence, semantic comparison and contrastive learning. Next, we calculate neural fits for each combination of brain network and sentence embedding model using neural encoding and RSA. We implement neural encoding by adding a linear mapping model on top of the output of the sentence embedding model to predict the fMRI features. Then, we evaluate the prediction performance using pairwise accuracy, a metric based on comparisons of distances within pairs of predicted and ground truth fMRI features. For RSA, we use Spearman's rank correlation between the representational dissimilarity matrices (RDMs) of a given sentence embedding model and brain network. In the RSA analysis, GPT-2, SkipThoughts, and S-RoBERTa yielded the strongest correlations, in particular with the language network: r=0.067 (p<0.001), r=0.082 (p<0.001), and r=0.051 (p<0.001), respectively.
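The RSA step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the correlation-distance metric, the array shapes, and the names `rdm_vector` and `rsa_score` are assumptions chosen for illustration, and the data are synthetic. The parametric p-value returned by `spearmanr` is shown only for completeness, since RDM entries are not independent and the abstract does not state how significance was assessed.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def rdm_vector(features, metric="correlation"):
    """Return the condensed upper triangle of the RDM.

    features: array of shape (n_stimuli, n_features).
    The distance metric is an assumption; the abstract does not specify one.
    """
    return pdist(features, metric=metric)


def rsa_score(model_embeddings, brain_patterns):
    """Spearman rank correlation between a model RDM and a brain RDM."""
    rho, p_value = spearmanr(
        rdm_vector(model_embeddings), rdm_vector(brain_patterns)
    )
    return rho, p_value


# Synthetic stand-ins: 96 stimuli, hypothetical embedding and voxel counts.
rng = np.random.default_rng(0)
n_stimuli, emb_dim, n_voxels = 96, 64, 200
embeddings = rng.standard_normal((n_stimuli, emb_dim))
projection = rng.standard_normal((emb_dim, n_voxels))
voxels = embeddings @ projection + rng.standard_normal((n_stimuli, n_voxels))

rho, p_value = rsa_score(embeddings, voxels)
print(f"RSA Spearman correlation: {rho:.3f}")
```

Because RSA compares only the rank order of pairwise dissimilarities, the score does not depend on fitting any mapping from embedding dimensions to voxels.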
For neural encoding, GPT-3, S-T5 and SkipThoughts resulted in the highest pairwise accuracy scores. Moreover, contrastive learning-based models resulted in overall low neural fits. Furthermore, our findings demonstrate that neural fits vary across models that represent the same linguistic hypothesis but differ in model size and training data (e.g., GPT-2 and GPT-3), and across neural fit approaches (RSA versus neural encoding). Notably, we show that the embedding size (i.e., the dimensionality of a sentence embedding) and model performance are correlated with each other in the context of neural encoding. These findings indicate that the high neural fit of large language models such as GPT-3 based on neural encoding is substantially influenced by their embedding size (alongside other possible factors such as model architecture and training data) rather than by the inherent properties of their representational space, as reflected by the low neural fit of GPT-3 based on RSA. In conclusion, the embedding model class does not significantly impact the resulting neural fit, as models from different classes such as GPT-2 and S-RoBERTa demonstrate comparable performance, while the embedding size proves to be one of the determining factors.
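The neural encoding evaluation can be sketched in the same spirit. This is a minimal illustration under stated assumptions, not the authors' pipeline: the linear mapping is a closed-form ridge regression, the distance is cosine distance, and the pairwise rule counts a pair as correct when the matched prediction-to-target distances sum to less than the mismatched ones. The names `fit_ridge`, `cosine_distances` and `pairwise_accuracy`, the regularization strength, and the train/test split are hypothetical, and the data are synthetic.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge weights mapping embeddings X (n, d) to fMRI Y (n, v)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)


def cosine_distances(A, B):
    """Matrix of cosine distances between the rows of A and the rows of B."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return 1.0 - A @ B.T


def pairwise_accuracy(Y_pred, Y_true):
    """Fraction of stimulus pairs whose predictions match the correct targets.

    A pair (i, j) counts as correct when
    d(pred_i, true_i) + d(pred_j, true_j) < d(pred_i, true_j) + d(pred_j, true_i).
    """
    D = cosine_distances(Y_pred, Y_true)  # D[i, j] = d(pred_i, true_j)
    i, j = np.triu_indices(D.shape[0], k=1)
    correct = D[i, i] + D[j, j] < D[i, j] + D[j, i]
    return float(correct.mean())


# Synthetic stand-ins: 96 stimuli, hypothetical embedding and voxel counts.
rng = np.random.default_rng(0)
n_stimuli, emb_dim, n_voxels = 96, 32, 50
X = rng.standard_normal((n_stimuli, emb_dim))
W_true = rng.standard_normal((emb_dim, n_voxels))
Y = X @ W_true + 0.1 * rng.standard_normal((n_stimuli, n_voxels))

# Hold out the last 16 stimuli for evaluation (an arbitrary split).
W_hat = fit_ridge(X[:80], Y[:80])
accuracy = pairwise_accuracy(X[80:] @ W_hat, Y[80:])
print(f"Pairwise accuracy on held-out stimuli: {accuracy:.3f}")
```

Unlike RSA, this approach fits one weight per embedding dimension and voxel, so the capacity of the mapping grows with the embedding size, which is consistent with the dependence on dimensionality reported above.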
Topic Areas: Computational Approaches