Functional specificity in multimodal large language models and the human brain
Poster D58 in Poster Session D, Saturday, October 26, 10:30 am - 12:00 pm, Great Hall 4
Chengcheng Wang1, Zhiyu Fan2, Zaizhu Han2, Yanchao Bi2, Jixing Li1; 1City University of Hong Kong, 2Beijing Normal University
Introduction. Different types of aphasia, such as Broca's and Wernicke's, exhibit selective impairments in language production or comprehension, suggesting that the human language system comprises distinct subsystems. However, it remains unclear whether similar functional specificity exists in large language models (LLMs), which have been shown to simulate various aspects of human language behavior. In this study, we used a multimodal LLM to simulate distinct types of aphasic behavior derived from a picture-description task. By selectively disabling layers and attention heads within the model, we investigated whether the lesioned model exhibits behavioral and brain patterns analogous to those observed in different types of aphasia.

Methods. We used behavioral data and brain lesion maps from patients with different types of aphasia, collected at the China Rehabilitation Research Center and Beijing Normal University (Bi et al., 2015; Han et al., 2013). The dataset comprises 88 Chinese patients with aphasia (29 females; mean age = 45.7 ± 13.2 years) who had suffered strokes or traumatic brain injuries. Aphasia type was diagnosed using a battery of behavioral tasks, including the well-known "Cookie Theft" picture-description task from the Boston Diagnostic Aphasia Examination (Goodglass & Kaplan, 1983). Six aphasia types are represented: motor, sensory, conduction, anomia, subcortical, and global. The dataset also includes 43 healthy controls (21 females; mean age = 49.3 ± 10.7 years). We employed Visual-Chinese-LLaMA-Alpaca (VisualCLA; Yang et al., 2023), a multimodal LLM, to extract sentence-level embeddings from the speech that patients produced during the picture-description task, as transcribed. We then trained two feedforward neural networks (FFNNs) to classify these embeddings according to the corresponding aphasia types and brain lesion maps.
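The classification stage can be sketched as a minimal one-hidden-layer FFNN trained with softmax cross-entropy on precomputed sentence embeddings. The embedding dimension (768), hidden size (64), and seven output classes (six aphasia types plus healthy controls) are illustrative assumptions, not the authors' actual architecture or hyperparameters, and the embeddings here are stand-ins for VisualCLA features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: the true VisualCLA embedding size and the
# authors' FFNN hyperparameters are not reported in the abstract.
DIM, HIDDEN, CLASSES = 768, 64, 7

def train_ffnn(X, y, epochs=300, lr=0.5):
    """Sketch of an FFNN classifier mapping sentence embeddings to
    aphasia types; full-batch gradient descent on cross-entropy."""
    n = len(X)
    W1 = rng.normal(0, 0.02, (X.shape[1], HIDDEN)); b1 = np.zeros(HIDDEN)
    W2 = rng.normal(0, 0.02, (HIDDEN, CLASSES));    b2 = np.zeros(CLASSES)
    Y = np.eye(CLASSES)[y]                           # one-hot targets
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                     # hidden layer
        logits = H @ W2 + b2
        P = np.exp(logits - logits.max(1, keepdims=True))
        P /= P.sum(1, keepdims=True)                 # softmax probabilities
        G = (P - Y) / n                              # cross-entropy gradient
        dW2, db2 = H.T @ G, G.sum(0)
        dH = (G @ W2.T) * (1 - H**2)                 # backprop through tanh
        dW1, db1 = X.T @ dH, dH.sum(0)
        W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)
```

The same architecture would be trained twice, once against aphasia-type labels and once against (vectorized) lesion-map targets, matching the two FFNNs described above.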
To explore functional specificity within the multimodal LLM, we systematically disabled different numbers of layers and attention heads and prompted the lesioned models to describe the "Cookie Theft" picture. We evaluated the impact of disabling layers and attention heads by comparing each model's output to the speech of aphasic patients and healthy controls using the BLEU score (Papineni et al., 2002). Additionally, we applied the previously trained FFNNs to predict both aphasia types and lesioned brain maps from the outputs of the lesioned models.

Results. Layers and attention heads in the multimodal LLM serve distinct functions in the language-production task. Specifically, both the BLEU scores and the classification results based on outputs from the lesioned models indicate that models with a greater number of disabled layers exhibit behavioral patterns resembling multiple aphasia types, including motor, sensory, conduction, and anomia. In contrast, models with more disabled attention heads displayed only patterns resembling motor aphasia. These results highlight functional specificity within LLMs and provide insight into the mechanisms differentiating various types of aphasia.
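The BLEU comparison can be illustrated with a simplified sentence-level implementation. The actual metric of Papineni et al. (2002) is corpus-level with multiple references; the add-one smoothing below is an assumption of this sketch, included so that a single missing n-gram order does not zero the score on short utterances.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(reference, candidate, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of smoothed
    modified n-gram precisions, times a brevity penalty."""
    ref, cand = reference.split(), candidate.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # modified precision: clip candidate counts by reference counts
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        log_precisions.append(math.log((overlap + 1) / (total + 1)))  # add-one smoothing
    geo_mean = math.exp(sum(log_precisions) / max_n)
    # brevity penalty: down-weight candidates shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * geo_mean
```

In the study's setup, a lesioned model's "Cookie Theft" description would be scored against patient and control transcripts, so that higher BLEU against a given group indicates more similar speech patterns.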
Topic Areas: Computational Approaches, Disorders: Acquired