Multimodal Conceptual Representation: Do ChatGPT/LLMs require embodiment to reach human-level representation?
Poster A96 in Poster Session A, Tuesday, October 24, 10:15 am - 12:00 pm CEST, Espace Vieux-Port
Qihui Xu1, Yingying Peng2, Minghua Wu2, Feng Xiao2, Martin Chodorow3,4, Ping Li2; 1BCBL: Basque Center on Cognition, Brain and Language, 2The Hong Kong Polytechnic University, 3Hunter College, City University of New York, 4Graduate Center, City University of New York
To what extent does conceptual representation require sensorimotor grounding? Previous neuroimaging studies observed shared representations of object color between congenitally blind and sighted subjects (Wang et al., 2020; Striem-Amit et al., 2018): blind subjects' color knowledge is represented in the dorsal anterior temporal lobe (dATL, a critical region for language and conceptual/abstract knowledge) despite the absence of color representation in their visual cortex. Recent advances in large language models (LLMs) can provide additional insight into this issue. Despite learning from limited modalities (text only for GPT-3.5; text and images for GPT-4), LLMs have demonstrated human-like behavior in various psychological tasks (Binz & Schulz, 2023), which may offer an alternative account of how conceptual knowledge is acquired. We analyzed and compared ratings of ~5,000 words across multiple lexical conceptual dimensions between humans and ChatGPT (versions based on GPT-3.5 and GPT-4). Following the categories explored in psycholinguistic norms, the dimensions included (1) emotional aspects (emotional valence and dominance), (2) salience (arousal, conceptual size, and gender), (3) mental visualization (concreteness and imageability), (4) sensory domains (haptic, auditory, olfactory, interoceptive, visual, and gustatory experience), and (5) motor domains (actions involving the foot/leg, hand/arm, head excluding mouth, mouth/throat, and torso). Together, these dimensions cover the aspects of lexical conceptual processing typically examined in prior research, varying in how much they draw on social-emotional content, abstract mental imagery, and direct bodily experience. Both GPT-3.5 and GPT-4 correlated strongly with humans on several abstract dimensions, such as emotional valence (rs = 0.90 for both models) and conceptual size (rs = 0.64 for GPT-3.5; rs = 0.70 for GPT-4). On dimensions tied to sensory and motor domains, GPT-3.5 showed weaker correlations, whereas GPT-4 improved substantially (e.g., rs = 0.69 for GPT-4 vs. rs = 0.27 for GPT-3.5 on the visual dimension; rs = 0.63 vs. rs = 0.33 on the interoceptive dimension). Still, GPT-4 struggled to fully capture motor aspects of conceptual knowledge, such as actions involving the mouth/throat (rs = 0.46) and torso (rs = 0.39). Patterns were highly similar whether we analyzed aggregated or individual subjects' data. Moreover, dimensions more strongly associated with the visual dimension showed greater improvement from GPT-3.5 to GPT-4 (rs = 0.74), suggesting that GPT-4's gains are largely attributable to the visual inputs added to its training. Certain aspects of conceptual representation thus appear partly independent of sensory experience, whereas others seem to require it. Our results are in line with Wang et al.'s (2020) dual-coding theory of knowledge. In addition, we provide insights into multiple dimensions of conceptual representation, potential knowledge transfer between dimensions, and a far wider range of concepts than a narrow set of color words. We highlight the complexities of knowledge representation and the potential influence of embodied experience in shaping language and cognition.
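For readers who want to run this kind of comparison themselves, the minimal Python sketch below computes one Spearman correlation per dimension (the rs values reported above) between human norms and model-generated ratings. It is not the authors' code: the file names, the column layout, and the assumption that both tables share the same dimension columns are hypothetical.

import pandas as pd
from scipy.stats import spearmanr

# Assumed layout: one row per word, one column per conceptual dimension
# (e.g., "valence", "imageability", "visual", "hand_arm", ...).
human = pd.read_csv("human_norms.csv", index_col="word")   # hypothetical file
model = pd.read_csv("gpt4_ratings.csv", index_col="word")  # hypothetical file

# Align the two tables on their shared vocabulary (~5,000 words in the study).
shared = human.index.intersection(model.index)
human, model = human.loc[shared], model.loc[shared]

# One Spearman correlation per dimension, skipping missing ratings.
for dim in human.columns:
    rs, p = spearmanr(human[dim], model[dim], nan_policy="omit")
    print(f"{dim:>15}: rs = {rs:.2f} (p = {p:.3g})")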
Topic Areas: Computational Approaches, Multisensory or Sensorimotor Integration