Exploring the representational space of abstract concepts
Poster E27 in Poster Session E, Thursday, October 26, 10:15 am - 12:00 pm CEST, Espace Vieux-Port
Also presenting in Lightning Talks E, Thursday, October 26, 10:00 - 10:15 am CEST, Auditorium
Andrew Persichetti1, Jiayu Shao1, Charles Zheng2, Lukas Muttenthaler3, Juan Antonio Lossio-Ventura2, Stephen Gotts1, Francisco Pereira2, Alex Martin1; 1Laboratory of Brain and Cognition, National Institutes of Health, Bethesda, MD, USA, 2Machine Learning Team, National Institutes of Health, Bethesda, MD, USA, 3Machine Learning Group, Technische Universität Berlin, Berlin, Germany
Concepts are traditionally treated as a dichotomy between “concrete” concepts, such as dogs and cars, and “abstract” concepts that lack perceivable features, such as love and despair. A central goal of cognitive science is to understand the representational space of all concepts. However, most research has focused on concrete concepts, and the relatively few attempts to examine abstract concepts often use methods that are not well suited to studying them (e.g., asking people to list features of concepts or to rate them on experimenter-defined dimensions). We sought to uncover the core dimensions underlying the representational space of 378 abstract words from the Abstract Conceptual Feature database using an odd-one-out similarity task, in which participants chose which of three abstract words was least like the other two across many trials, combined with VICE (Variational Interpretable Concept Embeddings), an approximate Bayesian method for embedding the concepts in a vector space. A total of 6,248 participants (3,221 female; mean age = 43.1, s.d. = 12.3) completed the odd-one-out triad task on Amazon’s Mechanical Turk. Each participant completed twenty triads and four catch trials per session and could complete the experiment up to five times. Data from sessions with a missed catch trial were discarded, and those triads were reposted until completed, leaving usable data from 4,637 participants. We collected 396,368 trials for the main experiment, plus an additional test set of 20,000 triads in which twenty participants each completed 1,000 random triads. We then used VICE to embed the concepts in a vector space: VICE obtains sparse, non-negative representations from the data, along with uncertainty estimates for the embedding values, and these estimates are then used to automatically select the dimensions that best explain the data.
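The link between embeddings and triad choices can be illustrated with the softmax observation model used in SPoSE/VICE-style approaches: the pair with the highest similarity is treated as most alike, and the remaining word is the odd one out. The sketch below is illustrative only (toy data, hypothetical function name), not the authors' implementation:

```python
import numpy as np

def odd_one_out_probs(X, triad):
    """Probability that each item in a triad is the odd one out, given a
    non-negative embedding matrix X (n_words x n_dims).
    Choosing item k as odd one out corresponds to pair (i, j) being the
    most similar pair; choice probabilities are a softmax over the three
    pairwise dot-product similarities (SPoSE/VICE-style observation model)."""
    i, j, k = triad
    s_ij = X[i] @ X[j]  # similarity of pair (i, j)
    s_ik = X[i] @ X[k]
    s_jk = X[j] @ X[k]
    # entry 0: i is odd (pair j,k most similar), entry 1: j is odd, entry 2: k is odd
    sims = np.array([s_jk, s_ik, s_ij])
    exp = np.exp(sims - sims.max())  # numerically stable softmax
    return exp / exp.sum()

# Toy example: 378 words embedded in 14 sparse, non-negative dimensions
rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(378, 14)))
p = odd_one_out_probs(X, (0, 1, 2))  # choice probabilities for one triad
```

The model's predicted choice for a triad is the item with the highest probability, which is how accuracy against held-out human judgments can be scored.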
We randomly split the data into a 90% training partition and a 10% validation partition and ran VICE twenty times from random seeds for a range of hyperparameter settings. The best hyperparameter set was chosen based on the cross-entropy on the validation set, averaged across the twenty seeds. For that hyperparameter set, we estimated model performance on the test set to obtain an upper bound on the optimal model performance in the main experiment. First, we found that the upper bound on achievable accuracy was 69%, while the VICE model obtained accuracies of 61.5–62.5% across the twenty seeds; thus, VICE achieved ~90% of the best possible accuracy at predicting human behavior. Second, we identified fourteen reproducible dimensions by applying single-linkage agglomerative clustering to the union of all dimensions from the twenty seeds, iteratively merging similar dimensions (cosine distance < 0.04) and keeping only those reproduced in at least fifteen of the twenty seeds. These results yield an interpretable multi-dimensional representational space of abstract concepts, in which each word has a distribution of weights across the dimensions. The dimensional weightings can also be used in a variety of applications, including clustering abstract concepts into categories and designing fMRI experiments.
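The reproducibility step can be sketched as follows: pool the dimension vectors from all seeds, cut a single-linkage tree at the cosine-distance threshold, and keep clusters spanning enough distinct seeds. This is a sketch of the reported procedure under stated assumptions; the function name, the per-seed input format, and averaging merged dimensions into one representative are illustrative choices, not details from the abstract:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def reproducible_dimensions(dims_per_seed, threshold=0.04, min_seeds=15):
    """Cluster the union of dimensions from all seeds with single-linkage
    agglomerative clustering on cosine distance, then keep clusters whose
    members come from at least `min_seeds` distinct seeds.
    dims_per_seed: list with one (n_dims_s x n_words) array per seed."""
    all_dims = np.vstack(dims_per_seed)
    # record which seed each pooled dimension came from
    seed_ids = np.concatenate(
        [np.full(len(d), s) for s, d in enumerate(dims_per_seed)]
    )
    Z = linkage(all_dims, method="single", metric="cosine")
    # cutting the tree at `threshold` merges dims with cosine distance < 0.04
    labels = fcluster(Z, t=threshold, criterion="distance")
    kept = []
    for c in np.unique(labels):
        members = labels == c
        if len(np.unique(seed_ids[members])) >= min_seeds:
            kept.append(all_dims[members].mean(axis=0))  # one representative
    return kept
```

With twenty seeds, a dimension that reappears (up to small numerical variation) in at least fifteen runs survives the filter, while seed-specific dimensions are discarded.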
Topic Areas: Meaning: Lexical Semantics, Meaning: Discourse and Pragmatics