A Novel Framework for Decoding Continuous Language from Brain Activities Recorded by fMRI
Poster B57 in Poster Session B, Friday, October 25, 10:00 - 11:30 am, Great Hall 4
Jingyuan Sun1, Xinpei Zhao2, Shaonan Wang3; 1KU Leuven, 2Institute of Automation, Chinese Academy of Sciences
Decoding continuous language text from brain activity is a groundbreaking endeavor at the intersection of neuroscience, linguistics, and artificial intelligence. This advancement promises to revolutionize communication, particularly for individuals with speech impairments, and offers profound insights into the brain's language processing. The development of interfaces that seamlessly integrate thought and speech holds immense potential. While invasive technologies like electrocorticography (ECoG) have shown promise, their broad application is limited by the scarcity of invasive data and the complexities associated with neurosurgery. Non-invasive brain recordings, such as those obtained from functional magnetic resonance imaging (fMRI), present a more accessible alternative. However, decoding continuous language from these non-invasive recordings remains a significant challenge. This difficulty arises from the intricate and dynamic relationship between language and the neural responses it elicits, compounded by the inherently noisy nature of non-invasive neuroimaging. Previous attempts to address this issue used a two-step process: a linear encoding model first predicted brain activity from candidate text, and text generation was then guided by selecting candidates whose predicted brain responses best matched the recorded ones. Although this method improved over random-level performance, the gains were marginal, and the effectiveness of such an indirect approach built on a linear model for continuous text generation was questionable. In response to this challenge, we introduce MapGuide, a novel two-stage framework designed to decode continuous language from brain activities more effectively. In the first stage, MapGuide employs a Transformer-based mapper to map brain activity to text embeddings. To enhance the mapper's resilience to neural noise, we use random masking for data augmentation together with a contrastive learning objective.
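The first stage can be sketched as follows. This is a minimal illustration in PyTorch, not the authors' actual implementation: the voxel count, model dimensions, mask rate, and the InfoNCE-style contrastive loss are all assumptions chosen for readability.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BrainToTextMapper(nn.Module):
    """Illustrative Transformer mapper from fMRI features to text embeddings."""
    def __init__(self, n_voxels=1024, d_model=256, d_text=768, n_layers=2):
        super().__init__()
        self.proj_in = nn.Linear(n_voxels, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.proj_out = nn.Linear(d_model, d_text)

    def forward(self, x):  # x: (batch, seq_len, n_voxels)
        return self.proj_out(self.encoder(self.proj_in(x)))

def random_mask(x, p=0.2):
    """Data augmentation: zero out a random subset of voxel features."""
    return x * (torch.rand_like(x) > p).float()

def contrastive_loss(pred, target, tau=0.1):
    """InfoNCE-style objective: each predicted embedding should match
    its own target text embedding and repel all others in the batch."""
    pred = F.normalize(pred.flatten(0, 1), dim=-1)
    target = F.normalize(target.flatten(0, 1), dim=-1)
    logits = pred @ target.t() / tau
    labels = torch.arange(logits.size(0))
    return F.cross_entropy(logits, labels)

mapper = BrainToTextMapper()
fmri = torch.randn(8, 4, 1024)     # toy batch of fMRI feature sequences
text_emb = torch.randn(8, 4, 768)  # toy target text embeddings
loss = contrastive_loss(mapper(random_mask(fmri)), text_emb)
loss.backward()                    # gradients flow through the mapper
```

Masking a random subset of voxels before each forward pass forces the mapper to rely on distributed rather than voxel-specific signal, which is one common way to build robustness to neural noise.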
In the second stage, a pre-trained text generator uses the predicted text embeddings to produce text that closely aligns with these embeddings. This integration of mapping and text generation stages offers a more direct and efficient solution for translating neural signals into coherent text. Experimental results demonstrate that MapGuide achieves a new state-of-the-art in reconstructing continuous language from fMRI-based brain recordings. Our method significantly outperforms previous attempts across four different types of metrics. Additionally, our investigation reveals a crucial contrast in compatibility patterns between frameworks. While previous encoding-based frameworks performed well with linear models, our decoding-based framework shows superior performance with non-linear models. This finding marks a pivotal shift in approach, emphasizing the importance of using non-linear models for optimal results in this context. Furthermore, we identify a clear link between the accuracy of mapping brain activities to text embeddings and improved text reconstruction performance. This insight highlights the importance of refining the brain-to-text embedding mapping process, thereby simplifying the task of reconstructing language from brain activities. Our study underscores the potential of the MapGuide framework to advance the field of brain-computer interfaces and enhance our understanding of the neural basis of language.
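To make the second stage concrete, the snippet below sketches one simple way predicted embeddings can guide generation: scoring candidate continuations by cosine similarity to the embedding decoded from brain activity and keeping the closest one. The candidate strings and toy embeddings are invented for illustration; the actual system uses a pre-trained text generator rather than a fixed candidate list.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def guided_pick(predicted_emb, candidates):
    """Select the candidate continuation whose (hypothetical) text
    embedding lies closest to the brain-decoded embedding."""
    scores = [cosine(predicted_emb, emb) for _, emb in candidates]
    return candidates[int(np.argmax(scores))][0]

rng = np.random.default_rng(0)
predicted = rng.normal(size=16)  # embedding output by the trained mapper
cands = [
    ("she opened the door", predicted + 0.1 * rng.normal(size=16)),
    ("the weather was cold", rng.normal(size=16)),
    ("he sat down quietly", rng.normal(size=16)),
]
print(guided_pick(predicted, cands))
```

Because the mapper is trained to place brain-decoded embeddings near their target text embeddings, a more accurate mapping directly shrinks this search problem, which is consistent with the observed link between mapping accuracy and reconstruction quality.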
Topic Areas: Computational Approaches, Methods