Presentation

From decoding elicited to self-generated inner speech

Poster A113 in Poster Session A, Tuesday, October 24, 10:15 am - 12:00 pm CEST, Espace Vieux-Port

Oiwi Parker Jones¹, Natalie Voets¹; ¹University of Oxford

Recent results show that inner speech can, in important contexts, be decoded to the same high level of accuracy as articulated speech. This result, however, relies on neural data obtained while subjects perform elicited tasks, such as covert reading and repeating. By contrast, a practical neural speech prosthetic will require the decoding of inner speech that is self-generated. Prior work has emphasised the differences between these two types of inner speech, raising the question of how well a decoder optimised for one will generalise to the other. In this study, we trained phoneme-level decoders for consonants and for vowels on an atypically large elicited inner-speech dataset, previously acquired using 7T fMRI in a single subject. To this we now add a second dataset of self-generated inner speech from the same subject. Details of the model architecture and training procedure are the same as in prior work. We note that the model is a simple fully-connected deep neural network that outperforms linear classifiers on held-out elicited data. The output classes for the self-generated inner speech task are the same as for the elicited tasks, either three consonants (/g, m, s/) or three vowels (/i, a, u/). The task was to imagine one of nine consonant-vowel syllables (/gi, ga, gu, mi, ma, mu, si, sa, su/). On each trial, the subject was prompted (1) to decide on which syllable to imagine, (2) to imagine saying it, and (3) to record the identity of the syllable using button presses. Across trials, the prompts were identical for steps (1) and (2), so it was entirely up to the subject to decide on and then imagine the syllables. After many hours of the elicited tasks, the subject was very familiar with the list of syllables to choose from. To exclude motor preparation signals, the buttons used in step (3) were randomised on each trial. For example, the subject might be prompted to press the numbered buttons “1=g, 2=m, 3=s” to record the syllable's consonant on one trial and “1=s, 2=g, 3=m” on another. Although the decoders were trained exclusively on neural recordings obtained during elicited inner speech, they predict unseen phonemes accurately in both elicited and self-generated conditions. Accuracy was significantly better than chance when decoding both elicited and self-generated inner speech. The accuracy for decoding self-generated inner speech was also no worse, statistically, than for decoding elicited inner speech. Together, these results demonstrate the viability of zero-shot task transfer for inner speech decoding. This result has practical significance for the development of a neural speech prosthetic, as labelled data is far easier to acquire at scale for controlled and elicited tasks than for self-generated inner speech. Indeed, elicited tasks may be the only option for acquiring labelled data in clinical populations who would benefit from a neural speech prosthetic (e.g. locked-in patients).
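To make the zero-shot transfer step concrete, the sketch below trains a simple fully-connected classifier on elicited-condition trials only, scores it unchanged on self-generated trials alongside a linear baseline, and runs a label-permutation test against the one-in-three chance level. The synthetic data, feature dimensions, network size, and test procedure are illustrative assumptions only, not the authors' actual pipeline.

```python
# Minimal sketch of zero-shot transfer from elicited to self-generated inner speech.
# All shapes, hyperparameters, and data here are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical single-trial fMRI feature vectors (e.g. voxels in a speech ROI).
n_elicited, n_self, n_voxels = 600, 90, 500
X_elicited = rng.normal(size=(n_elicited, n_voxels))
y_elicited = rng.integers(0, 3, size=n_elicited)   # consonant classes /g, m, s/ -> 0, 1, 2
X_self = rng.normal(size=(n_self, n_voxels))
y_self = rng.integers(0, 3, size=n_self)

# Simple fully-connected network (cf. the abstract) and a linear baseline.
mlp = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(128, 64),
                                  max_iter=500, random_state=0))
linear = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Train only on elicited inner speech ...
mlp.fit(X_elicited, y_elicited)
linear.fit(X_elicited, y_elicited)

# ... then evaluate zero-shot on self-generated inner speech.
acc_mlp = mlp.score(X_self, y_self)
print("MLP, self-generated accuracy:   ", acc_mlp)
print("Linear, self-generated accuracy:", linear.score(X_self, y_self))

# Label-permutation test of the MLP against the 1/3 chance level (illustrative).
null = [mlp.score(X_self, rng.permutation(y_self)) for _ in range(1000)]
p = (np.sum(np.array(null) >= acc_mlp) + 1) / (len(null) + 1)
print(f"permutation p-value vs chance: {p:.3f}")
```

On the random data above the decoder should sit near chance; with real elicited and self-generated recordings, the same train-on-elicited, test-on-self-generated split is what the abstract refers to as zero-shot task transfer.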

Topic Areas: Methods, Language Production
