Speech Synthesis from Brain Signals Based on Generative Model

Young-Eun Lee, Sang-Ho Kim, Seo-Hyun Lee, Jung-Sun Lee, Soowon Kim, Seong-Whan Lee

Published: 2023, Last Modified: 28 Jul 2025, BCI 2023, CC BY-SA 4.0

Abstract: Brain imaging studies of human speech are an active research area that is opening up novel ways of communicating through brain signals. Efforts to generate voice from human neural activity have shown promise with invasive measurements of speech, but have had difficulty reconstructing voice from imagined speech. Here, we propose NeuroTalk, which non-invasively converts brain signals from spoken and imagined speech to voice. The proposed framework is well suited to decoding imagined speech, as it was trained on spoken-speech EEG data and then generalized to the domain of imagined speech. This means that the voice you hear when you imagine speaking is likely to correspond to the true voice of someone else, as the model has been specifically designed to adapt to this type of speech. Our findings suggest that speech synthesis from human EEG signals is viable, not just for spoken speech but also for imagined speech. This paper extensively covers the content of Lee et al., AAAI 2023. A high overlap with the above-mentioned contributions is therefore inevitable and deliberate.