Abstract: Brain-to-speech systems offer a new means of human communication by generating linguistic expression from neural activity. Recent brain-to-speech studies using non-invasive brain signals have mostly shown that words or sentences can be decoded from a set of repeated, predefined classes. To enable intuitive and natural communication via brain signals, it is essential to decode unconstrained speech of the kind used in natural human communication. In this study, we demonstrate the potential of synthesizing speech for unseen sentences from biosignals by training a generative model to learn phonological features. We focus on constructing a sequence-conscious architecture that learns temporal dependencies and on leveraging a phoneme prediction loss term to extract speech-related features. We thereby address the feasibility of sentence-level neural communication based on non-predefined vocabularies, in particular when training with non-repetitive sentences. Additionally, we conduct neurophysiological and spatio-spectral analyses that compare brain activity during speech across cortical and functional brain regions. Our results show the potential of unconstrained sentence generation, which may ultimately provide a new form of human interaction mediated by brain signals.