CerebroVoice: A Stereotactic EEG Dataset and Benchmark for Bilingual Brain-to-Speech Synthesis and Activity Detection
Synthesizing speech from brain signals offers a new mode of speech communication, enabling innovative services and applications. With its high temporal and spatial resolution, invasive brain sensing such as stereotactic electroencephalography (sEEG) has become one of the most promising means of decoding complex brain dynamics. However, such data are scarce. In this paper, we introduce CerebroVoice, the first publicly accessible sEEG dataset curated for bilingual brain-to-speech synthesis. Specifically, the CerebroVoice dataset comprises sEEG signals recorded while speakers read Mandarin Chinese words, English words, and Mandarin Chinese digits. We establish benchmarks for two tasks on the CerebroVoice dataset: speech synthesis and voice activity detection (VAD). For the speech synthesis task, the objective is to reconstruct the speech uttered by the participants from their sEEG recordings. We propose a novel framework, Mixture of Bilingual Synergy Experts (MoBSE), which uses language-aware dynamic organization of low-rank expert weights to improve the efficiency of language-specific decoding. The proposed MoBSE framework achieves significant performance improvements over current state-of-the-art methods, producing more natural and intelligible reconstructed speech. The VAD task aims to determine whether the speaker is actively speaking. For this benchmark, we adopt three established architectures and report comprehensive evaluation metrics to assess their performance. Our findings indicate that models trained on low-frequency components consistently outperform those trained on high-gamma activity across all metrics, suggesting that low-frequency filtering is more effective for VAD. This finding provides valuable insights for advancing brain-computer interfaces in clinical applications. The CerebroVoice dataset and benchmarks are publicly available on Zenodo and GitHub for research purposes.
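The abstract only names the core MoBSE idea, namely language-aware gating over low-rank expert weights; the concrete architecture is specified later in the paper. The snippet below is a minimal, illustrative sketch of that general idea, not the authors' implementation: the module names (LanguageAwareMoE, LowRankExpert), the gating scheme, and all dimensions are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LowRankExpert(nn.Module):
    """One expert parameterized by a low-rank factorization W ~= B @ A (assumed form)."""

    def __init__(self, dim_in: int, dim_out: int, rank: int):
        super().__init__()
        self.A = nn.Linear(dim_in, rank, bias=False)   # down-projection
        self.B = nn.Linear(rank, dim_out, bias=False)  # up-projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.B(self.A(x))


class LanguageAwareMoE(nn.Module):
    """Mixes low-rank experts with gating conditioned on a language ID (illustrative sketch)."""

    def __init__(self, dim_in: int, dim_out: int, num_experts: int = 4,
                 rank: int = 8, num_languages: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            [LowRankExpert(dim_in, dim_out, rank) for _ in range(num_experts)]
        )
        # Gating network sees both the sEEG features and a language embedding.
        self.lang_embed = nn.Embedding(num_languages, dim_in)
        self.gate = nn.Linear(2 * dim_in, num_experts)

    def forward(self, x: torch.Tensor, lang_id: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim_in), lang_id: (batch,)
        lang = self.lang_embed(lang_id).unsqueeze(1).expand(-1, x.size(1), -1)
        weights = F.softmax(self.gate(torch.cat([x, lang], dim=-1)), dim=-1)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)
        # Weighted sum over experts -> (batch, time, dim_out)
        return (expert_out * weights.unsqueeze(-2)).sum(dim=-1)


# Toy usage: 8 sEEG feature channels per frame mapped to 80-dim acoustic features
# (all sizes are hypothetical).
feats = torch.randn(2, 100, 8)              # (batch, time, features)
lang = torch.tensor([0, 1])                 # e.g., 0 = Mandarin, 1 = English
layer = LanguageAwareMoE(dim_in=8, dim_out=80)
print(layer(feats, lang).shape)             # torch.Size([2, 100, 80])
```

In this sketch, the language embedding conditions the gate so that expert weights are reorganized per language while the low-rank factorization keeps each expert lightweight; the paper's method section should be consulted for the actual MoBSE design.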