Abstract: With the remarkable recent development of large language models in natural language processing research, a growing number of studies have employed large language models to investigate the information-processing mechanisms of encoding and decoding in the brain. In this study, we developed a new pre-trained language model, BrainLM, which incorporates paired data of text stimuli and the brain activity they evoke, and we verified its accuracy in estimating brain states from natural language across multiple NLP tasks. Our research makes several contributions. First, we developed a multimodal model that incorporates both brain activity and text. Second, we conducted bi-directional experiments to validate the model, ensuring the reliability of both the brain encoding and decoding processes. Third, we performed careful comparative experiments, introducing 20 state-of-the-art (SOTA) language models as a control group; our findings reveal that the proposed model achieves superior brain encoding ability compared to the control group. Finally, we designed a discrete Autoencoder module that extracts brain features. This module can be used independently to extract brain features in a wider range of brain decoding studies beyond fMRI.