GeNeRTe: Generating Neural Representations from Text for Classification

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Advancements in language modelling over the last decade have significantly improved downstream tasks such as automated text classification. However, deploying such systems requires substantial computational resources and extensive training data. Human adults can perform such tasks effortlessly with minimal computational overhead and little training data, which prompts research into leveraging neurocognitive signals such as electroencephalography (EEG). We compare Large Language Models (LLMs) with EEG features captured during natural reading for text classification. Additionally, we introduce GeNeRTe, a novel state-of-the-art synthetic EEG generative model. Using only a limited amount of data, GeNeRTe learns to produce synthetic EEG features for a sentence through a neural regressor that learns the mapping between a sentence's embedding and its natural EEG. Our experiments show that GeNeRTe can effectively synthesize EEG features for unseen test sentences with just 236 sentence-EEG training pairs. Furthermore, using synthetic EEG features significantly improves text classification performance and reduces computation time. Our results highlight the potential of synthetic EEG features, offering a viable path toward a new type of physiological embedding with lower computing requirements and improved model performance in practical applications.
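The core idea in the abstract — a regressor trained on sentence-embedding/EEG pairs that can then synthesize EEG features for unseen sentences — can be sketched as follows. This is an illustrative sketch only: the embedding dimension, EEG feature dimension, hidden size, and training details are assumptions, not details from the paper, and random arrays stand in for real embeddings and recordings. Only the count of 236 training pairs comes from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions (hypothetical): 768-d sentence embeddings,
# 105-d EEG feature vectors, and the 236 pairs reported in the abstract.
EMB_DIM, EEG_DIM, N_PAIRS = 768, 105, 236

# Toy stand-ins for sentence embeddings and their recorded EEG features.
X = rng.normal(size=(N_PAIRS, EMB_DIM))
Y = rng.normal(size=(N_PAIRS, EEG_DIM))

# One-hidden-layer neural regressor trained with mean-squared error.
H, lr = 128, 1e-2
W1 = rng.normal(scale=0.02, size=(EMB_DIM, H)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.02, size=(H, EEG_DIM)); b2 = np.zeros(EEG_DIM)

losses = []
for _ in range(500):
    h = np.maximum(X @ W1 + b1, 0.0)        # ReLU hidden layer
    pred = h @ W2 + b2
    losses.append(float(np.mean((pred - Y) ** 2)))
    grad = 2.0 * (pred - Y) / N_PAIRS       # dMSE/dpred
    dh = (grad @ W2.T) * (h > 0)            # backprop through ReLU
    W2 -= lr * (h.T @ grad); b2 -= lr * grad.sum(axis=0)
    W1 -= lr * (X.T @ dh);   b1 -= lr * dh.sum(axis=0)

def generate_eeg(embedding):
    """Synthesize EEG features for an unseen sentence embedding."""
    return np.maximum(embedding @ W1 + b1, 0.0) @ W2 + b2

synthetic = generate_eeg(rng.normal(size=EMB_DIM))  # shape (105,)
```

Once trained, `generate_eeg` plays the role the abstract describes: it replaces costly EEG acquisition at inference time, so downstream classifiers can consume synthetic physiological features for any new sentence.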
Paper Type: long
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources, Data analysis
Languages Studied: English
Preprint Status: We are considering releasing a non-anonymous preprint in the next two months (i.e., during the reviewing process).
A1: yes
A1 Elaboration For Yes Or No: 8
A2: n/a
A2 Elaboration For Yes Or No: We are not aware of any potential risks related to our work.
A3: yes
B: yes
B1: yes
B2: n/a
B3: n/a
B4: n/a
B4 Elaboration For Yes Or No: We used a public dataset from research with human subjects. The original paper for that dataset anonymised the data, so it does not contain any identifying information.
B5: n/a
B6: yes
C: yes
C1: yes
C2: yes
C3: yes
C4: yes
D: no
D1: n/a
D1 Elaboration For Yes Or No: We used a public dataset from research with human subjects. The original paper for that dataset contains the instructions given to the participants.
D2: n/a
D2 Elaboration For Yes Or No: We used a public dataset from research with human subjects. The original paper for that dataset contains the recruitment information.
D3: n/a
D3 Elaboration For Yes Or No: We used a public dataset from research with human subjects. The original paper for that dataset contains information on how consent was obtained.
D4: n/a
D4 Elaboration For Yes Or No: We used a public dataset from research with human subjects. The original paper for that dataset contains information on the data collection protocol.
D5: n/a
E: yes
E1: no
E1 Elaboration For Yes Or No: We used AI assistance only for basic paraphrasing.