Track: long paper (up to 8 pages)
Keywords: Multimodal Learning, Clinical EEG Language Model, EEG, Clinical Report Generation
TL;DR: We introduce CELM, the first EEG-to-language foundation model for end-to-end clinical report generation from long-duration EEGs.
Abstract: Generating clinical reports that summarize abnormal patterns, diagnostic findings, and clinical interpretations from long-term EEG recordings remains labor-intensive.
We curate a large-scale clinical EEG dataset with $9{,}922$ reports paired with approximately $11{,}000$ hours of EEG recordings from $9{,}048$ patients.
Building on this dataset, we develop CELM, the first clinical EEG-to-language foundation model capable of summarizing long-duration, variable-length EEG recordings and performing end-to-end clinical report generation at multiple scales, including recording description, background activity, epileptiform abnormalities, events/seizures, and impressions.
Experimental results show that, with patient-history supervision, our method raises standard generation metrics (e.g., ROUGE-1 and METEOR) from $0.2$–$0.3$ to $0.4$–$0.6$, an average relative improvement of $70\%$–$95\%$.
In the zero-shot setting without patient history, CELM attains generation scores in the range of $0.43$–$0.52$, compared to baselines of $0.17$–$0.26$.
CELM integrates pretrained EEG foundation models with language models to enable scalable multimodal learning. We release our model and benchmark construction pipeline at https://github.com/Jathurshan0330/CELM.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 62