Towards Clinically Faithful ECG Reports via Quantization-Based Tokenization
Keywords: Electrocardiography (ECG), Tokenization, Vector Quantization, Large Language Models (LLMs), Report Generation, Representation Learning
TL;DR: We introduce a way to tokenize ECG signals using vector quantization and evaluate the resulting representations on downstream tasks such as cardiac condition classification and clinical report generation.
Abstract: The lack of effective tokenization is a critical barrier to bridging continuous electrocardiogram (ECG) signals and discrete language models for automated report generation. We introduce a novel ECG tokenizer based on QINCo, an adaptive residual vector quantization framework, that learns a high-fidelity discrete representation of raw 12-lead signals. The clinical utility of these tokens is demonstrated through a downstream classification task, where their frozen embeddings achieve performance comparable to specialized supervised and self-supervised methods. Leveraging this tokenizer with an attention-based adapter, our approach to report generation outperforms byte-level tokenizer baselines and establishes a benchmark on a large-scale clinical dataset. Our work presents an effective and computationally efficient tokenization framework, enabling a more powerful integration of complex biosignals into generative models for clinical applications.
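To illustrate the general idea of quantization-based tokenization described above, the sketch below shows a plain residual vector quantizer that maps continuous ECG frame embeddings to discrete token ids. It is an assumption-laden illustration only: the class name, codebook sizes, and fixed nearest-neighbour codebooks are placeholders and do not reproduce the paper's QINCo tokenizer, which uses adaptive, neurally parameterized codebooks.

```python
# Minimal sketch of residual vector quantization (RVQ) for discretizing
# continuous ECG frame embeddings into token ids. Hypothetical names and
# hyperparameters; NOT the paper's QINCo implementation (which adapts
# codebooks with a neural network) and no training losses are shown.
import torch
import torch.nn as nn


class ResidualVectorQuantizer(nn.Module):
    def __init__(self, dim: int = 128, num_stages: int = 4, codebook_size: int = 256):
        super().__init__()
        # One learnable codebook per residual stage.
        self.codebooks = nn.ParameterList(
            [nn.Parameter(torch.randn(codebook_size, dim)) for _ in range(num_stages)]
        )

    def forward(self, x: torch.Tensor):
        # x: (batch, time, dim) continuous embeddings of ECG frames.
        residual = x
        quantized = torch.zeros_like(x)
        codes = []
        for codebook in self.codebooks:
            # Nearest codeword for the current residual at each time step.
            dists = torch.cdist(residual, codebook.unsqueeze(0).expand(x.size(0), -1, -1))
            idx = dists.argmin(dim=-1)        # (batch, time) token ids for this stage
            selected = codebook[idx]          # (batch, time, dim)
            quantized = quantized + selected
            residual = residual - selected    # next stage quantizes what is left
            codes.append(idx)
        # Token ids stacked per stage: (num_stages, batch, time).
        return quantized, torch.stack(codes)


if __name__ == "__main__":
    rvq = ResidualVectorQuantizer()
    frames = torch.randn(2, 100, 128)      # dummy ECG frame embeddings
    recon, token_ids = rvq(frames)
    print(recon.shape, token_ids.shape)    # [2, 100, 128], [4, 2, 100]
```

In a pipeline like the one the abstract describes, such token ids (or their codebook embeddings) would be passed through an adapter into a language model for report generation, while the frozen embeddings could feed a downstream classifier.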
Submission Number: 60