Towards Clinically Faithful ECG Reports via Quantization-Based Tokenization
Keywords: Electrocardiography (ECG), Tokenization, Vector Quantization, Large Language Models (LLMs), Report Generation, Representation Learning
TL;DR: We introduce a way to tokenize ECG signals using vector quantization and evaluate the resulting representations on downstream tasks such as cardiac condition classification and clinical report generation.
Abstract: The lack of effective tokenization is a critical barrier to bridging continuous electrocardiogram (ECG) signals and discrete language models for automated report generation. We introduce a novel ECG tokenizer based on QINCo, an adaptive residual vector quantization framework, that learns a high-fidelity discrete representation of raw 12-lead signals. The clinical utility of these tokens is demonstrated through a downstream classification task, where their frozen embeddings achieve performance comparable to specialized supervised and self-supervised methods. Leveraging this tokenizer with an attention-based adapter, our approach to report generation outperforms byte-level tokenizer baselines and establishes a benchmark on a large-scale clinical dataset. Our work presents an effective and computationally efficient tokenization framework, enabling a more powerful integration of complex biosignals into generative models for clinical applications.
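To illustrate the general idea of quantization-based tokenization described above, the sketch below shows a plain residual vector quantizer that maps continuous ECG frame embeddings to discrete token ids. It is an assumption-laden illustration only: the class name, codebook sizes, and fixed nearest-neighbour codebooks are placeholders and do not reproduce the paper's QINCo tokenizer, which uses adaptive, neurally parameterized codebooks.

```python
# Minimal sketch of residual vector quantization (RVQ) for discretizing
# continuous ECG frame embeddings into token ids. Hypothetical names and
# hyperparameters; NOT the paper's QINCo implementation (which adapts
# codebooks with a neural network) and no training losses are shown.
import torch
import torch.nn as nn


class ResidualVectorQuantizer(nn.Module):
    def __init__(self, dim: int = 128, num_stages: int = 4, codebook_size: int = 256):
        super().__init__()
        # One learnable codebook per residual stage.
        self.codebooks = nn.ParameterList(
            [nn.Parameter(torch.randn(codebook_size, dim)) for _ in range(num_stages)]
        )

    def forward(self, x: torch.Tensor):
        # x: (batch, time, dim) continuous embeddings of ECG frames.
        residual = x
        quantized = torch.zeros_like(x)
        codes = []
        for codebook in self.codebooks:
            # Nearest codeword for the current residual at each time step.
            dists = torch.cdist(residual, codebook.unsqueeze(0).expand(x.size(0), -1, -1))
            idx = dists.argmin(dim=-1)        # (batch, time) token ids for this stage
            selected = codebook[idx]          # (batch, time, dim)
            quantized = quantized + selected
            residual = residual - selected    # next stage quantizes what is left
            codes.append(idx)
        # Token ids stacked per stage: (num_stages, batch, time).
        return quantized, torch.stack(codes)


if __name__ == "__main__":
    rvq = ResidualVectorQuantizer()
    frames = torch.randn(2, 100, 128)      # dummy ECG frame embeddings
    recon, token_ids = rvq(frames)
    print(recon.shape, token_ids.shape)    # [2, 100, 128], [4, 2, 100]
```

In a pipeline like the one the abstract describes, such token ids (or their codebook embeddings) would be passed through an adapter into a language model for report generation, while the frozen embeddings could feed a downstream classifier.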
Submission Number: 60