Track: Tiny Paper Track
Keywords: Tokenization, Vector Quantization, Electrocardiograms, Linear Probing
TL;DR: We propose a discrete tokenization method for raw continuous ECG signals
Abstract: Electrocardiography (ECG) is a fundamental tool in cardiology, yet manual interpretation remains labor-intensive and prone to variability. To enable automated clinical report generation using Large Language Models (LLMs), ECG signals must first be converted into discrete tokenized representations. In this work, we explore Vector Quantization (VQ) for ECG tokenization, specifically QINCo, an adaptive residual quantization method that refines its codebooks dynamically to capture intricate ECG patterns. We train our tokenizer on the MIMIC-IV dataset and evaluate its effectiveness through a linear probing (LP) task, classifying six major cardiac conditions. Our approach achieves a test-set MSE of 0.028 and 96.28% codebook utilization. In linear probing across the six diagnostic categories, it reaches micro AUC values of 0.957, 0.893, 0.888, 0.705, 0.833, and 0.929, respectively, with corresponding micro F1 scores of 0.911, 0.789, 0.791, 0.688, 0.719, and 0.846. These results indicate that the learned representations carry clinically relevant information. To the best of our knowledge, this is the first work to apply deep learning-based VQ for direct ECG tokenization. Future work will expand pretraining to larger datasets and integrate these tokens into LLM-driven clinical report generation, with multi-site validation planned across North America.
Attendance: Rohan Banerjee
Submission Number: 82
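As a rough illustration of the residual quantization idea the abstract refers to, the following is a minimal sketch of plain greedy residual quantization with fixed codebooks. It is not QINCo itself, which adapts its codebooks at each step; the function names `rq_encode` and `rq_decode`, the toy codebooks, and the 4-dimensional input vector are assumptions made here for illustration and do not come from the paper.

```python
import numpy as np

def rq_encode(x, codebooks):
    """Greedy residual quantization: one code index (token) per stage.

    x         : 1-D array of length D (e.g. an embedded ECG segment; hypothetical)
    codebooks : list of arrays, each of shape (K, D); fixed here, unlike QINCo
    """
    residual = np.asarray(x, dtype=np.float64).copy()
    codes = []
    for cb in codebooks:
        # Squared Euclidean distance from the current residual to every codeword.
        dists = np.sum((cb - residual) ** 2, axis=1)
        idx = int(np.argmin(dists))
        codes.append(idx)
        # The next stage quantizes what this stage failed to explain.
        residual = residual - cb[idx]
    return codes

def rq_decode(codes, codebooks):
    """Reconstruct the vector as the sum of the selected codewords."""
    return np.sum([cb[i] for cb, i in zip(codebooks, codes)], axis=0)

# Toy example with two stages (assumed values, not from the paper).
codebooks = [np.eye(4), 0.5 * np.eye(4)]
x = np.array([0.0, 0.5, 1.0, 0.0])
codes = rq_encode(x, codebooks)      # discrete tokens for x
x_hat = rq_decode(codes, codebooks)  # reconstruction from the tokens
mse = float(np.mean((x - x_hat) ** 2))
```

In this toy setting the two codes reconstruct `x` exactly, so `mse` is zero; with learned codebooks and real signals the reconstruction error would be nonzero, as in the test-set MSE reported in the abstract.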