Keywords: Efficient Transformers, Long-Context Modeling, Memory-Efficient Inference, Foundation Models, Representation Learning, Clinical Time Series, Electronic Health Records, Feature-wise Modulation
TL;DR: Cached foundation model summaries improve memory-constrained Transformer inference for long clinical time series, with clear diminishing returns as recent context grows. Summaries of recent history outperform distant history for acute prediction.
Abstract: Transformer-based models for clinical time series face a deployment bottleneck: patient histories can span thousands of irregularly spaced events, yet inference hardware imposes strict memory budgets. We study a simple decoupling strategy in which a pretrained foundation model compresses a patient's historical events into a fixed-size cached summary offline, and a lightweight prediction model processes only a short window of recent events at inference time, conditioned on that summary. Through 252 experiments on MIMIC-IV, we characterize when this strategy is worthwhile. The central finding is a clear pattern of diminishing returns: cached summaries yield a 6.5% relative AUROC gain when the recent window is limited to 8 events (p < 0.001), but the benefit shrinks to a statistically insignificant 0.1% once the window reaches 256 events. We further show that modulating event representations with the summary (FiLM) outperforms treating it as an additional input token (p < 0.001), and that summaries of recent history are more informative than those of distant history (p < 0.01). Together, these results provide actionable guidance for allocating context budgets when deploying sequence models on long, irregular time series under memory constraints.
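The FiLM-style conditioning mentioned in the abstract, in which a cached summary modulates event representations feature-wise rather than being appended as an input token, can be sketched as follows. This is an illustrative NumPy sketch under assumed dimensions and a hypothetical linear FiLM generator; it is not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative assumptions, not from the paper).
d_model = 32     # width of each event representation
d_summary = 16   # width of the cached foundation-model summary
n_events = 8     # short recent window processed at inference time

# Cached summary computed offline by the foundation model (stand-in: random vector).
summary = rng.standard_normal(d_summary)

# FiLM generator: linear maps from the summary to a per-feature
# scale (gamma) and shift (beta). Scaling is centered at 1 so that a
# zero summary leaves the representations unchanged.
W_gamma = 0.1 * rng.standard_normal((d_summary, d_model))
W_beta = 0.1 * rng.standard_normal((d_summary, d_model))
gamma = 1.0 + summary @ W_gamma
beta = summary @ W_beta

# Recent-event representations from the lightweight prediction model.
h = rng.standard_normal((n_events, d_model))

# Feature-wise linear modulation: h' = gamma * h + beta,
# broadcast across the event dimension.
h_mod = gamma * h + beta

print(h_mod.shape)  # (8, 32)
```

In this formulation the summary never consumes a position in the attention window; it conditions every recent event's features directly, which is consistent with the abstract's finding that modulation outperforms the extra-token variant.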
Track: Research Track (max 4 pages)
Submission Number: 80