Forecasting Diabetic Complications from Brazilian Billing Codes with Time-Aware Attention

ICLR 2026 Conference Submission19502 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Health Insurance Claims, Deep Learning, Temporal Encoding, Clinical Decision Support, Diabetic Complications

TL;DR: A BiLSTM-attention model over claims-only event sequences with sinusoidal time embeddings forecasts severe diabetic complications 6–12 months ahead, transfers across health insurers, and is validated in the field.

Abstract: Predicting severe diabetic complications from longitudinal patient traces can enable proactive care. Where multi-institutional EHR integration is impractical, standardized health-insurance claims offer broad, longitudinal coverage despite clinical sparsity. We study this setting in Brazil’s TUSS billing-code ecosystem and present a claims-only framework for forecasting complications (angiopathies, amputations, renal failure) 6–12 months ahead. TUSS codes are represented with skip-gram embeddings, and absolute timing is injected via fixed sinusoidal time embeddings added directly to event vectors; a BiLSTM with self-attention summarizes long, irregular histories. On anonymized data from ~3.9 million individuals, the model achieves an AUC of 0.907 and an Average Precision of 0.631, outperforming capacity-matched baselines. Ablations show that temporal encoding and attention are complementary, with large gains only when combined. We further observe robust transfer to a second operator and concordant blinded field validations that surfaced previously unrecognized high-risk patients. While our contribution is a methodological instantiation rather than an architectural novelty, the work offers a careful case study of claims-only prediction at a national scale, design lessons for modeling sparse transactional health data, and practical evidence for its utility in real-world risk stratification.
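The time-encoding step named in the abstract (fixed sinusoidal time embeddings added directly to event vectors) can be illustrated with a minimal sketch. The abstract does not give the exact formula, time unit, or dimensions, so everything below is an assumption: the sketch uses the common Transformer-style sine/cosine layout, treats time as days since a reference date, and the names `sinusoidal_time_embedding`, `add_time_to_events`, and `max_period` are hypothetical. The skip-gram code vectors are stood in for by plain lists of floats.

```python
import math


def sinusoidal_time_embedding(t_days, dim, max_period=10000.0):
    """Fixed (non-learned) sinusoidal encoding of an absolute timestamp.

    t_days: event time, assumed here to be days since a reference date.
    dim: embedding width; must be even and equal to the code-vector width.
    max_period: assumed base controlling the longest wavelength.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    emb = []
    for i in range(dim // 2):
        # Frequencies decrease geometrically across dimension pairs.
        freq = 1.0 / (max_period ** (2 * i / dim))
        emb.append(math.sin(t_days * freq))
        emb.append(math.cos(t_days * freq))
    return emb


def add_time_to_events(code_vectors, times, max_period=10000.0):
    """Add the time embedding element-wise to each event's code vector."""
    if len(code_vectors) != len(times):
        raise ValueError("need exactly one timestamp per event")
    out = []
    for vec, t in zip(code_vectors, times):
        te = sinusoidal_time_embedding(t, len(vec), max_period)
        out.append([v + e for v, e in zip(vec, te)])
    return out


# Hypothetical history: three billing events with 4-dimensional code vectors,
# observed at irregular intervals (day 0, day 14, day 200).
events = [[0.1, -0.2, 0.3, 0.0], [0.5, 0.5, -0.1, 0.2], [0.0, 0.1, 0.0, -0.3]]
timed_events = add_time_to_events(events, [0.0, 14.0, 200.0])
```

Because the encoding is added rather than concatenated, the event vectors keep their width, and the resulting sequence could be passed unchanged to a downstream sequence encoder such as the BiLSTM with self-attention that the abstract describes.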

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning

Submission Number: 19502
