Structure-Aware Set Transformers: Temporal and Variable-type Attention Biases for Asynchronous Clinical Time Series

Published: 01 Mar 2026, Last Modified: 01 Mar 2026, ICLR 2026 TSALM Workshop Poster, CC BY 4.0
Keywords: Electronic Health Records, Transformer, Bias, Set embedding
TL;DR: We propose the STAR Set Transformer, which imposes both variable-type and temporal biases to re-introduce a grid-like structure while preserving set flexibility
Abstract: Electronic health records (EHR) are irregular, asynchronous multivariate time series. As time-series foundation models increasingly tokenize events rather than discretizing time, the input layout becomes a key design choice. Grids expose time×variable structure but require imputation or missingness masks, risking error or sampling-policy shortcuts. Point-set tokenization avoids discretization but loses within-variable trajectories and time-local cross-variable context (Fig. 1). We restore these priors in STructure-AwaRe (STAR) Set Transformer by adding parameter-efficient soft attention biases: a temporal locality penalty −|∆t|/τ with learnable timescales and a variable-type affinity Bsi,sj from a learned feature-compatibility matrix. We benchmark 10 depth-wise fusion schedules (Fig. 2). On three ICU prediction tasks, STAR-Set achieves AUC/APR of 0.7158/0.0026 (CPR), 0.9164/0.2033 (mortality), and 0.8373/0.1258 (vasopressor use), outperforming regular-grid, event-time grid, and prior set baselines. Learned τ and B provide interpretable summaries of temporal context and variable interactions, offering a practical plug-in for context-informed time-series models.
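The two biases described in the abstract are additive terms on the attention logits. Below is a minimal NumPy sketch of that idea, assuming a single head, a single scalar timescale τ (the paper learns timescales), and randomly initialized stand-ins for the learned type-affinity matrix B; all variable names are hypothetical, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: n events, model dim d, V variable types.
n, d, V = 6, 8, 3
t = np.sort(rng.uniform(0.0, 48.0, size=n))   # event times (e.g., hours)
s = rng.integers(0, V, size=n)                # variable-type id of each event
q = rng.normal(size=(n, d))                   # query vectors
k = rng.normal(size=(n, d))                   # key vectors

tau = 6.0                                     # timescale (learnable in the paper)
B = rng.normal(scale=0.1, size=(V, V))        # type-affinity matrix (learned in the paper)

# Content logits plus the two structural soft biases:
#   temporal locality penalty -|t_i - t_j| / tau, and type affinity B[s_i, s_j].
logits = q @ k.T / np.sqrt(d)
logits += -np.abs(t[:, None] - t[None, :]) / tau
logits += B[s[:, None], s[None, :]]

# Standard softmax over keys gives the biased attention weights.
attn = np.exp(logits - logits.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)
```

Because the biases enter before the softmax, they reweight rather than mask attention, preserving the set model's flexibility while favoring temporally close and type-compatible event pairs.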
Track: Research Track (max 4 pages)
Submission Number: 117