STT-LLM: Structural-Temporal Tokenization for Adapting LLMs to Longitudinal Profiles

ICLR 2026 Conference Submission21146 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Structural-Temporal Embedding, Longitudinal Biomedical Profiles, LLM Tokenization, Sports Doping
Abstract: Large Language Models (LLMs) have shown strong generalization across natural language tasks but remain underexplored for longitudinal biomedical profiles. In sports, biological profiles are analyzed for doping, with particular emphasis on two key challenges for longitudinal data: (i) sequence prediction for early detection of prohibited substance use, and (ii) anomaly detection for identifying doping-related deviations. We propose STT-LLM, a structural-temporal tokenization framework that adapts LLMs to longitudinal analysis without modifying the backbone architecture. STT-LLM constructs joint embeddings that capture both temporal dynamics and biological pathway-based interactions, which are then transformed into LLM-compatible tokens through specialized structural and temporal tokenizers. We evaluate our approach on real-world longitudinal steroid datasets from athletes, where STT-LLM consistently outperforms LLM baselines. In addition, we present a case study in which STT-LLM provides contextual reasoning that aligns more closely with expert assessments than baseline models. These results highlight the effectiveness of embedding-guided tokenization for adapting LLMs to longitudinal biological data.
Supplementary Material: zip
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 21146