Delving into Large Language Models for Effective Time-Series Anomaly Detection

Published: 18 Sept 2025, Last Modified: 29 Oct 2025NeurIPS 2025 posterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: large language model, time series, anomaly detection
Abstract: Recent efforts to apply Large Language Models (LLMs) to time-series anomaly detection (TSAD) have yielded limited success, often performing worse than even simple methods. While prior work has focused solely on downstream performance evaluation, the fundamental question—why do LLMs struggle with TSAD?—has remained largely unexplored. In this paper, we present an in-depth analysis that identifies two core challenges in understanding complex temporal dynamics and accurately localizing anomalies. To address these challenges, we propose a simple yet effective method that combines statistical decomposition with index-aware prompting. Our method outperforms 21 existing prompting strategies on the AnomLLM benchmark, achieving up to a 66.6\% improvement in F1 score. We further compare LLMs with 16 non-LLM baselines on the TSB-AD benchmark, highlighting scenarios where LLMs offer unique advantages via contextual reasoning. Our findings provide empirical insights into how and when LLMs can be effective for TSAD. The code is publicly available at: https://github.com/junwoopark92/LLM-TSAD
Supplementary Material: zip
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 18699
Loading