TL;DR: We propose DiSPaT, a self-play framework that iteratively refines LLMs for tabular anomaly detection by minimizing $f$-divergence between real normal data and model-generated samples through alternating critic-policy optimization.
Abstract: Anomaly detection in tabular data poses significant challenges due to heterogeneous feature types (numerical, categorical, and textual attributes), which complicate learning meaningful representations of normality. Recent work has applied large language models (LLMs) to this problem by serializing table rows as text sequences, yet these approaches rely on one-shot supervised fine-tuning that offers limited signal to tighten the model's description of normality. We propose DiSPaT, a self-play fine-tuning framework that strengthens the model's understanding of normal data. Building on the theoretical foundation of $f$-divergence minimization, we derive a tight approximation connecting our training objective to reducing the distributional gap between real normal data and model-generated samples. DiSPaT operates through an alternating optimization: at each iteration, the current policy generates synthetic samples that serve as pseudo-anomalies, while a critic discriminator learns to distinguish these from real normal samples; this signal drives policy updates that progressively align the model distribution with the true normal-data distribution. Extensive experiments on diverse benchmarks demonstrate that DiSPaT consistently outperforms prior LLM-based methods, deep learning approaches, and classical unsupervised detectors for tabular anomaly detection.
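For context: the abstract does not state the training objective itself. As a hedged illustration only, critic-based $f$-divergence minimization is commonly built on the standard variational representation below; whether DiSPaT uses exactly this form is an assumption, and the symbols $P$, $Q_\theta$, and $T$ are illustrative names that do not come from the submission.

$$D_f(P \,\|\, Q_\theta) \;=\; \sup_{T} \; \mathbb{E}_{x \sim P}\big[T(x)\big] \;-\; \mathbb{E}_{x \sim Q_\theta}\big[f^{*}(T(x))\big]$$

Here $P$ denotes the distribution of real normal rows, $Q_\theta$ the distribution of rows generated by the current policy, $T$ a critic, and $f^{*}$ the convex conjugate of $f$. When $T$ is restricted to a parametric family, the right-hand side is a lower bound rather than an equality. Under this reading, the critic step of an alternating scheme would ascend the bound over $T$, and the policy step would descend it over $\theta$.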
Lay Summary: Many real-world anomaly detection problems involve tables that mix numbers, categories, and free text, making it difficult for standard methods to learn what normal data looks like. We propose DiSPaT, a method that fine-tunes a large language model by repeatedly comparing real normal examples with synthetic “not-normal” examples generated by the model itself. This self-play process provides a stronger learning signal than one-shot fine-tuning on normal data alone, allowing the model to learn a sharper description of normal behavior without requiring labeled anomalies. We also provide a theoretical analysis showing that the method reduces the gap between the model’s distribution and the true distribution of normal data. Across a wide range of tabular anomaly detection benchmarks, DiSPaT consistently outperforms prior LLM-based, deep learning, and classical baselines.
Originally Submitted Supplementary Material: zip
Primary Area: Deep Learning->Large Language Models
Keywords: Anomaly Detection, Tabular Data, Large Language Models, Self-Play
Originally Submitted PDF: pdf
Submission Number: 16051