Abstract: Anomaly detection is critical in domains such as cybersecurity and finance, especially when working with large-scale tabular data. Yet, unsupervised anomaly detection---where no labeled anomalies are available---remains challenging because traditional deep learning methods model a single global distribution, assuming all samples follow the same behavior. In contrast, real-world data often contain heterogeneous contexts (e.g., different users, accounts, or devices), where globally rare events may be normal within specific conditions. We introduce a \emph{contextual learning framework} that explicitly models how normal behavior varies across contexts by learning conditional data distributions $P(\mathbf{Y} \mid \mathbf{C})$ rather than a global joint distribution $P(\mathbf{X})$. The framework encompasses (1) a probabilistic formulation for context-conditioned learning, (2) a principled bilevel optimization strategy for automatically selecting informative context features using early validation loss, and (3) theoretical grounding through variance decomposition and discriminative learning principles. We instantiate this framework using a novel conditional Wasserstein autoencoder as a simple yet effective model for tabular anomaly detection. Extensive experiments across eight benchmark datasets demonstrate that contextual learning consistently outperforms global approaches---even when the optimal context is not intuitively obvious---establishing a new foundation for anomaly detection in heterogeneous tabular data.
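As a rough illustration of why conditioning on an informative context can help, the sketch below assumes that the "variance decomposition" mentioned in the abstract refers to the law of total variance, $\mathrm{Var}(Y) = \mathbb{E}[\mathrm{Var}(Y \mid C)] + \mathrm{Var}(\mathbb{E}[Y \mid C])$. It ranks candidate context features by the within-context term on synthetic data. This is a toy stand-in only: the paper selects contexts using the early validation loss of its conditional Wasserstein autoencoder, not this score, and the names `variance_split`, `select_context`, `device`, and `weekday` are hypothetical.

```python
import random
from collections import defaultdict
from statistics import fmean, pvariance


def variance_split(y, c):
    """Return the (within, between) terms of the law of total variance of y grouped by c."""
    groups = defaultdict(list)
    for yi, ci in zip(y, c):
        groups[ci].append(yi)
    n = len(y)
    grand_mean = fmean(y)
    # E[Var(Y | C)]: size-weighted average of the per-context population variances.
    within = sum(len(g) * pvariance(g) for g in groups.values()) / n
    # Var(E[Y | C]): size-weighted spread of the per-context means around the grand mean.
    between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values()) / n
    return within, between


def select_context(y, candidates):
    """Pick the candidate context with the smallest within-context variance (toy proxy)."""
    scores = {name: variance_split(y, c)[0] for name, c in candidates.items()}
    return min(scores, key=scores.get), scores


# Synthetic data (assumption): y shifts with `device` and is unrelated to `weekday`.
rng = random.Random(0)
n = 2000
device = [rng.randrange(3) for _ in range(n)]
weekday = [rng.randrange(7) for _ in range(n)]
y = [10.0 * d + rng.gauss(0.0, 1.0) for d in device]

candidates = {"device": device, "weekday": weekday}
best, scores = select_context(y, candidates)
within, between = variance_split(y, device)

print("selected context:", best)
print("within-context variance per candidate:", scores)
print("total variance:", pvariance(y), "within + between:", within + between)
```

In this toy setting, a value that is rare globally (for example, `y` near 20) is ordinary within the `device == 2` context, which is the intuition behind scoring anomalies against a conditional rather than a global distribution.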
Submission Type: Long submission (more than 12 pages of main content)
Changes Since Last Submission: We have addressed all reviewer feedback and concerns and have made the following changes since the last submission.
1. Camera-ready formatting and metadata updates
- Switched the paper to the accepted TMLR format.
- Restored the full author list and cleaned up author footnotes.
- Filled in the camera-ready metadata fields, including month, year, and OpenReview link.
- Updated hyperlink formatting to use hidden links.
2. Stronger positioning in related work
- Revised the Related Work section to better situate the paper within conditional/contextual anomaly detection.
- Added or replaced references to more directly relevant prior work, including UCAD, conDENSE, and newer process-monitoring literature.
- Improved the discussion connecting this paper to SPC/MSPC and autoencoder-based process monitoring.
3. Method and theory clarifications
- Clarified the role of a single selected context feature and why the current paper focuses on one context variable for interpretability and stability.
- Refined the variance-reduction argument so it is presented as a heuristic proposition with more careful wording.
- Corrected notation in the CWAE loss description so the context/content variables are stated consistently.
- Revised the explanation of why WAE/CWAE is preferred over CVAE-style stochastic inference.
4. Clearer explanation of context selection
- Expanded the justification for using a small mostly-normal validation subset during bilevel context selection.
- Added references supporting model selection without labeled anomalies.
- Clarified the computational tradeoff behind using one epoch as the proxy for context ranking.
5. Bibliography cleanup
- Standardized several dataset citations in the bibliography.
- Replaced some weaker or less precise references with more appropriate ones.
- Removed a duplicate dataset entry and cleaned up a few bibliography formatting details.
Summary
Relative to the original submission, the camera-ready version keeps the same central method, experimental scope, and main conclusions, but improves the manuscript in four ways: it is now in the final accepted TMLR format, it strengthens the literature review and paper positioning, it clarifies the theoretical and methodological presentation of contextual learning and CWAE, and it fixes a small number of citation and other minor issues. No major changes were made to the paper's core claims.
Assigned Action Editor: ~Markus_Lange-Hegermann1
Submission Number: 6544