CATS: Cross-Modal Autoencoding for Time Series Summarization

TMLR Paper5433 Authors

21 Jul 2025 (modified: 08 Sept 2025) · Under review for TMLR · CC BY 4.0
Abstract: Time-series captioning is highly relevant to industrial monitoring tasks: summarizing characteristic patterns and trends in time series can facilitate data analytics and enable a flexible user experience. Yet, due to the scarcity of labeled data, existing data-driven methods have not seen definitive success so far, while approaches relying on LLMs are impractical in real-world settings due to privacy, cybersecurity, and computational constraints, not to mention their large carbon footprint. In this work we ask whether a small model trained on a small dataset can produce accurate, relevant, and readable time-series summaries. We propose a lightweight encoder-decoder architecture trained with a novel cross-modal autoencoding method and demonstrate that, despite its size, the model achieves performance comparable to the state-of-the-art GPT-4o and outperforms existing open-source baselines. Our results suggest that effective time-series captioning is feasible under realistic industrial requirements.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Taylor_W._Killian1
Submission Number: 5433