Insight Miner: A Time Series Analysis Dataset for Cross-Domain Alignment with Natural Language

Published: 28 Oct 2023, Last Modified: 10 Dec 2023, NeurIPS2023-AI4Science Poster
Keywords: time series analysis, large language models, large multimodal models
TL;DR: We design a time series analysis dataset that enables alignment between the time series and language domains.
Abstract: Time-series data is essential in many scientific and industrial domains, such as environmental analysis, agriculture, transportation, and finance. Researchers rely on their domain knowledge to mine insights from time-series data when studying scientific topics. However, this process is time-consuming and depends heavily on expert knowledge. This paper proposes a large-scale multimodal model (LMM), Insight Miner, to generate high-quality, comprehensive time-series descriptions with domain-specific knowledge. To introduce rich time-series insights to Insight Miner, we propose a time-series analysis dataset, TS-Insights, composed of pairs of time series and textual insights. The TS-Insights dataset includes 100k time series windows sampled from 20 forecasting datasets spanning a wide variety of domains and granularities. Through a meticulous combination of heuristics and statistical tools, we preprocess each raw time series window and use GPT-4 to generate a coherent trend description based on the extracted features. After instruction tuning on the TS-Insights dataset, the Insight Miner model generates better time series descriptions and insights than state-of-the-art multimodal models such as LLaVA and GPT-4. Our findings suggest that leveraging LMMs for time series analysis is a promising direction, potentially offering avenues for efficient insight mining in scientific domains. The TS-Insights dataset is available here: https://drive.google.com/drive/folders/1qGXigxE5GvmF1oLuGXaqLMkRgwoQfZ7V?usp=sharing.
Submission Track: Original Research
Submission Number: 185
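Illustrative sketch (not from the paper): the abstract describes preprocessing each raw window with heuristics and statistical tools, then asking GPT-4 for a trend description based on the extracted features. The Python below is one possible reading of that flow. The function names (`moving_average`, `slope`, `describe_trend`, `build_prompt`), the moving-average smoothing, the slope tolerance, and the prompt wording are all assumptions made for illustration; the abstract does not specify the actual features or prompts, and no model call is made here.

```python
from statistics import mean


def moving_average(values, width):
    """Moving average over full windows only; returns len(values) - width + 1 points."""
    return [mean(values[i:i + width]) for i in range(len(values) - width + 1)]


def slope(values):
    """Least-squares slope of the values against their index (0, 1, 2, ...)."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def describe_trend(values, width=3, tol=0.05):
    """Extract a few simple trend features from one time series window.

    `width` and `tol` are hypothetical settings chosen for this sketch.
    """
    if len(values) < width + 1:
        raise ValueError("window is too short for the chosen smoothing width")
    smoothed = moving_average(values, width)
    s = slope(smoothed)
    if s > tol:
        direction = "upward"
    elif s < -tol:
        direction = "downward"
    else:
        direction = "flat"
    return {
        "direction": direction,
        "slope": s,
        "start": smoothed[0],
        "end": smoothed[-1],
    }


def build_prompt(features, domain):
    """Turn extracted features into a text prompt for a language model (hypothetical wording)."""
    return (
        f"The following features were extracted from a {domain} time series window. "
        f"The smoothed series moves from {features['start']:.2f} to {features['end']:.2f} "
        f"with an overall {features['direction']} trend "
        f"(slope {features['slope']:.3f} per step). "
        "Describe the trend in one or two sentences."
    )


if __name__ == "__main__":
    window = [1, 2, 3, 4, 5, 6]
    feats = describe_trend(window)
    print(build_prompt(feats, "traffic"))
```

In a setup like this, the language model sees the extracted features rather than the raw numbers, which is the role the abstract assigns to the preprocessing step.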