InsightMiner: Automated Insight Generation through MLLM-enabled Exploratory Data Analysis

ACL ARR 2025 February Submission7426 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract:

Data exploration is a crucial step in the data analysis pipeline, enabling people to uncover patterns, trends, and anomalies that inform their decision-making. However, traditional methods often demand substantial technical expertise, including proficiency in programming languages, data visualization tools, and statistical software, which can be a barrier for novices. To address these challenges, we introduce $\textbf{InsightMiner}$, a novel system that leverages Multi-modal Large Language Models (MLLMs) to automate and simplify data exploration and visualization, thereby improving users' ability to discover meaningful insights. InsightMiner allows users to upload datasets and pose queries in natural language. It employs prompt engineering techniques to interpret user intent, such as trend analysis or comparison, and to extract entities, including variables, time periods, and categories. The system dynamically generates relevant visualizations, such as time-series graphs, bar charts, and heatmaps, to communicate the extracted insights. Moreover, InsightMiner supports an iterative exploration process, allowing users to refine their queries and explore different dimensions of complex datasets in an intuitive and efficient manner. Through case studies in urban safety and transportation, we demonstrate InsightMiner’s ability to generate actionable insights and streamline the data exploration process. By combining the power of MLLMs with user-centric design, InsightMiner broadens access to advanced data exploration, making it a versatile tool for both novice and expert users.

Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: human-in-the-loop, multi-modal dialogue systems, interactive storytelling, applications
Contribution Types: Publicly available software and/or pre-trained models, Data analysis
Languages Studied: English
Submission Number: 7426