Abstract: Scientific research often requires constructing
high-quality datasets, yet current workflows
remain labor-intensive and dependent on domain expertise. Existing approaches automate
isolated steps such as retrieval or generation,
but lack support for the full end-to-end data collection process. We present Quest2DataAgent,
a general-purpose multi-agent framework for
automating scientific data collection workflows.
Given a natural language research question, it
decomposes tasks into structured subtasks, retrieves relevant data using hybrid strategies,
evaluates dataset quality, and generates visualizations through a conversational interface.
We demonstrate its flexibility in two domains:
EcoData for ecological research and PolyData
for polymer materials. Both systems share the
same core architecture but operate over distinct
datasets and user needs. Human evaluations
show that Quest2DataAgent significantly improves data relevance, usability, and time efficiency compared to manual collection and
tool-assisted baselines. The framework is open-source and extensible to other domains.