Keywords: Agentic AI, LLMs, Data Lakes, Intention-aware Recommendation
TL;DR: AIDEN turns data-lake table discovery into an agentic LLM workflow, combining retrieval, task-signal extraction, memory, and reranking for intention-aware exploration.
Abstract: Open data lakes are increasingly used in scientific and societal analysis, but finding useful tables remains difficult when analysts do not know the data in advance. Existing table-discovery systems rank tables by query relevance, but often ignore the task context and the next operation the analyst needs. We present AIDEN, an agentic LLM framework for intention-aware table discovery in data lakes. AIDEN extracts query, intention, and operation signals, maintains session state, retrieves candidate tables, and uses an LLM-based recommender to rerank candidates for the current task. We evaluate AIDEN on a LakeBench-based benchmark with 5,593 tables, 100 query tables, manually validated requests, and operation and intention labels. Reranking improves nDCG@10 from 0.383 to 0.394 on join queries and from 0.539 to 0.575 on union queries. Signal extraction achieves 0.967 macro-F1 for intention labels and 0.771 macro-F1 for operation labels using request text alone. AIDEN frames data-lake exploration as a retrieval- and tool-augmented LLM agent for structured data, connecting memory, task understanding, and recommendation for scientific and societal analysis.
Submission Number: 16
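The abstract reports ranking quality as nDCG@10 before and after reranking. As a reading aid, the following is a minimal sketch of how nDCG@k is commonly computed for one query table. It assumes linear gain and a log2 rank discount; the abstract does not say which nDCG variant AIDEN uses, and the names `dcg_at_k`, `ndcg_at_k`, and `ideal_pool` are hypothetical, not taken from the AIDEN code.

```python
import math
from typing import Optional, Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k positions (linear gain, assumed)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(
    ranked_relevances: Sequence[float],
    k: int = 10,
    ideal_pool: Optional[Sequence[float]] = None,
) -> float:
    """nDCG@k for a single query.

    ranked_relevances: relevance grade of each returned table, in ranked order.
    ideal_pool: grades of all relevant tables for the query; if omitted, the
    ideal ranking is built from the returned list only (a simplification).
    """
    pool = ranked_relevances if ideal_pool is None else ideal_pool
    ideal = dcg_at_k(sorted(pool, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# Toy illustration (made-up grades, not benchmark data): a reranker that moves
# relevant tables upward raises nDCG@10 for the same candidate set.
before = [0, 1, 1, 0]
after = [1, 1, 0, 0]
print(ndcg_at_k(before), ndcg_at_k(after))
```

A benchmark-level score under this sketch would be the mean of `ndcg_at_k` over all query tables, computed separately for join and union queries.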