Deep Reflection Hinting: Leveraging Offline Knowledge for Improving LLM Agents Adaptation

ICLR 2026 Conference Submission 21201 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: LLM agents, Web agents, Retrieval-Augmented Generation
TL;DR: Deep Reflection Hinting (DRH) improves LLM agents by distilling offline trajectories, documents, and instructions into transparent, reusable hints that enhance adaptation and generalization without fine-tuning.
Abstract: Large language model (LLM) agents perform well in sequential decision-making tasks, but adapting them to unfamiliar domains often requires costly online interactions or fine-tuning on large expert datasets. These strategies are impractical for closed-source models and, for open-source ones, expensive and prone to catastrophic forgetting. Offline trajectories offer reusable knowledge, yet demonstration-based methods struggle because raw traces are long, noisy, and tied to specific tasks. We present \emph{Deep Reflection Hinter (DR.Hinter)}, an agentic system that distills offline traces into compact, context-aware hints. A zooming mechanism highlights decisive steps in long trajectories, capturing both strategies and pitfalls. Unlike prior methods, DR.Hinter leverages both successful and failed trajectories, extracting guidance even when only failure data is available, while supporting parallelized hint generation and benchmark-independent prompting. At inference, a retriever selects hints relevant to the current state, providing targeted guidance with transparency and traceability. Experiments on MiniWoB++, WorkArena, and WebArena Lite show that DR.Hinter consistently outperforms strong baselines, including human- and document-based hints.
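To make the inference-time idea concrete, the sketch below shows how retrieved hints could be injected into an agent prompt. It is not the authors' implementation: the hint texts, the bag-of-words cosine retriever, the `HintRetriever` and `build_prompt` names, and the prompt format are all illustrative assumptions; the paper's actual retriever and prompting details are not specified in this page.

```python
# Illustrative sketch only (assumed names and logic, not DR.Hinter's actual code):
# store compact hints distilled offline, retrieve the ones most relevant to the
# agent's current observation, and prepend them to the LLM prompt.
from collections import Counter
from math import sqrt
from typing import List, Tuple


def _vectorize(text: str) -> Counter:
    """Bag-of-words vector over lowercased whitespace tokens."""
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class HintRetriever:
    """Holds hints distilled from offline trajectories and ranks them
    against the current state description."""

    def __init__(self, hints: List[str]):
        self.hints = hints
        self.vectors = [_vectorize(h) for h in hints]

    def retrieve(self, state: str, k: int = 3) -> List[Tuple[float, str]]:
        q = _vectorize(state)
        scored = sorted(
            ((_cosine(q, v), h) for v, h in zip(self.vectors, self.hints)),
            key=lambda x: x[0],
            reverse=True,
        )
        return scored[:k]


def build_prompt(task: str, observation: str, retriever: HintRetriever) -> str:
    """Prepend retrieved hints so the guidance stays transparent and traceable
    to the offline data it was distilled from."""
    hints = [h for score, h in retriever.retrieve(observation) if score > 0]
    hint_block = "\n".join(f"- {h}" for h in hints) or "- (no relevant hints)"
    return (
        f"Task: {task}\n"
        f"Relevant hints from past trajectories:\n{hint_block}\n"
        f"Current observation:\n{observation}\n"
        "Decide the next action."
    )


if __name__ == "__main__":
    # Hypothetical hints distilled from successful and failed web-navigation traces.
    retriever = HintRetriever([
        "Before submitting a form, verify every required field is filled.",
        "Use the site search box instead of paging through long product lists.",
        "Dropdown menus may need an explicit click to expand before selecting.",
    ])
    print(build_prompt(
        task="Order the cheapest blue t-shirt",
        observation="A product listing page with a search box and a filter dropdown.",
        retriever=retriever,
    ))
```

In a real system the bag-of-words scorer would likely be replaced by a dense retriever, but the interface stays the same: given the current state, select a small set of hints and surface them verbatim in the prompt, which is what keeps the guidance inspectable.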
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 21201