Structured Data Understanding: Not All Tokens Are What You Need

ICLR 2026 Conference Submission 16022 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Structured Navigation, Selective Deepening, Long-Context Language Models
Abstract: Large language models typically process information via two dominant paradigms, both of which can be inefficient. The first is a brute-force approach that ingests vast streams of tokens with uniform effort. The second, a selective approach exemplified by Retrieval-Augmented Generation (RAG), often flattens inherently structured data—like codebases or API schemas—into a context-agnostic list of vector chunks. Both methods have critical flaws: the former is computationally prohibitive, while the latter destroys the hierarchical information necessary for complex reasoning. This paper introduces *Selective Deepening*, a new navigational framework for model reasoning that respects and exploits the native structure of data. Instead of retrieving from a flattened pool of information, our method first creates a *structural abstraction*—a computationally inexpensive, low-fidelity "map" of the data that preserves its hierarchy. The model then intelligently *navigates* this map to identify the most relevant areas to "deepen" into. Only after this targeted navigation does the model dedicate its full computational power to analyzing the high-fidelity details of the selected components. By replacing structure-agnostic retrieval with structure-aware navigation, Selective Deepening enables models to reason more effectively and efficiently. We demonstrate the broad applicability and benefits of this principle across diverse tasks, including function calling and code generation. Our experiments show that this approach not only drastically reduces computational overhead but also yields significant improvements in task accuracy by mitigating the context-degradation problems inherent in existing paradigms.
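
A minimal sketch may help make the abstract's navigate-then-deepen loop concrete. The code below is one illustrative reading of the idea, not the authors' implementation: the structural abstraction is a list of hierarchy-preserving Python definition skeletons extracted with the standard `ast` module, a trivial keyword-overlap score stands in for the LLM's navigation call, and only the selected nodes are expanded to full source for the high-fidelity pass. All function names (`build_map`, `navigate`, `deepen`) are hypothetical.

```python
# Illustrative sketch of "Selective Deepening" on a Python codebase.
# build_map, navigate, and deepen are hypothetical stand-ins for the
# paper's pipeline, not the authors' implementation.
import ast
import textwrap


def build_map(source: str) -> list[dict]:
    """Structural abstraction: a cheap, low-fidelity 'map' of a Python file.

    Each entry keeps a hierarchy-preserving skeleton (dotted path, node kind,
    line span, one-line docstring) instead of the full token stream.
    """
    entries: list[dict] = []

    def visit(node: ast.AST, prefix: str = "") -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                path = prefix + child.name
                doc = ast.get_docstring(child) or ""
                entries.append({
                    "path": path,
                    "kind": type(child).__name__,
                    "span": (child.lineno, child.end_lineno),
                    "summary": doc.splitlines()[0] if doc else "",
                })
                visit(child, prefix=path + ".")  # recurse so nesting stays visible

    visit(ast.parse(source))
    return entries


def navigate(task: str, structure_map: list[dict], k: int = 1) -> list[dict]:
    """Navigation: decide where to deepen.

    A real system would ask the LLM to rank map entries against the task;
    a trivial keyword-overlap score stands in for that call here.
    """
    def score(entry: dict) -> int:
        text = (entry["path"] + " " + entry["summary"]).lower()
        return sum(word in text for word in task.lower().split())

    return sorted(structure_map, key=score, reverse=True)[:k]


def deepen(source: str, selected: list[dict]) -> str:
    """Deepening: materialize full-fidelity source only for the chosen nodes."""
    lines = source.splitlines()
    return "\n\n".join(
        "\n".join(lines[start - 1:end])
        for start, end in (entry["span"] for entry in selected)
    )


if __name__ == "__main__":
    source = textwrap.dedent('''
        def parse_config(path):
            """Load and validate a YAML config file."""
            ...

        def call_api(endpoint, payload):
            """Issue a signed request to the payments API."""
            ...
    ''')
    structure_map = build_map(source)
    chosen = navigate("debug signing of payments API requests", structure_map)
    print(deepen(source, chosen))  # only the relevant definition reaches the model
```

The design choice this sketch tries to mirror is the one the abstract emphasizes: the expensive model only ever sees the output of `deepen`, while the map-building and navigation stages stay cheap and preserve the data's native hierarchy.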
Primary Area: foundation or frontier models, including LLMs
Submission Number: 16022