Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs

Published: 26 Jan 2026 · Last Modified: 11 Feb 2026 · ICLR 2026 Poster · CC BY 4.0
Keywords: Information Extraction, Document Analysis, Small Language Models, Reinforcement Learning
Abstract: Large language models (LLMs) are widely applied to data analytics over documents, yet direct reasoning over long, noisy documents remains brittle and error-prone. Hence, we study document question answering (QA) that consolidates dispersed evidence into a structured output (e.g., a table, a graph, or evidence chunks) to support reliable, verifiable QA. We propose a two-pillar framework, LiteCoST, to achieve both high accuracy and low latency with small language models (SLMs). Pillar 1: Chain-of-Structured-Thought (CoST). We introduce a CoST template, a schema-aware instruction that guides a strong LLM to produce both a step-wise CoST trace and the corresponding structured output. The process induces a minimal structure, normalizes entities/units, aligns records, serializes the output, and verifies/refines it (optionally with an LLM-as-judge), yielding auditable supervision. Pillar 2: SLM fine-tuning. We then train compact models on the LLM-generated CoST traces and structured data in two phases: Supervised Fine-Tuning for structure, format, and steps, followed by Group Relative Policy Optimization with dual rewards for answer/format quality and process consistency, transferring structure-first behavior to SLMs for low-latency deployment. This approach achieves LLM-comparable quality on finance and legal long-document QA (Loong) with 3B/7B SLMs, while delivering 2–4× lower latency than GPT-4o and DeepSeek-R1 (671B).
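To make the dual-reward idea in Pillar 2 concrete, below is a minimal, illustrative Python sketch of a scalar reward combining an answer/format term with a process-consistency term over a CoST trace. The tag names (<table>, <answer>), step markers, and weights are assumptions for illustration only, not the paper's actual reward design.

import json
import re

def answer_format_reward(completion: str, gold_answer: str) -> float:
    """Reward the structured output and the final answer.
    Hypothetical scheme: +0.5 if the completion contains parseable JSON
    between <table> tags, +0.5 if the <answer> span matches the gold answer."""
    score = 0.0
    table = re.search(r"<table>(.*?)</table>", completion, re.DOTALL)
    if table:
        try:
            json.loads(table.group(1))
            score += 0.5
        except json.JSONDecodeError:
            pass
    answer = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if answer and answer.group(1).strip().lower() == gold_answer.strip().lower():
        score += 0.5
    return score

def process_consistency_reward(completion: str) -> float:
    """Reward CoST traces whose steps appear in the expected order.
    Hypothetical proxy: the step markers (induce -> normalize -> align ->
    serialize -> verify) must all occur, preferably in that order."""
    steps = ["induce", "normalize", "align", "serialize", "verify"]
    positions = [completion.lower().find(s) for s in steps]
    if -1 in positions:
        return 0.0
    return 1.0 if positions == sorted(positions) else 0.5

def dual_reward(completion: str, gold_answer: str,
                w_answer: float = 0.7, w_process: float = 0.3) -> float:
    """Weighted scalar reward for a single sampled completion."""
    return (w_answer * answer_format_reward(completion, gold_answer)
            + w_process * process_consistency_reward(completion))

Since GRPO only requires a scalar reward per sampled completion (group-relative advantages are computed from the scores within each sampled group), a function of this shape could be plugged into a GRPO training loop; the 0.7/0.3 weighting is likewise an assumption, not a reported hyperparameter.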
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 11413