GenerationPrograms:  Fine-grained Attribution with Executable Programs

David Wan; Eran Hirsch; Elias Stengel-Eskin; Ido Dagan; Mohit Bansal

GenerationPrograms: Fine-grained Attribution with Executable Programs

David Wan, Eran Hirsch, Elias Stengel-Eskin, Ido Dagan, Mohit Bansal

Published: 08 Jul 2025, Last Modified: 26 Aug 2025COLM 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: long-form qa, rag, summarization, attributed text generation

TL;DR: GenerationPrograms: Fine-grained Attribution via Neural Modular Trees

Abstract: Recent large language models (LLMs) achieve impressive performance in text generation but often fail to accurately attribute their outputs, undermining trust and verifiability. Moreover, existing attribution methods do not explain how and why models leverage the provided source documents to generate their final responses, limiting interpretability. Furthermore, current attributions fail to provide a reason as to how and why the model uses the context to arrive at the final output. To overcome these challenges, we introduce a modular generation framework, GenerationPrograms, inspired by recent advancements in executable ``code agent'' architectures. Unlike conventional generation methods that simultaneously generate outputs and attributions or rely on post-hoc attribution, GenerationPrograms decomposes the process into two distinct stages: first, creating an executable program plan composed of modular text operations (such as paraphrasing, compression, and fusion) explicitly tailored to the query, and second, executing these operations following the program's specified instructions to produce the final response. Empirical evaluations demonstrate that GenerationPrograms significantly improves attribution quality at both document-level and sentence-level granularity across two long-form question-answering tasks. We further demonstrate that GenerationPrograms can effectively function as a post-hoc attribution method, outperforming traditional techniques in recovering accurate attributions. In addition, the interpretable programs generated by GenerationPrograms enable localized refinement through modular-level improvements that further enhance overall attribution quality.

Supplementary Material: zip

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html

Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html

Submission Number: 1449

Loading