How Faithful are Tool-Augmented Language Models: A Constrained Generation Perspective

ACL ARR 2025 February Submission6413 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: Constrained generation methods have demonstrated their potential to improve LMs' ability to adhere to lexical constraints. Such constraints play an important role in Tool-Augmented Language Models (TALMs), an emerging approach that augments LMs' capabilities with external tools, because a TALM needs to cover the key information from tool outputs in its generated response. However, the existing TALM pipeline relies on naive prompting when converting tool outputs into a coherent response, which provides no guarantee that all the key information from the tools is covered in the LM's final answer. In this paper, we developed a diagnostic dataset to assess the ability of naively prompted TALMs to cover key information from tool outputs. We also examined whether constrained generation methods can improve the accuracy of TALMs. Our experiments revealed the insufficiency of prompting and showed that existing constrained generation methods can improve key information coverage, though at different costs.
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking; NLP datasets; evaluation
Contribution Types: Data resources, Data analysis
Languages Studied: English
Submission Number: 6413