ContextCite: Attributing Model Generation to Context

Published: 25 Sept 2024 · Last Modified: 06 Nov 2024 · NeurIPS 2024 poster · CC BY 4.0
Keywords: attribution, citation, generative models, large language models
TL;DR: We introduce ContextCite, a simple and scalable method to identify the parts of the context that are responsible for a language model generating a particular statement.
Abstract: How do language models use information provided as context when generating a response? Can we infer whether a particular generated statement is actually grounded in the context, a misinterpretation, or fabricated? To help answer these questions, we introduce the problem of *context attribution*: pinpointing the parts of the context (if any) that *led* a model to generate a particular statement. We then present ContextCite, a simple and scalable method for context attribution that can be applied on top of any existing language model. Finally, we showcase the utility of ContextCite through three applications: (1) helping verify generated statements, (2) improving response quality by pruning the context, and (3) detecting poisoning attacks. We provide code for ContextCite at https://github.com/MadryLab/context-cite.
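For readers who want to try the method, the linked repository exposes a high-level interface along these lines. This is a minimal sketch: the `ContextCiter` class, its `from_pretrained` constructor, and the `get_attributions` keyword arguments (`as_dataframe`, `top_k`) follow the project README at the time of writing and may differ between versions; the model name, context, and query below are illustrative placeholders.

```python
# Minimal usage sketch for the context-cite package (pip install context-cite).
# Class name and method signatures are taken from the project README;
# verify against https://github.com/MadryLab/context-cite before relying on them.
from context_cite import ContextCiter

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # any HF causal LM should work
context = (
    "Attention Is All You Need (Vaswani et al., 2017) introduced the "
    "Transformer, an architecture based entirely on attention mechanisms."
)
query = "What architecture did the paper introduce?"

# Wraps the model and tokenizer; the context is partitioned into sources
# (e.g., sentences) that attribution scores are assigned to.
cc = ContextCiter.from_pretrained(model_name, context=context, query=query)

# The model's generated response to the query, conditioned on the context.
print(cc.response)

# Attribution scores per context source; higher scores indicate sources
# more responsible for the generated statement.
print(cc.get_attributions(as_dataframe=True, top_k=3))
```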
Primary Area: Natural language processing
Submission Number: 17808