SoK: Unifying Corroborative and Contributive Attributions in Large Language Models

Published: 07 Mar 2024, Last Modified: 07 Mar 2024, SaTML 2024
Keywords: training data attribution, citation generation, large language model attribution, explainability
TL;DR: We present a unified framework of large language model attributions motivated by modern applications
Abstract: As businesses, products, and services spring up around large language models, the trustworthiness of these models hinges on the verifiability of their outputs. However, methods for explaining language model outputs fall across two distinct fields of study that both use the term "attribution" to refer to entirely separate techniques: citation generation and training data attribution. In many modern applications, such as legal document generation and medical question answering, both types of attribution are important. In this systematization of knowledge paper, we argue for and present a unified framework of large language model attributions. We show how existing methods for the different types of attribution fall under the unified framework. We also use the framework to discuss real-world use cases where one or both types of attribution are required. We believe that this unified framework will guide the use-case-driven development of systems that leverage both types of attribution, as well as the standardization of their evaluation.
Submission Number: 75