DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning

Published: 25 Sept 2024, Last Modified: 22 Dec 2024, NeurIPS 2024 poster, CC BY 4.0
Keywords: in-context learning, data attribution, large language models
TL;DR: We propose DETAIL, an influence function-based technique to provide attribution for task demonstrations in in-context learning.
Abstract: In-context learning (ICL) allows transformer-based language models that are pre-trained on general text to quickly learn a specific task from a few "task demonstrations" without updating their parameters, significantly boosting their flexibility and generality. ICL has many characteristics distinct from conventional machine learning, thereby requiring new approaches to interpret this learning paradigm. Building on recent work showing that transformers learn in context by implicitly running an internal optimizer, we propose an influence function-based attribution technique, DETAIL, that addresses the specific characteristics of ICL. We empirically verify that our approach attributes demonstrations effectively while remaining computationally efficient. Leveraging these results, we then show how DETAIL can improve model performance in real-world scenarios through demonstration reordering and curation. Finally, we experimentally demonstrate the wide applicability of DETAIL by showing that attribution scores obtained on white-box models transfer to black-box models for improving model performance.
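To make the idea of influence function-based demonstration attribution concrete, below is a minimal illustrative sketch. It is not the paper's exact DETAIL formulation: it assumes demonstrations are summarized by hypothetical hidden representations, fits a ridge-regression surrogate (in the spirit of the internal-optimizer view of ICL), and scores each demonstration with the classic influence formula I(z_i) = -grad L(z_test)^T H^{-1} grad L(z_i).

```python
import numpy as np

def demonstration_influence(X, y, x_test, y_test, lam=1e-2):
    """Score each demonstration (row of X) by its influence on the test loss.

    X      : (n, d) hidden representations of the n demonstrations (assumed)
    y      : (n,)   scalar targets of the demonstrations
    x_test : (d,)   representation of the test query
    y_test : scalar target of the test query
    lam    : ridge regularization strength
    """
    n, d = X.shape
    # Ridge-regression surrogate: w = (X^T X + lam I)^{-1} X^T y
    H = X.T @ X + lam * np.eye(d)      # Hessian of the ridge objective
    H_inv = np.linalg.inv(H)
    w = H_inv @ (X.T @ y)

    # Gradient of the squared-error test loss w.r.t. w
    g_test = (x_test @ w - y_test) * x_test

    # Gradient of each demonstration's loss w.r.t. w (one row per demonstration)
    residuals = X @ w - y
    G = residuals[:, None] * X

    # First-order influence of upweighting each demonstration on the test loss;
    # more negative scores indicate demonstrations whose removal would hurt the test prediction.
    return -(G @ (H_inv @ g_test))

# Toy usage with synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4)); w_true = rng.normal(size=4)
y = X @ w_true
x_test = rng.normal(size=4); y_test = x_test @ w_true
print(demonstration_influence(X, y, x_test, y_test))
```

Such scores could then be used, as the abstract describes, to reorder or curate demonstrations; the surrogate model and representations here are placeholders for whatever the actual method extracts from the transformer.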
Supplementary Material: zip
Primary Area: Interpretability and explainability
Submission Number: 9090