When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data

Peter Hase; Mohit Bansal

When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data

Peter Hase, Mohit Bansal

13 Mar 2022 (modified: 04 Aug 2025)LNLSReaders: Everyone

TL;DR: We (1) provide a formal framework for characterizing approaches to learning from explanation data, and (2) we propose a synthetic task for studying how models learn from explanation data.

Abstract: Many methods now exist for conditioning models on task instructions and user-provided explanations for individual data points. These methods show great promise for improving task performance of language models beyond what can be achieved by learning from individual (x,y) pairs. In this paper, we (1) provide a formal framework for characterizing approaches to learning from explanation data, and (2) we propose a synthetic task for studying how models learn from explanation data. In the first direction, we give graphical models for the available modeling approaches, in which explanation data can be used as model inputs, as targets, or as a prior. In the second direction, we introduce a carefully designed synthetic task with several properties making it useful for studying a model's ability to learn from explanation data. Each data point in this binary classification task is accompanied by a string that is essentially an answer to the \emph{why} question: ``why does data point x have label y?" We aim to encourage research into this area by identifying key considerations for the modeling problem and providing an empirical testbed for theories of how models can best learn from explanation data.

Track: Archival (will appear in ACL workshop proceedings)

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/when-can-models-learn-from-explanations-a/code)

0 Replies

Loading