From Numbers to Narratives: Goal-Oriented Summarization of Machine Learning Model Differences
Keywords: Explainability, Agentic AI, Model Comparison, Natural Language Explanation, LLM-as-a-Judge
TL;DR: Iterative LLM-based methods that produce concise, faithful natural-language explanations of how two ML models differ in behavior.
Abstract: Non-experts can now obtain natural-language explanations of how two ML models differ by feeding numerical results to an LLM-based agent. However, a naively prompted LLM often omits critical conditions, and non-experts cannot easily detect these omissions, which can lead to incorrect downstream conclusions. We formulate the task as goal-oriented summarization and propose Condenser, an iterative method that optimizes an explanation against two objectives: Completeness (faithfulness to the observed differences) and Density (informativeness per unit length). Condenser+ extends Condenser with an LLM-based Explorer that actively selects which conditions to evaluate. In four settings of increasing complexity (Colored MNIST, Fitzpatrick17k, Dollar Street, and open-condition gender classification), our methods produce concise and complete explanations. The explanations also support two downstream LLM tasks, prompt refinement on CelebA and subset benchmarking on Flickr30K, where measurable improvements further validate the effectiveness of our methods. Goal-oriented summarization thus yields explanations that are concise, complete, and useful for downstream LLM tasks.
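To make the two objectives concrete, here is a minimal toy sketch of an iterative refinement loop scored by completeness and density. It is an illustration under strong assumptions, not the authors' Condenser: the `Difference` record, the substring-based `completeness`, the words-based `density`, the two sentence templates, and the `min_gap` threshold are all hypothetical stand-ins, and no LLM is involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Difference:
    """Hypothetical record: one observed behavioral gap between models A and B."""
    condition: str  # e.g. "red digits"
    gap: float      # accuracy of A minus accuracy of B under that condition


def completeness(explanation, differences):
    """Toy proxy: fraction of differences whose condition the text mentions."""
    if not differences:
        return 1.0
    text = " ".join(explanation).lower()
    covered = sum(1 for d in differences if d.condition.lower() in text)
    return covered / len(differences)


def density(explanation, differences):
    """Toy proxy: number of covered differences per word of explanation."""
    words = sum(len(sentence.split()) for sentence in explanation)
    if words == 0:
        return 0.0
    return completeness(explanation, differences) * len(differences) / words


def verbose(d):
    better = "A" if d.gap > 0 else "B"
    return (f"Model {better} is more accurate than the other model "
            f"on {d.condition} by {abs(d.gap):.0%}.")


def terse(d):
    better = "A" if d.gap > 0 else "B"
    return f"{better} wins on {d.condition} by {abs(d.gap):.0%}."


def condense(differences, min_gap=0.05, max_rounds=10):
    """Iteratively add the largest omitted difference, preferring the
    candidate revision with higher completeness, then higher density."""
    significant = [d for d in differences if abs(d.gap) >= min_gap]
    explanation = []
    for _ in range(max_rounds):
        text = " ".join(explanation).lower()
        missing = [d for d in significant if d.condition.lower() not in text]
        if not missing:
            break  # nothing omitted: the explanation is complete
        worst = max(missing, key=lambda d: abs(d.gap))
        candidates = [explanation + [verbose(worst)],
                      explanation + [terse(worst)]]
        explanation = max(
            candidates,
            key=lambda e: (completeness(e, significant),
                           density(e, significant)),
        )
    return explanation, significant


if __name__ == "__main__":
    observed = [
        Difference("red digits", 0.30),
        Difference("green digits", -0.12),
        Difference("gray digits", 0.01),  # below min_gap, treated as noise
    ]
    summary, kept = condense(observed)
    print(" ".join(summary))
    print(completeness(summary, kept), density(summary, kept))
```

In this toy setup the loop terminates once no significant condition is omitted, and the density term only breaks ties between equally complete candidates. A real system would replace the templates and string matching with LLM-generated revisions and a learned or LLM-based judge.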
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 69