Abstract: Concept Bottleneck Models (CBMs) propose to enhance the trustworthiness of AI systems by constraining their decisions to a set of human-understandable concepts. However, CBMs typically rely on datasets whose concept labels are presumed to be accurate, an assumption that is often violated in practice and, as we show, can significantly degrade performance. To address this, we introduce the Concept Preference Optimization (CPO) objective, a new loss function based on Direct Preference Optimization that effectively mitigates the negative impact of concept mislabeling on CBM performance. We analyze key properties of the CPO objective, showing that it directly optimizes the concept posterior distribution, and contrast it with Binary Cross Entropy (BCE), showing that CPO is inherently less sensitive to concept noise. We empirically confirm this analysis, finding that CPO consistently outperforms BCE on three real-world datasets, both with and without added label noise.
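
To make the objective more concrete, the sketch below shows a generic DPO-style preference loss applied to concept predictions, treating concepts as independent Bernoulli variables. The function names, the reference model, and the exact form are illustrative assumptions based on the standard DPO formulation, not the paper's implementation; here `preferred` and `dispreferred` would be two candidate binary concept labelings for the same input.

```python
# Hypothetical sketch of a DPO-style loss over concept labelings.
# Assumes independent Bernoulli concepts; not the authors' actual CPO code.
import torch
import torch.nn.functional as F

def bernoulli_log_likelihood(logits, concepts):
    # Joint log-likelihood of a binary concept vector under independent
    # Bernoulli concepts parameterized by the model's concept logits.
    return -F.binary_cross_entropy_with_logits(
        logits, concepts, reduction="none"
    ).sum(dim=-1)

def preference_loss(policy_logits, ref_logits, preferred, dispreferred, beta=1.0):
    # Standard DPO objective applied to concept vectors: increase the
    # reference-adjusted likelihood margin of the preferred labeling over
    # the dispreferred one.
    logp_w = bernoulli_log_likelihood(policy_logits, preferred)
    logp_l = bernoulli_log_likelihood(policy_logits, dispreferred)
    ref_w = bernoulli_log_likelihood(ref_logits, preferred)
    ref_l = bernoulli_log_likelihood(ref_logits, dispreferred)
    margin = beta * ((logp_w - ref_w) - (logp_l - ref_l))
    return -F.logsigmoid(margin).mean()
```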
Lay Summary: Concept Bottleneck Models (CBMs) are a type of machine learning model that first predicts human-understandable concepts — like “has a beak” or “is smiling” — and then uses those concepts to make a final decision. This design makes the model’s reasoning easier to inspect and, importantly, allows users to intervene by correcting mispredicted concepts.
Unfortunately, like many machine learning models, CBMs assume all concept labels are accurate — which isn’t realistic. Real-world data is often contaminated with labeling errors due to subjectivity, labeler fatigue, or even standard training tricks, such as image cropping, that can accidentally hide important features. Our work introduces a new training method called Concept Preference Optimization (CPO) that makes CBMs more reliable when labels aren’t perfect.
Instead of treating every label as correct, CPO compares pairs of labels during training and teaches the model to favor those that seem more trustworthy. We show that CPO improves CBM performance even when many concept labels are wrong. It also helps the model better recognize when it’s unsure — a critical ability in high-stakes fields like healthcare or law enforcement.
Link To Code: https://github.com/Emilianopp/ConceptPreferenceOptimization
Primary Area: Social Aspects->Accountability, Transparency, and Interpretability
Keywords: Concept Bottleneck Models, Interpretable AI, XAI
Submission Number: 13713