Concept-RidgeAIME: LLM-Guided Automatic Concept-Based Explanations via Ridge-Regularized Inverse Operators for Trustworthy AI

TMLR Paper6330 Authors

28 Oct 2025 (modified: 02 Nov 2025) · Under review for TMLR · CC BY 4.0

Abstract: Concept-based explanations address the limitations of low-level feature importance by explaining a machine learning model's decisions in terms of high-level, human-understandable concepts. However, it has been difficult to achieve model independence while presenting global and local information within a single framework. This study extends approximate inverse model explanations (AIME) and proposes Concept-RidgeAIME, which obtains global and local concept-based explanations simultaneously, using a regularized linear approximate inverse mapping as its core. The proposed method learns a two-stage structure only once: an inverse operator mapping from the model output to the input, and an inverse operator mapping from the concept to the input. It then efficiently computes the contribution and ratio of each concept for any individual using simple matrix-vector operations. Without requiring access to internal representations or gradients, it presents global information (a concept importance ranking) and local information (individual concept contributions) within the same framework, thereby achieving model independence with low overhead. Building on the global feature importance, this study also demonstrates a workflow in which a large language model automatically synthesizes rule concepts composed of normalization thresholds and one-hot equations, then validates the syntax and excludes zero/positive cases to ensure robustness. On tabular benchmarks (Adult, German Credit, and COMPAS), the evaluation quantified the reconstructability (completeness) of black-box outputs and the coverage (projection completeness) at the concept base level. Stability and efficiency were verified using bootstrap confidence intervals and inference time (millisecond level). The results showed that Concept-RidgeAIME has practical advantages over conventional concept-based methods (ConceptSHAP, CBM, and TCAV) and over generic SHAP applied to the concept space. These advantages stem from a model-independent implementation that requires no additional training and handles global, local, and concept mappings in an integrated manner.
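The two-stage structure described in the abstract can be illustrated with a minimal NumPy sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the closed-form ridge solution, the synthetic data, the toy threshold concepts, the helper names (`ridge_inverse_operator`, `local_concept_contributions`), and in particular the way the two operators are combined into concept scores. It should not be read as the authors' implementation.

```python
import numpy as np

def ridge_inverse_operator(X, Z, lam=1e-2):
    """Ridge-regularized linear map A (k x d) such that X ~= Z @ A.

    Minimizes ||X - Z A||_F^2 + lam * ||A||_F^2 in closed form.
    X: (n, d) inputs; Z: (n, k) model outputs or concept activations.
    """
    k = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ X)

def local_concept_contributions(c_i, y_i, G, eps=1e-12):
    """Hypothetical local score: concept activation times its alignment
    with the instance's output vector, plus normalized absolute ratios."""
    contrib = c_i * (G @ y_i)
    ratio = np.abs(contrib) / max(np.abs(contrib).sum(), eps)
    return contrib, ratio

# Synthetic stand-ins (assumptions): inputs, black-box outputs, rule concepts.
rng = np.random.default_rng(0)
n, d, m = 200, 5, 2
X = rng.normal(size=(n, d))
logits = X @ rng.normal(size=(d, m))
logits -= logits.max(axis=1, keepdims=True)
Y = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax outputs
C = np.column_stack([X[:, 0] > 0.0, X[:, 1] > 0.5, X[:, 2] < -0.5]).astype(float)

# Stage 1 and stage 2: each inverse operator is learned once.
A_out = ridge_inverse_operator(X, Y)   # (m, d): output -> input
A_con = ridge_inverse_operator(X, C)   # (3, d): concept -> input

# Assumed combination: inner products of input-space signatures.
G = A_con @ A_out.T                    # (3, m) concept-by-output alignment
global_ranking = np.argsort(-np.abs(G).sum(axis=1))

# Local explanation for one instance uses only matrix-vector products.
contrib, ratio = local_concept_contributions(C[0], Y[0], G)
```

Because both operators are fitted once from input-output pairs alone, the per-instance step needs no gradients or internal activations, which matches the model-independence and low-overhead claims in the abstract.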

Submission Length: Regular submission (no more than 12 pages of main content)

Assigned Action Editor: ~Shahin_Jabbari1

Submission Number: 6330
