Concept-RidgeAIME: LLM-Guided Automatic Concept-Based Explanations via Ridge-Regularized Inverse Operators for Trustworthy AI

TMLR Paper6330 Authors

28 Oct 2025 (modified: 11 Apr 2026) · Rejected by TMLR · CC BY 4.0
Abstract: Concept-based explanations overcome the limitations of low-level feature importance by explaining the decision-making of machine learning models in terms of high-level, human-understandable concepts. However, achieving model independence while presenting global and local information within a single framework has remained difficult. This study extends approximate inverse model explanations (AIME) and proposes Concept-RidgeAIME, which obtains global and local concept-based explanations simultaneously, using a ridge-regularized linear approximate inverse mapping as its core. The method learns a two-stage structure, namely an inverse operator from the model output to the input and an inverse operator from the concepts to the input, only once. It then efficiently computes the contribution and ratio of each concept for any individual instance using simple matrix-vector operations. Because it requires no access to internal representations or gradients, it presents global information (concept importance rankings) and local information (individual concept contributions) within the same framework, achieving model independence with low overhead. Building on global feature importance, this study demonstrates a workflow in which a large language model automatically synthesizes rule concepts composed of normalized-feature thresholds and one-hot equations; the workflow then validates their syntax and excludes zero/positive cases to ensure robustness. The evaluations quantified the reconstructability of black-box outputs (completeness) and the coverage at the concept-basis level (projection completeness) on tabular benchmarks (Adult, German Credit, and COMPAS). Stability and efficiency were verified using bootstrap confidence intervals and inference times on the order of milliseconds.
The results show that Concept-RidgeAIME offers practical advantages over conventional concept-based methods (ConceptSHAP, CBM, and TCAV) and over the application of generic SHAP to the concept space. These advantages stem from a model-independent implementation that requires no additional training and handles global, local, and concept mappings in an integrated manner.
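The two-stage structure described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the standard closed-form ridge solution for each inverse operator and, as a hypothetical stand-in for the paper's attribution formula, obtains concept contributions by projecting the inverse-mapped input onto the columns of the concept-to-input operator. The synthetic data, the black-box function, the rule concepts, and the helper names (`ridge_inverse`, `concept_projector`, `concept_contributions`) are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def ridge_inverse(targets: np.ndarray, sources: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form ridge map M (d x k) such that targets[i] ≈ M @ sources[i].

    M = T^T S (S^T S + lam * I)^(-1), computed with a linear solve
    rather than an explicit matrix inverse.
    """
    k = sources.shape[1]
    gram = sources.T @ sources + lam * np.eye(k)
    return np.linalg.solve(gram, sources.T @ targets).T


def concept_projector(B: np.ndarray, lam: float) -> np.ndarray:
    """Hypothetical: precompute P = (B^T B + lam * I)^(-1) B^T once (shape m x d)."""
    m = B.shape[1]
    return np.linalg.solve(B.T @ B + lam * np.eye(m), B.T)


def concept_contributions(A: np.ndarray, P: np.ndarray, y: np.ndarray):
    """Hypothetical attribution: map output y back to input space with A,
    then express that input-space vector in the concept basis via P."""
    w = P @ (A @ y)                      # signed concept contributions, shape (m,)
    total = np.abs(w).sum()
    ratio = np.abs(w) / total if total > 0 else np.zeros_like(w)
    return w, ratio


# Synthetic stand-in for a tabular dataset with min-max normalized features
rng = np.random.default_rng(0)
n, d = 500, 4
X = rng.random((n, d))
p = 1 / (1 + np.exp(-(4 * X[:, 0] - 3 * X[:, 2])))   # hypothetical black-box P(y = 1)
Y = np.column_stack([1 - p, p])                        # model outputs, shape (n, 2)

# Hypothetical rule concepts on normalized features, evaluated as 0/1 indicators
concepts = {
    "x0_norm > 0.5": lambda Z: Z[:, 0] > 0.5,
    "x2_norm > 0.5": lambda Z: Z[:, 2] > 0.5,
    "x1_norm > 0.3 AND x3_norm < 0.7": lambda Z: (Z[:, 1] > 0.3) & (Z[:, 3] < 0.7),
}
names = list(concepts)
C = np.column_stack([rule(X) for rule in concepts.values()]).astype(float)

lam = 1e-3
A = ridge_inverse(X, Y, lam)   # stage 1: output -> input operator, shape (d, 2)
B = ridge_inverse(X, C, lam)   # stage 2: concept -> input operator, shape (d, 3)
P = concept_projector(B, lam)  # learned once, reused for every instance

# Local: concept contributions and ratios for one instance (matrix-vector ops only)
w_local, ratio_local = concept_contributions(A, P, Y[0])
# Global: the same projection applied to the positive-class direction of A
w_global, _ = concept_contributions(A, P, np.array([0.0, 1.0]))
ranking = [names[i] for i in np.argsort(-np.abs(w_global))]

print("local ratios:", [round(float(r), 3) for r in ratio_local])
print("global ranking:", ranking)
```

Precomputing `P` reflects the learn-once design: after the two operators are fitted, each new instance needs only two matrix-vector products. A one-hot term (for example, an indicator that a categorical feature takes a given value) would enter `C` as one more 0/1 column.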
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We have revised the manuscript to improve clarity, correct presentation issues, refine the scope of our claims, and address the reviewers’ concerns regarding the ablation, concept definition, and experimental scope.

First, we improved the overall presentation of the paper. We corrected typographical and formatting issues, including the previously uncompiled table reference; improved table captions and readability; clarified figure labels and axis semantics; and added a schematic overview of the proposed framework in the appendix. This overview summarizes the two-stage inverse mapping structure and the LLM-assisted concept construction pipeline, making the methodology easier to follow.

Second, we clarified the role of regularization. In particular, we added an explicit ridge ablation study in the appendix, evaluating $\lambda \in \{0, 10^{-6}, 10^{-3}, 10^{-1}\}$ on the Adult and German Credit datasets. The results show that both black-box (BB) completeness and projection completeness remain effectively invariant across the tested values. This supports our revised interpretation of ridge regularization as a safeguard that keeps the inverse-operator estimation numerically well-posed in potentially ill-conditioned settings, rather than as a performance-enhancing component that inflates attribution quality. We also clarified why we chose ridge regularization over alternatives such as L1 and elastic net, emphasizing its closed-form solvability, numerical stability, and preservation of contribution geometry.

Third, we refined the framing of the concept construction stage. We now state explicitly that, in this work, a concept is treated as an explicit, inspectable rule-based unit rather than as a latent semantic construct. We further clarified that the LLM/GFI (global feature importance) pipeline is intended as a reproducible, importance-anchored concept construction procedure, not as evidence of globally optimal concept discovery.
In this revised framing, the contribution lies in providing a transparent, inspectable concept basis whose faithfulness is evaluated through BB completeness and projection completeness.

Fourth, we softened several claims in the abstract, introduction, and discussion to better align them with the evidence. In particular, we removed or weakened expressions such as “practical advantages” and avoided claims that could be read as implying superiority beyond what the experiments directly support. The revised manuscript now characterizes Concept-RidgeAIME more precisely as a model-independent, computationally efficient framework that unifies global and local concept-level explanations with explicit rule-based interpretability and quantified faithfulness.

Fifth, we clarified the scope and limitations of the experiments. We now note explicitly that the current empirical study is limited to tabular datasets with tree-based black-box models. At the same time, we clarify that the framework itself is model-independent and requires neither gradients nor internal representations, so it can be extended to neural and transformer-based models in future work.

Finally, we added a limitation statement regarding concept validity. Specifically, this paper evaluates concept quality through explicitness, reproducibility, and faithfulness (quantified by BB completeness and projection completeness), rather than through human-centered or domain-expert semantic validation. We state explicitly that semantic alignment with domain experts remains an important direction for future work.

Overall, these revisions improve the manuscript’s readability, sharpen the scope of its claims, and address the remaining concerns raised during review while preserving the paper’s central technical contribution.
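The "safeguard" reading of ridge regularization can be illustrated with a small synthetic experiment. The minimal NumPy sketch below assumes the closed-form ridge operator, sweeps the same grid $\lambda \in \{0, 10^{-6}, 10^{-3}, 10^{-1}\}$, and tracks two quantities: the relative change of the output-to-input operator on well-conditioned outputs, and the condition number of the Gram matrix when a redundant output column is deliberately added. The data, the black-box function, the duplicated column, and the helper name `ridge_inverse` are hypothetical, and the sketch does not reproduce the paper's BB or projection completeness metrics.

```python
import numpy as np


def ridge_inverse(targets: np.ndarray, sources: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form ridge map M with targets[i] ≈ M @ sources[i]."""
    k = sources.shape[1]
    gram = sources.T @ sources + lam * np.eye(k)
    return np.linalg.solve(gram, sources.T @ targets).T


rng = np.random.default_rng(0)
n, d = 500, 4
X = rng.random((n, d))                                  # synthetic normalized inputs
p = 1 / (1 + np.exp(-(4 * X[:, 0] - 3 * X[:, 2])))      # hypothetical black-box P(y = 1)
Y = np.column_stack([1 - p, p])                         # well-conditioned outputs
Y_dup = np.column_stack([1 - p, p, p])                  # hypothetical redundant column: singular Gram at lam = 0

lams = [0.0, 1e-6, 1e-3, 1e-1]
A0 = ridge_inverse(X, Y, 0.0)                           # unregularized reference operator
rel_change, cond_dup = {}, {}
for lam in lams:
    A = ridge_inverse(X, Y, lam)
    # How far the regularized operator moves from the unregularized one
    rel_change[lam] = np.linalg.norm(A - A0) / np.linalg.norm(A0)
    # Conditioning of the (regularized) Gram matrix in the redundant case
    cond_dup[lam] = np.linalg.cond(Y_dup.T @ Y_dup + lam * np.eye(3))
    print(f"lam={lam:g}  rel. change of A={rel_change[lam]:.2e}  "
          f"cond(dup Gram)={cond_dup[lam]:.2e}")
```

On well-conditioned outputs the operator barely changes across the grid, whereas with the redundant column the unregularized Gram matrix is numerically singular and only $\lambda > 0$ restores a moderate condition number, which is the role the revision assigns to ridge regularization.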
Assigned Action Editor: ~Shahin_Jabbari1
Submission Number: 6330