Keywords: Machine Learning, Deep Neural Network, Provable Editing, Explainable AI, Interpretability, Grad-CAM, Integrated Gradients
TL;DR: Efficient technique for enforcing hard constraints on the gradients of a DNN by editing the DNN parameters.
Abstract: In explainable AI, DNN gradients are used to interpret predictions; in safety-critical control systems, gradients can encode safety constraints; in scientific-computing applications, gradients can encode physical invariants. While recent work on provable editing of DNNs has focused on input-output constraints, the problem of enforcing hard constraints on DNN gradients remains unaddressed. We present ProGrad, the first efficient approach for editing the parameters of a DNN to provably enforce hard constraints on the DNN gradients. Given a DNN $\mathcal{N}$ with parameters $\theta$, and a set $\mathcal{S}$ of pairs $(\mathrm{x}, \mathrm{Q})$ of an input $\mathrm{x}$ and corresponding linear gradient constraints $\mathrm{Q}$, ProGrad finds new parameters $\theta'$ such that $\bigwedge_{(\mathrm{x}, \mathrm{Q}) \in \mathcal{S}} \frac{\partial}{\partial \mathrm{x}}\mathcal{N}(\mathrm{x}; \theta') \in \mathrm{Q}$ while minimizing the change $\lVert\theta' - \theta\rVert$. The key contribution is a novel *conditional variable gradient* of DNNs, which relaxes the NP-hard provable gradient-editing problem to a linear program (LP), enabling ProGrad to use an LP solver to efficiently and effectively enforce the gradient constraints. We experimentally evaluate ProGrad by enforcing (i) hard Grad-CAM constraints on ImageNet ResNet DNNs; (ii) hard Integrated Gradients constraints on Llama 3 and Qwen 3 LLMs; and (iii) hard gradient constraints while training a DNN to approximate a target function, as a proxy for safety constraints in control systems and physical invariants in scientific applications. The results highlight the unique capability of ProGrad to enforce hard constraints on DNN gradients.
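To make the formulation concrete, the following minimal sketch (not ProGrad itself, and not the paper's conditional variable gradient) illustrates how a gradient-editing problem can reduce to an LP under strong simplifying assumptions: a two-layer ReLU network, edits restricted to the last-layer weights, and gradient constraints given as half-spaces $A\,\frac{\partial}{\partial \mathrm{x}}\mathcal{N}(\mathrm{x}) \le b$. All names and dimensions below are illustrative.

```python
# A minimal sketch, NOT the paper's ProGrad algorithm or its conditional
# variable gradient. Assumption: a two-layer ReLU network
#   N(x) = w2 . relu(W1 x + b1) + b2,
# with edits restricted to the last-layer weights w2. The ReLU pattern at
# each constrained input depends only on (W1, b1), so the input gradient
#   dN/dx = W1^T diag(pattern(x)) w2
# is linear in w2, and the edit problem is a linear program (LP).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
d, h = 4, 8                                  # input / hidden dims (illustrative)
W1 = rng.standard_normal((h, d))
b1 = rng.standard_normal(h)
w2 = rng.standard_normal(h)

def gradient_matrix(x):
    """G(x) such that dN(x)/dx = G(x) @ w2, with the ReLU pattern frozen at x."""
    pattern = (W1 @ x + b1 > 0).astype(float)
    return W1.T * pattern                    # = W1^T diag(pattern), shape (d, h)

# Constraint set S: pairs (x, (A, b)) demanding A @ dN(x)/dx <= b.
S = []
for _ in range(3):
    x = rng.standard_normal(d)
    A = rng.standard_normal((2, d))
    b = A @ (gradient_matrix(x) @ w2) - 0.1  # violated by the current w2
    S.append((x, (A, b)))

# LP: minimize the largest last-layer weight change subject to all constraints.
w2_new = cp.Variable(h)
constraints = [(A @ gradient_matrix(x)) @ w2_new <= b for x, (A, b) in S]
cp.Problem(cp.Minimize(cp.norm(w2_new - w2, "inf")), constraints).solve()

for x, (A, b) in S:                          # the gradient constraints now hold
    assert np.all(A @ (gradient_matrix(x) @ w2_new.value) <= b + 1e-6)
print("max last-layer change:", np.max(np.abs(w2_new.value - w2)))
```

In this simplified setting the guarantee is exact because the edited weights do not affect the activation patterns; the paper's contribution lies in handling the general case, where editing arbitrary parameters makes provable gradient editing NP-hard and the conditional variable gradient provides the LP relaxation.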
Primary Area: Social and economic aspects of machine learning (e.g., fairness, interpretability, human-AI interaction, privacy, safety, strategic behavior)
Submission Number: 13609