Inducing Neural Network Behavior via Constraint Optimization

ICLR 2026 Conference Submission21467 Authors

19 Sept 2025 (modified: 08 Oct 2025), CC BY 4.0
Keywords: Parameter-Space Perturbation, Model Degradation, Improving Model Generalization
TL;DR: A method for modifying trained neural networks through controlled weight perturbations to suppress confidence or improve generalization.
Abstract: Trained neural networks may need to be modified after training to meet policy or business requirements (e.g., controlled degradation or capability reduction), or to improve generalization and reduce overfitting, without undergoing full retraining. The key question is how to induce such behaviors in a principled and verifiable way. We present two methods for modifying trained neural networks through controlled changes to their weights and biases, encoded as a constraint optimization problem, while preserving the model's overall structure and minimizing the impact on general performance. The first, Suppress Training Confidence (STC), reduces the model's confidence across all inputs without changing any predicted class, enabling controlled model degradation. The second, Change m Classifications (CmC), intentionally alters the predicted class for specific inputs; retraining the model from these updated weights and biases yields improved generalization. We evaluate both methods on 10 multiclass image datasets and 5 binary tabular datasets. On image data, both STC and CmC are effective: STC increases training loss by 0.001-2.78 and reduces test accuracy by 0.002-4.82%, while CmC improves test accuracy by up to 10%. Because the edits are found by constrained optimization, our method guarantees class preservation (STC) or a controlled label change (CmC), enabling more precise and interpretable model edits than typical gradient-based fine-tuning.
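To make the STC idea concrete, here is a minimal, hypothetical sketch (not the authors' code) of the kind of constrained problem the abstract describes: perturb the weights of a tiny linear classifier so that mean softmax confidence drops, while margin constraints force every predicted class to stay the same. The model size, data, margin parameter `eps`, and the choice of SLSQP are all illustrative assumptions.

```python
# Hypothetical STC-style edit on a toy linear classifier (illustrative only):
# minimize mean top-class softmax probability over the weights and biases,
# subject to margin constraints that keep every original argmax unchanged.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))      # 20 inputs, 4 features (synthetic data)
W0 = rng.normal(size=(3, 4))      # original weights, 3 classes
b0 = rng.normal(size=3)           # original biases

def logits(theta):
    """Unpack the flat parameter vector and compute class logits."""
    W = theta[:12].reshape(3, 4)
    b = theta[12:]
    return X @ W.T + b

def confidence(theta):
    """Mean top-class softmax probability (the quantity STC suppresses)."""
    z = logits(theta)
    z = z - z.max(axis=1, keepdims=True)          # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return p.max(axis=1).mean()

theta0 = np.concatenate([W0.ravel(), b0])
y0 = logits(theta0).argmax(axis=1)                # classes that must be preserved

def margins(theta):
    """For each input, the original class logit must beat every other
    class logit by at least eps (>= 0 encodes feasibility for SLSQP)."""
    z = logits(theta)
    eps = 1e-3                                    # assumed margin; a design knob
    out = []
    for i, y in enumerate(y0):
        for j in range(3):
            if j != y:
                out.append(z[i, y] - z[i, j] - eps)
    return np.array(out)

res = minimize(confidence, theta0, method="SLSQP",
               constraints=[{"type": "ineq", "fun": margins}])

print("confidence before:", confidence(theta0))
print("confidence after :", confidence(res.x))
print("classes preserved:", (logits(res.x).argmax(axis=1) == y0).all())
```

Uniformly shrinking all logits toward zero is one feasible direction (it preserves every argmax while pushing softmax outputs toward uniform), which is why a solver can always suppress confidence under these constraints; the CmC variant would instead flip the sign of selected margin constraints to force chosen inputs to a new class.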
Primary Area: optimization
Submission Number: 21467