Keywords: Pruning, Machine Unlearning
Abstract: Modifying well-trained models for purposes such as pruning or unlearning, without access to the training data or the original loss function, is a challenging problem. Existing techniques for such modification often require training data, are computationally expensive, or are architecture-specific. To address this, we investigate the fundamental question of identifying the components that are critical to a model's predictive performance without access to gradients or the loss function, using only distributional access such as synthetic data.
We theoretically demonstrate that the global reconstruction error is linearly bounded by local reconstruction errors for Lipschitz-continuous networks such as CNNs and well-trained Transformers (which, contrary to existing literature, we find exhibit Lipschitz continuity). This motivates using the locally reconstructive behavior of component subsets to quantify their global importance, via a metric that we term *Subset Fidelity*. In the uncorrelated-features setting, selecting individual components by their Subset Fidelity scores is optimal; building on this, we propose **ModHiFi**, an algorithm for model modification that requires no training data or loss function access. **ModHiFi-P**, for structured pruning, achieves an 11% speedup over the current state of the art on ImageNet models and competitive performance on language models. **ModHiFi-U**, for classwise unlearning, achieves complete unlearning on CIFAR-10 without fine-tuning and demonstrates competitive performance on Swin Transformers.
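The abstract does not spell out the Subset Fidelity computation, but the idea of scoring components by their local reconstruction error on synthetic inputs can be illustrated with a minimal sketch. The snippet below is an assumption-laden interpretation, not the authors' algorithm: `local_fidelity`, the channel-zeroing scheme, and `keep_ratio` are hypothetical names and choices introduced only to show how one might rank a layer's components without labels, gradients, or a loss function.

```python
# Illustrative sketch (not the paper's ModHiFi/Subset Fidelity definition):
# score each output channel of a layer by the local reconstruction error
# incurred when that channel is removed, using only synthetic inputs.
import torch
import torch.nn as nn


@torch.no_grad()
def local_fidelity(layer: nn.Conv2d, x: torch.Tensor) -> torch.Tensor:
    """Per-channel importance via local reconstruction error on synthetic data."""
    full = layer(x)  # reference local output of the layer
    scores = torch.empty(layer.out_channels)
    for c in range(layer.out_channels):
        masked = full.clone()
        masked[:, c] = 0.0  # drop one output channel locally
        # larger reconstruction error => channel matters more to this layer's output
        scores[c] = (full - masked).pow(2).mean()
    return scores


# Usage: rank channels of a single layer with only distributional (synthetic) access.
layer = nn.Conv2d(16, 32, kernel_size=3, padding=1)
synthetic = torch.randn(8, 16, 32, 32)  # stand-in for synthetic data
scores = local_fidelity(layer, synthetic)
keep_ratio = 0.5  # illustrative pruning budget
kept = torch.topk(scores, k=int(keep_ratio * layer.out_channels)).indices
```

Under the abstract's theoretical claim, such local scores would control the global reconstruction error up to a linear factor for Lipschitz-continuous networks, which is what makes a purely local, loss-free ranking plausible as a pruning or unlearning criterion.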
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 29227