Bigger Is Not Better Under Differential Privacy: Optimization Failure at Eleven-Billion Scale in Vision–Language Model Fine-Tuning

Published: 02 Mar 2026, Last Modified: 04 Mar 2026, Venue: ICLR 2026 Workshop ICBINB, License: CC BY 4.0
Keywords: Differential Privacy, DP-SGD, LoRA, Vision-Language Models, Negative Results, Optimization Failure, Scaling Laws, Privacy-Preserving Fine-Tuning, Medical VQA
TL;DR: DP-SGD + LoRA fine-tuning that works at 3B scale fails to optimize at 11B, revealing a scale-dependent optimization breakdown and a lexical–semantic evaluation mismatch under differential privacy.
Abstract: Differential privacy (DP) is an appealing safeguard for adapting instruction-tuned vision–language models (VLMs), but how standard private fine-tuning behaves as model size grows remains unclear. We present a focused negative result for DP-SGD fine-tuning with LoRA on two widely used backbones: PaLI-GEMMA-3B-PT-224 and LLaMA-3.2-11B-Vision-Instruct. We target $(\varepsilon, \delta)$-DP with $\varepsilon \in \{1, 10, 100, 1000\}$ and $\delta = 10^{-5}$ by tuning only the DP-SGD noise scale while keeping the data, epochs, batch size, clipping norm, and all non-private hyperparameters fixed; $\varepsilon$ is computed with standard DP accounting. Across all budgets, the 3B model trains stably and stays close to non-private validation performance, whereas the 11B model becomes optimization-limited: its loss stays flat and high, and its lexical-overlap metrics collapse. BERTScore drops much less, revealing a lexical–semantic mismatch that can obscure residual utility under DP. Our findings suggest that scaling DP-SGD+LoRA to 11B may require additional stability interventions beyond simply relaxing $\varepsilon$.
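To make the setup in the abstract concrete, the following is a minimal NumPy sketch of a single generic DP-SGD update: per-example gradient clipping followed by Gaussian noise whose standard deviation is the noise scale times the clipping norm. It is an illustration of the textbook mechanism, not the authors' implementation. The function name `dp_sgd_step`, its parameters, and the flat parameter vector are assumptions made for this example; LoRA adapters and privacy accounting are not shown.

```python
import numpy as np


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One generic DP-SGD update on a flat parameter vector (illustrative only).

    per_example_grads has shape (batch_size, num_params): one gradient per example.
    """
    batch_size = per_example_grads.shape[0]

    # L2 norm of each example's gradient.
    norms = np.linalg.norm(per_example_grads, axis=1)

    # Rescale each gradient so its norm is at most clip_norm.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale[:, None]

    # Gaussian noise with std = noise_multiplier * clip_norm, added to the sum.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)

    # Average the noisy sum over the batch and take a gradient step.
    noisy_mean = (clipped.sum(axis=0) + noise) / batch_size
    return params - lr * noisy_mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = np.zeros(4)
    grads = rng.normal(size=(8, 4))  # stand-in per-example gradients
    # Only the noise multiplier changes between runs; everything else is fixed.
    for sigma in (0.5, 1.0, 2.0):
        updated = dp_sgd_step(params, grads, lr=0.1, clip_norm=1.0,
                              noise_multiplier=sigma, rng=rng)
        print(sigma, updated)
```

In this sketch, sweeping `noise_multiplier` while holding `clip_norm`, `lr`, and the batch fixed mirrors the protocol described in the abstract; the $\varepsilon$ that a given noise scale yields would come from a separate privacy accountant.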
Submission Number: 90