Don't Trust any Distilled Dataset! Model Hijacking with the Fewest Samples

16 Sept 2025 (modified: 12 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Model Hijacking Attack, Dataset Distillation, Cybersecurity and Privacy
TL;DR: To reveal threats in transfer learning that uses dataset distillation, we propose the OD attack, which uses the fewest samples to launch a model hijacking attack.
Abstract: Transfer learning leverages knowledge from pre-trained models to solve new tasks with limited data and computational resources. Meanwhile, dataset distillation synthesizes a compact dataset that preserves the critical information of the original large dataset. Combining the two therefore promises strong performance at a fraction of the data and computational cost. However, a non-negligible security threat has been overlooked in transfer learning that uses synthetic datasets generated by dataset distillation: an adversary can mount a model hijacking attack with $\textit{only a few poisoned samples}$ in the synthetic dataset. To reveal this threat, we propose the $\textbf{Osmosis Distillation (OD)}$ attack, a novel model hijacking strategy that targets deep learning models using the fewest samples. The adversary stealthily incorporates a hijacking task into the target model, forcing it to perform malicious functions without alerting the victim. The OD attack achieves efficiency and stealthiness by completing the attack with the fewest possible synthetic samples. To this end, we devise the Transporter, a U-Net-based encoder-decoder that generates osmosis samples by optimizing visual and semantic losses, making the hijacking task difficult to detect. The osmosis samples are then distilled into a synthetic set via our specifically designed key patch selection, label reconstruction, and training trajectory matching, which ensure that the synthetic samples retain the properties of the osmosis samples. A model trained on the synthetic dataset performs the original and hijacking tasks seamlessly. Comprehensive evaluations on various datasets demonstrate that the OD attack attains high attack success rates on the hidden task while preserving high model utility on the original task. Furthermore, the synthetic dataset enables model hijacking across diverse model architectures, allowing model hijacking in transfer learning with considerable attack performance and model utility. We argue that awareness of the risks of using third-party synthetic datasets in transfer learning must be raised.
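
The abstract describes the Transporter only at a high level. As a hedged illustration, the following PyTorch sketch shows one way a U-Net-style encoder-decoder could be trained to blend a hijacking-task image into a benign cover image by jointly minimizing a visual loss (the output stays close to the cover) and a semantic loss (a frozen classifier maps the output to the hijacking label); every name, architecture choice, and loss weight here is a hypothetical stand-in, not the authors' implementation.

```python
# Hypothetical sketch of the Transporter objective (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransporterSketch(nn.Module):
    """Toy stand-in for the paper's U-Net-based encoder-decoder Transporter."""
    def __init__(self, channels: int = 3):
        super().__init__()
        # Encoder sees the cover and hijacking images concatenated channel-wise.
        self.encoder = nn.Sequential(
            nn.Conv2d(2 * channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, cover: torch.Tensor, hijack: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(torch.cat([cover, hijack], dim=1)))

def osmosis_loss(transporter, classifier, cover, hijack, hijack_label, alpha=0.5):
    """Joint objective: the osmosis sample should look like the cover image
    (visual loss) yet be classified as the hijacking label (semantic loss)."""
    osmosis = transporter(cover, hijack)
    visual = F.mse_loss(osmosis, cover)  # imperceptibility to the victim
    semantic = F.cross_entropy(classifier(osmosis), hijack_label)
    return visual + alpha * semantic, osmosis
```

In this sketch the classifier would be a frozen surrogate for the hijacking task, and `alpha` trades off stealthiness against attack strength; the paper's actual losses and weighting may differ.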
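Similarly, the abstract names training trajectory matching as the distillation mechanism without giving details. The sketch below, loosely modeled on standard trajectory-matching dataset distillation (MTT-style) rather than the paper's exact method, unrolls a few differentiable SGD steps on the synthetic (osmosis-bearing) samples and matches the resulting parameters against a segment of an expert trajectory; the key patch selection and label reconstruction components are omitted, and all names are assumptions.

```python
# Hypothetical trajectory-matching sketch (MTT-style); not the authors' code.
import torch
import torch.nn.functional as F
from torch.func import functional_call

def unroll_on_synthetic(model, params, syn_x, syn_y, lr=0.01, steps=5):
    """Differentiable SGD unroll on the synthetic set; create_graph=True lets
    gradients flow back into syn_x and syn_y (e.g., reconstructed soft labels).
    Initial params must have requires_grad=True."""
    for _ in range(steps):
        logits = functional_call(model, params, (syn_x,))
        loss = F.cross_entropy(logits, syn_y)  # syn_y may be soft labels
        grads = torch.autograd.grad(loss, tuple(params.values()),
                                    create_graph=True)
        params = {k: p - lr * g for (k, p), g in zip(params.items(), grads)}
    return params

def trajectory_matching_loss(student, expert_start, expert_end):
    """Normalized distance between the unrolled student parameters and the
    endpoint of an expert trajectory trained on the osmosis samples."""
    num = sum(((student[k] - expert_end[k]) ** 2).sum() for k in student)
    den = sum(((expert_start[k] - expert_end[k]) ** 2).sum() for k in student)
    return num / (den + 1e-8)
```

Here `expert_start` and `expert_end` stand for parameter dictionaries sampled from a pre-recorded expert trajectory; minimizing `trajectory_matching_loss` with respect to `syn_x` and `syn_y` would drive the synthetic samples to reproduce the effect of training on the osmosis data.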
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 7236