Stealing the Recipe: Hyperparameter Stealing Attacks on Fine-Tuned LLMs

ICLR 2026 Conference Submission 22277 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Hyperparameter stealing; Large language models; Shadow models; Black-box attacks
Abstract: Large language models (LLMs) rely on carefully tuned hyperparameters such as the optimizer, learning rate, batch size, and model size. These choices strongly influence performance and generalization but are typically withheld, as they result from costly experimentation and constitute valuable intellectual property. While prior work has examined model extraction and membership inference, whether hyperparameters themselves can be inferred has remained largely unexplored. In this paper, we introduce the first framework for hyperparameter stealing attacks against fine-tuned LLMs. Our approach combines several techniques: constructing hijacking datasets that elicit informative variations in model behavior, training shadow models across multiple architectures, and extracting multimodal statistical and semantic features from their outputs. Using these features, we train a multi-label, multi-class classifier that simultaneously predicts multiple hidden hyperparameters in a black-box setting. Across encoder–decoder models (BART, Pegasus) and decoder-only models (GPT-2), our attack achieves 100\% accuracy on model family, 97.9\% on model size, and strong performance on learning rate (88.7\%) and batch size (80.0\%). Even in mixed-family settings, learning rate and batch size remain identifiable. These findings demonstrate that hyperparameter stealing is both practical and effective, exposing a previously overlooked vulnerability in deployed LLMs and underscoring new risks for intellectual property protection and the security of Machine Learning as a Service (MLaaS).
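To make the final step of the pipeline concrete, the sketch below shows one way a multi-label, multi-class attack classifier could be trained on features extracted from shadow-model outputs. It is a minimal illustration under assumed details, not the authors' implementation: the synthetic features, the discrete label sets for each hyperparameter, and the random-forest backbone are all placeholders standing in for the paper's multimodal statistical and semantic features.

```python
# Minimal sketch (assumed details, not the paper's implementation):
# train one multi-class head per hidden hyperparameter on feature
# vectors derived from shadow-model outputs.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for multimodal features of shadow-model outputs
# (e.g., output-length statistics, token-level scores, embedding summaries).
n_shadow, n_features = 600, 64
X = rng.normal(size=(n_shadow, n_features))

# Each shadow model carries one label per hyperparameter (label sets assumed).
families = rng.integers(0, 3, size=n_shadow)        # e.g., BART / Pegasus / GPT-2
sizes = rng.integers(0, 2, size=n_shadow)           # e.g., base / large
learning_rates = rng.integers(0, 4, size=n_shadow)  # e.g., 4 discrete LR settings
batch_sizes = rng.integers(0, 3, size=n_shadow)     # e.g., 3 discrete batch sizes
Y = np.stack([families, sizes, learning_rates, batch_sizes], axis=1)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

# Multi-label, multi-class attack classifier: one multi-class model per target.
attack_clf = MultiOutputClassifier(
    RandomForestClassifier(n_estimators=200, random_state=0)
)
attack_clf.fit(X_train, Y_train)

pred = attack_clf.predict(X_test)
for i, name in enumerate(["model family", "model size", "learning rate", "batch size"]):
    acc = (pred[:, i] == Y_test[:, i]).mean()
    print(f"{name} accuracy: {acc:.3f}")
```

With real features extracted from a target's black-box responses to the hijacking dataset, the same interface would yield per-hyperparameter predictions in a single pass.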
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 22277