Sensitivity of Small Language Models to Fine-tuning Data Contamination

TMLR Paper7981 Authors

18 Mar 2026 (modified: 01 Jun 2026), Rejected by TMLR, CC BY 4.0

Abstract: Small Language Models (SLMs) are increasingly deployed in resource-constrained environments, yet their robustness to data contamination during instruction tuning remains poorly understood. We systematically investigate the contamination sensitivity of 23 SLMs (270M to 4B parameters) across different model families by measuring susceptibility to syntactic transformations (character and word reversal) and semantic transformations (irrelevant and counterfactual responses), each applied at contamination levels from 1% to 100%. Our results reveal fundamental asymmetries in vulnerability. Syntactic transformations cause catastrophic performance degradation, with character reversal producing near-complete failure across all models regardless of size or family, whereas semantic transformations show distinct threshold behaviors and leave core linguistic capabilities more resilient. We discover a 'capability curse': larger, more capable models are more susceptible to learning semantic corruptions, effectively following harmful instructions. Our analysis of base versus instruction-tuned variants reveals that alignment provides inconsistent robustness benefits and sometimes even reduces resilience. Layerwise representational analysis across model families and sizes shows a consistent localization of contamination effects toward the final blocks, with syntactic corruption typically inducing stronger late-layer divergence and semantic corruption producing comparatively smaller changes that are often confined to the final layers. Our work makes three contributions: (1) empirical evidence that SLMs are disproportionately vulnerable to syntactic contamination patterns, (2) characterization of asymmetric learning dynamics for syntactic versus semantic contamination, supported by behavioral and representational analysis, and (3) systematic evaluation protocols for robustness assessment. These findings have deployment implications: they suggest that current robustness assumptions may not hold for smaller models and highlight the need for contamination-aware training protocols that target late-layer representations.
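To make the two syntactic transformations and the notion of a contamination level concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the `response` field, whitespace-based word splitting, and the choice to corrupt a uniformly random subset of examples are all hypothetical. The semantic transformations (irrelevant and counterfactual responses) are not sketched, since they would require replacement content that the abstract does not describe.

```python
import random


def reverse_chars(text: str) -> str:
    """Character reversal: 'the cat' -> 'tac eht'."""
    return text[::-1]


def reverse_words(text: str) -> str:
    """Word reversal: 'the cat sat' -> 'sat cat the'.

    Assumption: words are whitespace-delimited tokens.
    """
    return " ".join(reversed(text.split()))


def contaminate(examples, transform, rate, seed=0):
    """Return a copy of `examples` with `transform` applied to the
    'response' field of a random fraction `rate` of them.

    Assumptions (hypothetical, not taken from the paper): each example
    is a dict with a 'response' key, and corrupted examples are chosen
    uniformly at random without replacement.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    rng = random.Random(seed)
    n_corrupt = round(rate * len(examples))
    corrupt_idx = set(rng.sample(range(len(examples)), n_corrupt))
    return [
        {**ex, "response": transform(ex["response"])} if i in corrupt_idx else dict(ex)
        for i, ex in enumerate(examples)
    ]
```

Under these assumptions, `contaminate(data, reverse_chars, rate=0.01)` would correspond to the 1% level and `rate=1.0` to the 100% level, with the same call applied to `reverse_words` for the word-reversal condition.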

Submission Type: Regular submission (no more than 12 pages of main content)

Assigned Action Editor: ~Akanksha_Saran1

Submission Number: 7981
