Overview
The code and data in this supplementary material are provided as part of the ICML 2026 submission for the paper “The Heterogeneous Safety Impacts of Benign Multilingual Fine-Tuning”.

Anonymity 
In accordance with the ICML 2026 double-blind review policy, all identifying information has been removed from the paper and from the supporting scripts and data. Before running the scripts, replace the placeholder values (e.g. “HF_KEY”, “YOUR PROJECT”) with your own local paths or environment secrets.
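
One way to supply a secret without editing it into a script is to read it from an environment variable. The sketch below is illustrative only and is not part of the submitted scripts: the helper name resolve_secret is hypothetical, and it assumes the environment variable has the same name as the placeholder (here “HF_KEY”).

```python
import os


def resolve_secret(name: str, placeholder: str) -> str:
    """Return the value of environment variable `name`.

    Hypothetical helper, not part of the submitted scripts. It raises if the
    variable is unset, empty, or still equal to the placeholder text.
    """
    value = os.environ.get(name, "")
    if not value or value == placeholder:
        raise RuntimeError(
            f"Set the environment variable {name} to a real value "
            f"before running the scripts."
        )
    return value
```

For example, after setting HF_KEY in the Colab or shell environment, a script could call resolve_secret("HF_KEY", "HF_KEY") and use the returned string wherever the placeholder appears.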

Directory structure
/supplementary_material
│
├── README.txt           (This file)
│
├── code/                (Python scripts for training and evaluation)
│   ├── fine_tuning.py            # LoRA fine-tuning using Unsloth
│   ├── sorry_bench_en.py         # Standard English SORRY-Bench evaluation
│   ├── sorry_bench_local.py      # Multilingual (Local) SORRY-Bench evaluation
│   ├── tinymmlu.py               # General capability evaluation using TinyMMLU
│   ├── tinyalpaca_compliance.py  # Non-adversarial compliance using TinyAlpacaEval
│   ├── en_perplexity.py          # English perplexity via WikiText2
│   ├── local_perplexity.py       # Local language perplexity via Wikipedia
│   └── vector_drift.py           # Mechanistic analysis of safety vector displacement
│
└── data/                (CSV files for tuning and testing)
    ├── fine_tuning/     # Multi-Lingual-Benign-Tune dataset
    └── eval/            # SORRY-Bench evaluation prompts translated into the languages used in this study and manually validated by human reviewers

Requirements
All experiments in this study were run within Google Colab using L4 GPUs.
Unsloth was used for all fine-tuning.