
# Attention Hijacking: Backdooring Text Dataset Distillation via Semantic Anchors


This repository contains the official implementation of the paper **"Attention Hijacking: Backdooring Text Dataset Distillation via Semantic Anchors"**.

## 📂 Project Structure

```text
.
├── configs/               # Hydra configuration files
├── src/                   # Source code (Main logic, Trainer, Models)
├── scripts/               # Reproduction scripts (Entry points)
├── README.md              # Documentation
├── LICENSE.txt            # MIT License
└── requirements.txt       # Dependencies
```
*Note: Directories such as `data/`, `logs/`, and `save/` will be automatically created during data generation and training.*

## 🛠️ Environment Setup

We recommend using Anaconda to manage the environment.

```bash
conda create -n ah_env python=3.10
conda activate ah_env
pip install -r requirements.txt
```

## 📊 Data Preparation

Before running any experiments, generate the datasets. The script below generates every dataset configuration used in the paper (SST-2 and AG News with various triggers).

```bash
bash scripts/prepare_all_datasets.sh
```

## 🚀 Reproduction

We provide simplified scripts to reproduce the results.

### 1. Main Result (AH on SST-2)

To reproduce the **Attention Hijacking (AH)** attack on SST-2 with the 'film' trigger:

```bash
bash scripts/reproduce_main_result.sh
```

*Note: This runs the distillation process and automatically evaluates the Clean Test Accuracy (CTA) and Attack Success Rate (ASR).*
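For readers unfamiliar with the two metrics, here is a minimal sketch of how CTA and ASR are commonly computed in the backdoor literature. It is not taken from this repository: the function names are hypothetical, and excluding samples whose true label already equals the target label is an assumed convention that may differ from the evaluation code used here.

```python
from typing import Sequence


def clean_test_accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of clean test samples classified correctly (CTA)."""
    if len(preds) != len(labels) or len(labels) == 0:
        raise ValueError("preds and labels must be non-empty and equal in length")
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return correct / len(labels)


def attack_success_rate(
    triggered_preds: Sequence[int],
    true_labels: Sequence[int],
    target_label: int,
) -> float:
    """Fraction of triggered samples classified as the target label (ASR).

    Assumption: samples whose true label is already the target are skipped,
    since predicting the target for them is not evidence of a backdoor.
    """
    if len(triggered_preds) != len(true_labels):
        raise ValueError("triggered_preds and true_labels must be equal in length")
    pairs = [(p, y) for p, y in zip(triggered_preds, true_labels) if y != target_label]
    if not pairs:
        raise ValueError("no samples with a non-target true label")
    hits = sum(int(p == target_label) for p, _ in pairs)
    return hits / len(pairs)


# Toy predictions, made up purely for illustration.
cta = clean_test_accuracy(preds=[1, 0, 1, 1], labels=[1, 0, 0, 1])
asr = attack_success_rate(
    triggered_preds=[1, 1, 0, 1], true_labels=[0, 0, 0, 1], target_label=1
)
print(f"CTA={cta:.2f}, ASR={asr:.2f}")
```

A successful backdoor keeps CTA close to that of a clean model while pushing ASR toward 1.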

### 2. Baselines & Clean References

To reproduce the baseline methods (SI, DI-Std, DI-Attn) and the clean references (Std-DD, Attn-DD):

```bash
bash scripts/reproduce_baselines.sh
```

## 🔧 Custom Usage

Experiments are configured with Hydra, so you can run custom configurations by overriding any configuration value on the command line as `key=value`.

**Example: Advanced Configuration (BERT-Tiny)**

```bash
python src/main.py -m \
    data.task_name=sst2 \
    data.datasets_path="./data/SST2_R0.001_film_Target1" \
    data.preprocessed_datasets_path="./data/SST2_R0.001_film_Target1/preprocessed_bert_tiny" \
    model.model_name="prajjwal1/bert-tiny" \
    distilled_data.label_type=soft \
    distilled_data.attention_label_type=cls \
    distilled_data.attack_strategy=AH \
    train.attack_weight=1.0 \
    base.method="AH_BERT_Tiny"
```
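When scripting many runs, it can be convenient to assemble the same `key=value` overrides from Python. The following is a minimal sketch under stated assumptions: `build_command` is a hypothetical helper that is not part of this repository, and the override keys are copied from the example above.

```python
import shlex
from typing import Dict, List, Union

OverrideValue = Union[str, int, float]


def build_command(
    overrides: Dict[str, OverrideValue],
    entry_point: str = "src/main.py",
    multirun: bool = True,
) -> List[str]:
    """Build an argument list that passes Hydra-style key=value overrides."""
    cmd = ["python", entry_point]
    if multirun:
        cmd.append("-m")  # same flag as in the shell example above
    cmd += [f"{key}={value}" for key, value in overrides.items()]
    return cmd


# Keys copied from the example above; adjust the values to your experiment.
overrides = {
    "data.task_name": "sst2",
    "distilled_data.attack_strategy": "AH",
    "train.attack_weight": 1.0,
}
cmd = build_command(overrides)
print(shlex.join(cmd))
```

To launch the run, pass the list to `subprocess.run(cmd, check=True)` from the repository root.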

## 📚 References

Our implementation leverages the following open-source libraries and frameworks:

* **[PyTorch](https://pytorch.org/)**: The core deep learning framework.
* **[Hugging Face Transformers](https://huggingface.co/)**: For pre-trained language models and datasets.
* **[Hydra](https://hydra.cc/)**: For elegant configuration management.
* **[Dataset Distillation with Attention Labels](https://github.com/arumaekawa/dataset-distillation-with-attention-labels)**: The foundational codebase for attention-guided distillation.
