# 🧠 LGCD: LoRA-Gated Contrastive Decoding

This repository provides the official implementation of **LoRA-Gated Contrastive Decoding (LGCD)**, a **training-free** method to improve factuality in **language-adapted large language models (LLMs)** by dynamically leveraging pretrained knowledge at inference time.

> 📄 *"Leveraging Pretrained Knowledge at Inference Time: LoRA-Gated Contrastive Decoding for Multilingual Factual Language Generation in Adapted LLMs"*, ICLR 2026 (under review)

---

## 📌 Highlights

* ⚙️ *No training required*: LGCD operates **entirely at decoding time**, requiring only the adapted model and a reference pretrained model.
* 🧩 *Modular LoRA-based approximation*: Reconstructs pretrained knowledge using an **SVD-based LoRA decomposition** of the FFN layers.
* 🔐 *Confidence-based control*: Dynamically gates decoding decisions between the adapted model and reconstructed pretrained logits.
* 🔄 *Contrastive decoding with Top-K masking*: Selectively adjusts token logits using contrastive weighting on top candidates only.
* 🌐 *Multilingual support*: Tested across **9 languages** and **12 language-adapted models**, including Korean, German, Arabic, and Swahili.
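
The SVD-based extraction named above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the repository's code: the helper name `extract_lora_factors`, the toy matrix shapes, and the choice to split the singular values evenly between the two factors are all hypothetical. The sketch takes the difference between an adapted and a pretrained weight matrix, keeps its top singular directions, and subtracts the low-rank product from the adapted weight to approximate the pretrained one.

```python
import numpy as np


def extract_lora_factors(w_adapted, w_pretrained, rank):
    """Return (B, A) such that B @ A is the best rank-`rank`
    approximation (in Frobenius norm) of w_adapted - w_pretrained.

    Hypothetical helper for illustration only.
    """
    delta = w_adapted - w_pretrained
    # Thin SVD: delta = U @ diag(S) @ Vt
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])
    b = u[:, :rank] * sqrt_s          # shape (d_out, rank)
    a = sqrt_s[:, None] * vt[:rank]   # shape (rank, d_in)
    return b, a


rng = np.random.default_rng(0)

# Toy stand-in for one FFN weight matrix (shapes are arbitrary).
w_pre = rng.normal(size=(64, 48))

# Toy "language adaptation": an exactly rank-4 update to the weights.
true_delta = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 48))
w_ada = w_pre + true_delta

b, a = extract_lora_factors(w_ada, w_pre, rank=4)

# Approximate the pretrained weight from the adapted one.
w_pre_rec = w_ada - b @ a
rel_err = np.linalg.norm(w_pre_rec - w_pre) / np.linalg.norm(w_pre)
print(f"relative reconstruction error: {rel_err:.2e}")
```

In this toy case the update is exactly rank 4, so a rank-4 decomposition recovers it up to floating-point error. With real FFN weights the delta is generally not exactly low-rank, so the reconstruction quality depends on the chosen rank (`lora_rank` in the configuration).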

---

## 🗂️ File Structure

```bash
.
├── LGCD+long-form_QA.py       # LGCD generation for multilingual long-form QA (medical QA, FActScore)
├── LGCD+lm_eval.py            # Evaluation script for multilingual multiple-choice QA (Global MMLU, TruthfulQA)
├── sample medical qa data     # ~200 examples per language (limited by upload size and the anonymous-submission policy; full dataset release is pending institutional approval)
└── README.md                  # You are here
```

---

## 🚀 Getting Started

### 1. Dependencies

Install required libraries (Python 3.10+):

```bash
pip install torch transformers numpy tqdm pandas huggingface_hub
```

---

### 2. Running Multilingual QA (Long-Form)

Use this for multilingual **medical QA generation**.

```bash
python LGCD+long-form_QA.py
```

* Generates QA responses in 9 languages.
* Uses LoRA-extracted FFN deltas to enhance factual correctness.
* Saves outputs to a timestamped `.json` file.

---

### 3. Running Multilingual QA Evaluation (MCQ)

Use this for **multiple-choice factuality evaluation** on:

* 🌍 Global MMLU (multilingual, zero-/five-shot)
* ✅ Multilingual TruthfulQA

```bash
python LGCD+lm_eval.py
```

* Automatically loads models per language.
* Logs evaluation accuracy for each model and decoding strategy.

---

## ⚙️ LGCD Configuration

All LGCD settings are handled via the `LGCDConfig` dataclass:

```python
LGCDConfig(
    lora_rank=32,
    confidence_threshold=0.7,
    contrastive_alpha=0.1,
    contrastive_beta=1.0,
    layer_group='all',
    generation_top_k=100
)
```

* `layer_group`: one of `"all"`, `"lower"`, `"middle"`, or `"upper"`; selects the subset of FFN layers used for LoRA extraction (default: `"all"`).
* `confidence_threshold`: token-level confidence cutoff that decides whether contrastive decoding is triggered.
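
To make the interaction of these settings more concrete, here is a hedged sketch of a single decoding step. It is one plausible reading of the description above, not the rule implemented in the scripts. The assumptions are: the gate fires when the adapted model's top probability falls below `confidence_threshold`; `generation_top_k` limits candidates to the adapted model's top-K tokens; `contrastive_alpha` acts as a plausibility cutoff relative to the top token; and `contrastive_beta` weights the contrast as `(1 + beta) * log p_pretrained - beta * log p_adapted`. The function name `gated_contrastive_step` and the toy logits are hypothetical.

```python
import numpy as np


def log_softmax(x):
    """Numerically stable log-softmax over a 1-D logit vector."""
    shifted = x - np.max(x)
    return shifted - np.log(np.sum(np.exp(shifted)))


def gated_contrastive_step(adapted_logits, pretrained_logits, *,
                           confidence_threshold=0.7,
                           contrastive_alpha=0.1,
                           contrastive_beta=1.0,
                           top_k=100):
    """Return (scores, triggered). Illustrative only; see assumptions above."""
    log_p_ada = log_softmax(adapted_logits)
    log_p_pre = log_softmax(pretrained_logits)
    max_logp = log_p_ada.max()

    # Gate (assumed direction): a confident adapted model is left untouched.
    if np.exp(max_logp) >= confidence_threshold:
        return adapted_logits.copy(), False

    # Top-K masking: only the adapted model's top-K tokens stay candidates.
    k = min(top_k, adapted_logits.size)
    top_idx = np.argpartition(adapted_logits, -k)[-k:]
    keep = np.zeros(adapted_logits.shape, dtype=bool)
    keep[top_idx] = True

    # Plausibility cutoff (assumed role of alpha): p >= alpha * max p.
    keep &= log_p_ada >= np.log(contrastive_alpha) + max_logp

    # Contrastive score (assumed role of beta) on surviving candidates.
    scores = np.full(adapted_logits.shape, -np.inf)
    scores[keep] = ((1.0 + contrastive_beta) * log_p_pre[keep]
                    - contrastive_beta * log_p_ada[keep])
    return scores, True


# Toy 4-token vocabulary.
confident = np.array([5.0, 0.0, 0.0, 0.0])    # adapted model is sure
uncertain = np.array([1.0, 0.9, 0.0, -3.0])   # adapted model is unsure
pretrained = np.array([0.0, 2.0, 0.0, 0.0])   # reference prefers token 1

out_c, triggered_c = gated_contrastive_step(confident, pretrained, top_k=3)
out_u, triggered_u = gated_contrastive_step(uncertain, pretrained, top_k=3)

print("confident step triggered:", triggered_c)
print("uncertain step triggered:", triggered_u)
print("selected token (uncertain):", int(np.argmax(out_u)))
```

Tokens outside the candidate set receive a score of negative infinity, so they can never be selected once the gate fires. The actual gating direction and the exact roles of `contrastive_alpha` and `contrastive_beta` should be checked against `LGCD+long-form_QA.py` and `LGCD+lm_eval.py`.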

---

## 💡 Citation

```bibtex
@article{lgcd2026,
  title={Leveraging Pretrained Knowledge at Inference Time: LoRA-Gated Contrastive Decoding for Multilingual Factual Language Generation in Adapted LLMs},
  author={Anonymous},
  journal={ICLR},
  year={2026}
}
```
---

