# Bridging Fairness and Explainability: Can Input-Based Explanations Promote Fairness in Hate Speech Detection?

This repository contains the code for our ICLR submission:  
**"Bridging Fairness and Explainability: Can Input-Based Explanations Promote Fairness in Hate Speech Detection?"**

---

## Setup

- Run [`preprocess_jigsaw.ipynb`](preprocess_jigsaw.ipynb) to download and pre-process the Jigsaw dataset.  
- We use **`transformers==4.51.0`**.  
- Model and dataset configurations, as well as hyperparameters, are provided in both the paper and this codebase.  
- Vocabulary files and prompts are available under:
  - [`utils/vocabulary/`](utils/vocabulary/)
  - [`utils/prompt.py`](utils/prompt.py)
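
Because results may depend on the pinned library version, it can help to confirm the environment before running anything. The snippet below is a minimal sketch, not part of this codebase: `check_pinned` is a hypothetical helper, and only the version string `4.51.0` comes from this README.

```python
from importlib.metadata import PackageNotFoundError, version

# Version stated in this README; everything else here is illustrative.
PINNED_TRANSFORMERS = "4.51.0"


def check_pinned(package: str, expected: str) -> str:
    """Return 'ok', 'mismatch', or 'missing' for an installed package."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return "missing"
    return "ok" if installed == expected else "mismatch"


if __name__ == "__main__":
    status = check_pinned("transformers", PINNED_TRANSFORMERS)
    print(f"transformers=={PINNED_TRANSFORMERS}: {status}")
```

If the status is `mismatch` or `missing`, installing the pinned version with `pip install transformers==4.51.0` should resolve it.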

---

## Experiments

### RQ1: Bias Detection
- Navigate to [`encoder_scripts/`](encoder_scripts/) or [`decoder_scripts/`](decoder_scripts/) to generate bash scripts for:
  - Model training (if applicable)
  - Fairness computation
  - Explanation generation
  - Sensitive feature attribution
  - Correlation computation
- Run the generated bash scripts to reproduce the results reported in the paper.
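
To make the last two steps of this pipeline concrete, the sketch below shows one plausible way to aggregate token-level attributions over sensitive terms and correlate them with a per-example bias score. It is an illustration under assumptions, not the repository's implementation: the function names, the use of absolute attribution values, the Pearson coefficient, and the toy numbers are all hypothetical, and the actual metrics are defined in the paper and the generated scripts.

```python
from math import sqrt


def sensitive_attribution(tokens, scores, sensitive_vocab):
    """Sum absolute attribution scores over tokens found in the sensitive vocabulary."""
    return sum(abs(s) for t, s in zip(tokens, scores) if t.lower() in sensitive_vocab)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy inputs (hypothetical): tokens, attribution scores, and a bias score per example.
    vocab = {"women", "muslims"}
    examples = [
        (["all", "women", "are", "great"], [0.1, -0.7, 0.0, 0.2], 0.40),
        (["muslims", "live", "here"], [0.3, 0.1, 0.0], 0.15),
        (["nice", "weather", "today"], [0.2, 0.1, 0.1], 0.02),
    ]
    attributions = [sensitive_attribution(t, s, vocab) for t, s, _ in examples]
    bias_scores = [b for _, _, b in examples]
    print("correlation:", pearson(attributions, bias_scores))
```

A high correlation in such a setup would suggest that reliance on sensitive tokens tracks biased behavior, which is the kind of relationship RQ1 examines.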

### RQ2: Model Selection
- Use the scripts in [`encoder_scripts/`](encoder_scripts/) or [`decoder_scripts/`](decoder_scripts/) to generate bash scripts for model selection.
- Run the generated bash scripts to reproduce the results reported in the paper.

### RQ3: Bias Mitigation
- Navigate to [`bias_mitigation_scripts/`](bias_mitigation_scripts/) to generate bash scripts for:
  - Explanation-based debiasing
  - Fairness computation
- Run the generated bash scripts to reproduce the results reported in the paper.
- To conduct the fairness–correlation analysis on explanation-debiased models, also run the fairness computation, explanation generation, sensitive feature attribution, and correlation computation scripts in [`bias_mitigation_scripts/`](bias_mitigation_scripts/).
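
As a rough illustration of what a fairness computation can look like when comparing a model before and after debiasing, the sketch below measures the gap in false positive rate across identity groups. This is an assumed example metric: the function names and toy data are hypothetical, and the fairness metrics actually used are specified in the paper and the scripts.

```python
def false_positive_rate(labels, preds):
    """Fraction of non-hateful examples (label 0) predicted as hateful (pred 1)."""
    negatives = [p for y, p in zip(labels, preds) if y == 0]
    if not negatives:
        raise ValueError("no negative examples in this group")
    return sum(negatives) / len(negatives)


def fpr_gap(labels, preds, groups):
    """Largest difference in false positive rate between any two groups."""
    rates = {}
    for group in set(groups):
        idx = [i for i, g in enumerate(groups) if g == group]
        rates[group] = false_positive_rate(
            [labels[i] for i in idx], [preds[i] for i in idx]
        )
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Toy data (hypothetical): all examples are non-hateful, split into two groups.
    labels = [0, 0, 0, 0]
    groups = ["a", "a", "b", "b"]
    preds_before = [1, 1, 0, 0]  # predictions of a baseline model
    preds_after = [1, 0, 0, 0]   # predictions of a debiased model
    print("gap before:", fpr_gap(labels, preds_before, groups))
    print("gap after: ", fpr_gap(labels, preds_after, groups))
```

A smaller gap after debiasing would indicate more even treatment of the groups under this particular metric.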

---

## Analysis
- Additional scripts for analysis are provided in the [`analysis/`](analysis/) directory.

---