# Recall-First Moderation via Distribution-Preserving Augmentation and Committee-Diverse Retrieval

This repository accompanies the ICLR 2026 submission:

 **Recall-First Moderation via Distribution-Preserving Augmentation and Committee-Diverse Retrieval**  
 *Anonymous authors, under double-blind review*

Our work addresses the dominant failure mode in safety-critical content moderation: **false negatives** (missed harmful content). We introduce a **recall-first moderation framework** that unifies:

- **Distribution-preserving augmentation**: length- and semantics-aware paraphrase generation with rigorous KL/JS validation.
- **Committee-diverse retrieval**: dense, MMR, and graph-based retrievers combined to ensure diverse, informative neighborhoods.
- **Recall-oriented decision policies**: calibrated thresholds and ensemble classifiers that explicitly prioritize recall.

The framework benchmarks **open-source vs. commercial RAG pipelines**, and shows that open-source systems (FAISS + local LLaMA-3) can *outperform* commercial APIs in both accuracy and recall while remaining auditable and reproducible.

---

## Pipeline Overview

The methods are implemented as **four modular pipelines**, each with its own batch script:

1. **Data Augmentation & Validation**  
   - Script: `/data_augmentation_pipeline/run_pipeline.sh`
   - Generates 10k length-aware augmented samples per dataset.  
   - Performs **KL/JS statistical validation** (global + per-cluster) to check that augmented data stays close to the original distribution.  
   - Outputs: augmented CSVs and validation reports.

2. **Vanilla RAG Baselines (A & B)**  
   - Script: `/rags_pipelines/run_pipelines.sh`
   - **Pipeline A (Commercial)**: OpenAI embeddings + hosted inference (gpt-4o-mini).  
   - **Pipeline B (Open-Source)**: FAISS + HuggingFace embeddings + local Power-LLaMA-3.  
   - Provides baseline comparisons, fairness metrics, and leakage-safe train/test splits.

3. **Contrastive Augmentation + Committee Retrieval (A+/B+)**  
   - Script: `/contrastive_augmentation_pipeline/run_pipeline.sh`
   - Adds **adversarial/contrastive paraphrases** and integrates **MMR/Graph retrieval** with committee voting.  
   - Outputs: `summary_plus_mmr.json`, `summary_plus_graph.json`.  
   - Reproducible under both A+ (commercial) and B+ (open-source) settings.

4. **DistilRoBERTa Ensemble with Rank+Veto**  
   - Script: `/distilroberta_pipeline/run_pipeline.sh`
   - Fine-tunes `distilroberta-base` on augmented data.  
   - Blends classifiers and applies **Rank+Veto fusion** with open-source B+.  
   - Achieves the best FLAGGED recall (≈0.5781) while keeping accuracy at ≈0.85.

---

## Data

We use the public **DeepNLP dataset** from Kaggle:

- **Sheet_1.csv** — therapeutic chatbot responses (flagged/not flagged).  
- **Sheet_2.csv** — résumés (flagged/not flagged).  

Augmented and adversarial data are generated automatically by the pipelines. No manual edits are needed.

---

## Running the Pipelines

### 1. Data Augmentation & Validation

```bash
cd data_augmentation_pipeline
bash run_pipeline.sh
```

Outputs:
- `10KSheet_1_augmented_v5_lenaware.csv`
- `10KSheet_2_augmented_v5_lenaware.csv`
- `validation_reports_v5/` (JSON + Markdown reports)
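
As a rough illustration of the kind of check a KL/JS validation step performs, the sketch below computes the Jensen-Shannon divergence between length histograms of original and augmented texts. The helper names (`length_histogram`, `kl_divergence`, `js_divergence`), the bin settings, and the use of whitespace token counts are assumptions made for illustration; the repository's validation scripts may compare different features and use different thresholds.

```python
import math
from collections import Counter


def length_histogram(texts, bin_width=10, n_bins=20):
    """Normalized histogram of whitespace-token counts (last bin catches overflow).

    Assumes `texts` is non-empty.
    """
    counts = Counter(min(len(t.split()) // bin_width, n_bins - 1) for t in texts)
    total = sum(counts.values())
    return [counts.get(i, 0) / total for i in range(n_bins)]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical toy data: a small divergence suggests similar length profiles.
original = ["I feel fine today", "this is a much longer response " * 5]
augmented = ["I am feeling fine today", "this is a rather longer response " * 5]
print(js_divergence(length_histogram(original), length_histogram(augmented)))
```

Jensen-Shannon divergence is used here rather than raw KL because it is symmetric and stays finite when one histogram has empty bins.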

---

### 2. Vanilla RAG Baselines

```bash
cd rags_pipelines
bash run_pipelines.sh
```

- Runs **Pipeline A (commercial)** and **Pipeline B (open-source)**.  
- Produces train/test splits (`train_augmented.jsonl`, `test_augmented.jsonl`) and evaluation metrics.  

---

### 3. Contrastive + Committee Retrieval (A+/B+)

```bash
cd contrastive_augmentation_pipeline
bash run_pipeline.sh
```

- Generates adversarial/contrastive samples.  
- Builds a kNN graph for graph-aware retrieval.  
- Runs **PLUS pipelines** with both **MMR** and **Graph** strategies.  
- Stores outputs in `outputs/` and saves evaluation summaries as JSON.
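
For readers unfamiliar with MMR (maximal marginal relevance), the sketch below shows the standard greedy selection rule on plain Python vectors. The function names, the `lam` trade-off parameter, and the toy vectors are assumptions for illustration; the repository's retriever works on real embeddings and may be implemented differently.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(query, candidates, k=3, lam=0.7):
    """Greedy MMR: return indices of up to k candidates.

    Each step picks the candidate maximizing
    lam * sim(query, c) - (1 - lam) * max sim(c, already selected).
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy embeddings: candidates 0 and 1 are near-duplicates.
query = [1.0, 1.0]
candidates = [[1.0, 0.9], [1.0, 0.8], [0.2, 1.0]]
print(mmr_select(query, candidates, k=2, lam=0.5))  # diversity-weighted
print(mmr_select(query, candidates, k=2, lam=1.0))  # pure relevance
```

With `lam=1.0` the rule reduces to plain top-k retrieval by similarity; lower values trade relevance for a more diverse neighborhood.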

---

### 4. DistilRoBERTa Ensemble

```bash
cd distilroberta_pipeline
bash run_pipeline.sh
```

- Fine-tunes `distilroberta-base`.  
- Runs inference, blending, and Rank+Veto sweeps.  
- Outputs final predictions (`preds_c_blend.jsonl`, `preds_ensemble.jsonl`) and evaluation reports.
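
This README does not spell out the Rank+Veto rule, so the sketch below is only one hypothetical reading of a recall-first fusion, not the method used in the paper. It assumes two lists of FLAGGED probabilities (a classifier and a RAG pipeline), flags the items with the best average rank, and additionally lets either model force a FLAGGED decision when its score is very high. The function name `rank_veto_fuse` and the parameters `flag_fraction` and `veto_threshold` are invented for this example.

```python
import math


def rank_veto_fuse(clf_scores, rag_scores, flag_fraction=0.3, veto_threshold=0.9):
    """Hypothetical rank + veto fusion; returns one bool per item (True = FLAGGED)."""
    n = len(clf_scores)

    def ranks(scores):
        # Rank 0 is the highest score.
        order = sorted(range(n), key=lambda i: scores[i], reverse=True)
        result = [0] * n
        for rank, i in enumerate(order):
            result[i] = rank
        return result

    r_clf, r_rag = ranks(clf_scores), ranks(rag_scores)
    mean_rank = [(a + b) / 2 for a, b in zip(r_clf, r_rag)]

    # Rank step: flag the top fraction of items by average rank.
    n_flag = math.ceil(flag_fraction * n)
    top = set(sorted(range(n), key=lambda i: mean_rank[i])[:n_flag])

    # Veto step (assumed): a very confident model overrides a NOT_FLAGGED outcome.
    return [
        i in top or max(clf_scores[i], rag_scores[i]) >= veto_threshold
        for i in range(n)
    ]


# Hypothetical scores: item 3 is ranked low by the RAG model but vetoed in.
print(rank_veto_fuse([0.8, 0.2, 0.1, 0.95], [0.7, 0.3, 0.2, 0.1],
                     flag_fraction=0.25, veto_threshold=0.9))
```

The one-sided veto reflects the recall-first goal: in this reading it can only add FLAGGED decisions, never remove them.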

---

## Results (Summary)

- **Vanilla baselines**: FLAGGED recall stuck at ≈0.44.  
- **PLUS (Contrastive + Committee)**: recall improved to ≈0.56.  
- **DistilRoBERTa Ensemble**: recall further raised to ≈0.5781 with Acc ≈0.85.  
- **Open-source pipeline (B)** consistently outperformed the commercial pipeline (A).
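
The recall figures above refer to the FLAGGED class only. As a reminder of how that metric differs from accuracy, here is a minimal sketch; the label strings `"FLAGGED"` and `"NOT_FLAGGED"` and the toy predictions are assumptions, not values taken from the repository's outputs.

```python
def flagged_recall(y_true, y_pred, positive="FLAGGED"):
    """Recall for the positive class: TP / (TP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical example: high accuracy can hide poor recall on the rare class.
y_true = ["FLAGGED"] * 2 + ["NOT_FLAGGED"] * 8
y_pred = ["FLAGGED", "NOT_FLAGGED"] + ["NOT_FLAGGED"] * 8
print(accuracy(y_true, y_pred), flagged_recall(y_true, y_pred))
```

In the toy data, nine of ten predictions are correct, yet half of the harmful items are missed. This is the false-negative failure mode the framework targets.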

---

## Reproducibility

- All pipelines are **leakage-safe**: each augmented row inherits the train/test split of its parent row.  
- Hyperparameters, thresholds, and random seeds are fixed in scripts.  
- Validation scripts output full JSON + Markdown reports.  
- Appendices A.11–A.20 of the paper document all scripts, configs, and evaluation logs.  
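
To make the leakage-safe property concrete, the sketch below assigns splits by parent ID, so an original row and all of its augmented variants always land on the same side. The `parent_id` field, the hash-based assignment, and the function names are assumptions for illustration; the repository's scripts may track parentage and assign splits differently.

```python
import hashlib


def assign_split(parent_id, test_fraction=0.2):
    """Deterministically map a parent ID to 'train' or 'test' via a hash."""
    digest = hashlib.sha256(str(parent_id).encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # value in [0, 1]
    return "test" if bucket < test_fraction else "train"


def split_rows(rows, test_fraction=0.2):
    """Split rows (dicts with a 'parent_id' key) without separating siblings.

    Augmented rows are assumed to carry the ID of the original row they
    were generated from, so they follow that row's assignment.
    """
    train, test = [], []
    for row in rows:
        if assign_split(row["parent_id"], test_fraction) == "test":
            test.append(row)
        else:
            train.append(row)
    return train, test


# Hypothetical rows: each original (aug=False) has two augmented variants.
rows = [
    {"parent_id": pid, "text": f"sample {pid} v{v}", "aug": v > 0}
    for pid in range(50)
    for v in range(3)
]
train, test = split_rows(rows)
overlap = {r["parent_id"] for r in train} & {r["parent_id"] for r in test}
print(len(train), len(test), overlap)
```

Because the assignment depends only on the parent ID, the empty `overlap` set holds by construction, and rerunning the split gives the same result without storing a random seed.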

---


## File Size Limit Note

Due to submission file size constraints, this package includes only the code and the initial datasets (`Sheet_1.csv`, `Sheet_2.csv`).