# SemDiD: Semantic-guided Diverse Decoding

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.12](https://img.shields.io/badge/python-3.12-blue.svg)](https://www.python.org/downloads/)

Implementation of "Semantic-guided Diverse Decoding for Large Language Models".

## Overview

SemDiD operates directly in embedding space to balance quality with semantic diversity through three complementary mechanisms:
- Orthogonal directional guidance
- Dynamic inter-group repulsion
- Position-debiased probability assessment

Unlike existing diverse decoding methods that primarily achieve lexical rather than semantic diversity, SemDiD ensures meaningful semantic differentiation among multiple responses from large language models.
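To make the lexical/semantic distinction concrete, two common generic measures over $K$ responses $y_1, \dots, y_K$ are Distinct-$n$ (lexical) and the mean pairwise cosine distance between sentence embeddings $e_i$ of the responses (semantic). These are standard textbook definitions given for illustration; they are not necessarily the metrics used in the paper.

```math
\text{Distinct-}n = \frac{\left|\text{unique } n\text{-grams in } y_1, \dots, y_K\right|}{\left|\text{all } n\text{-grams in } y_1, \dots, y_K\right|},
\qquad
D_{\text{sem}} = \frac{2}{K(K-1)} \sum_{1 \le i < j \le K} \left(1 - \frac{e_i^\top e_j}{\lVert e_i \rVert \, \lVert e_j \rVert}\right)
```

Two paraphrases of the same answer can score high on Distinct-$n$ yet close to zero on $D_{\text{sem}}$, which is the gap between lexical and semantic diversity described above.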

## Environment Setup

```bash
# Create a new conda environment
conda create -n lmeval python=3.12
conda activate lmeval
cd my_lm_eval  # run from the repository root
pip install -e .
pip install torch torchvision torchaudio
pip install langdetect # for ifeval
pip install immutabledict # for ifeval
pip install "transformers==4.49.0" # this setup requires version 4.49.0
conda install -c nvidia nccl # resolves C-extension errors on `import torch`
pip install antlr4-python3-runtime==4.11 # for parsing LaTeX answers (minerva_math)
pip install scipy
pip install nltk
pip install packaging
pip install accelerate
pip install peft
pip install datasets
pip install vllm
pip install sentence-transformers
pip install math-verify
pip install flashinfer-python==0.2.2
pip install tokenizers
conda install -c nvidia cuda-compiler # for DeepSpeed (resolves MissingCUDAException)
```
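After installation, a quick import check can catch a missing or misnamed package before a long evaluation run. The helper below is a hypothetical sketch, not part of the repository; it assumes `python` points at the activated `lmeval` environment and uses the import names of the packages installed above (for example `sentence_transformers`, `math_verify`, and `antlr4`).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): report which Python modules
# can be imported in the active environment. Returns non-zero if any is missing.
check_modules() {
  local missing=0 mod
  for mod in "$@"; do
    if python -c "import ${mod}" >/dev/null 2>&1; then
      echo "ok: ${mod}"
    else
      echo "missing: ${mod}"
      missing=1
    fi
  done
  return "$missing"
}

# Run the check only when this file is executed directly, not when sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  check_modules torch transformers vllm accelerate peft datasets scipy nltk \
    sentence_transformers math_verify langdetect immutabledict antlr4 \
    || echo "Some modules are missing; re-run the matching install step." >&2
  # Print the installed Transformers version (4.49.0 is expected).
  python -c "import transformers; print('transformers', transformers.__version__)" 2>/dev/null || true
fi
```

Each `missing:` line points to the install step to repeat; a wrong Transformers version is the most likely cause of subtle generation differences.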

## Evaluation

We evaluate with the lm-eval framework (lm-evaluation-harness). The core decoding algorithms are adapted from the Hugging Face Transformers implementations and accelerated with vLLM. The implementation is in `my_lm_eval/lm_eval/models/semantic_search.py`.

The `--num_return_sequences` argument sets how many responses are sampled per prompt.

### Evaluation Script

```bash
# Run evaluation (greedy decoding, one response per prompt)
HF_ALLOW_CODE_EVAL=1 HF_DATASETS_TRUST_REMOTE_CODE=true confirm_run_unsafe_code=True \
python -m lm_eval \
    --model hf \
    --model_args pretrained=Qwen/Qwen2.5-0.5B-Instruct,trust_remote_code=True \
    --tasks arc_challenge,bbh,coqa,drop,ifeval,gsm8k,humaneval,minerva_math,mmlu_pro_plus,wmt16 \
    --limit 1000 \
    --batch_size 8 \
    --device cuda \
    --trust_remote_code \
    --log_samples \
    --apply_chat_template \
    --fewshot_as_multiturn \
    --sampling_method greedy \
    --output_path ./sampling_result_greedy_sample_1.json \
    --num_return_sequences 1
```
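To rerun this command with a different sampling method or number of responses per prompt, a small wrapper can rebuild it from two arguments. This is a hypothetical sketch, not a script from the repository: `run_eval` and the `DRY_RUN` switch are illustrative names, the output file name follows the pattern of the example above (an assumption), and valid `--sampling_method` values other than `greedy` should be taken from the provided shell scripts.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not in the repository): rebuilds the example
# evaluation command for a given sampling method and sample count.
run_eval() {
  local method="${1:-greedy}"   # value for --sampling_method
  local n="${2:-1}"             # value for --num_return_sequences
  local cmd=(
    python -m lm_eval
    --model hf
    --model_args pretrained=Qwen/Qwen2.5-0.5B-Instruct,trust_remote_code=True
    --tasks arc_challenge,bbh,coqa,drop,ifeval,gsm8k,humaneval,minerva_math,mmlu_pro_plus,wmt16
    --limit 1000
    --batch_size 8
    --device cuda
    --trust_remote_code
    --log_samples
    --apply_chat_template
    --fewshot_as_multiturn
    --sampling_method "$method"
    --output_path "./sampling_result_${method}_sample_${n}.json"
    --num_return_sequences "$n"
  )
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"   # print the command instead of running it
  else
    HF_ALLOW_CODE_EVAL=1 HF_DATASETS_TRUST_REMOTE_CODE=true confirm_run_unsafe_code=True "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 run_eval greedy 4` prints the command for four responses per prompt without running it; `run_eval` with no arguments reproduces the command above.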

### Different Sampling Strategies

#### Greedy Sampling
```bash
source greedy_vllm.sh
```

#### Temperature Sampling
```bash
source temperature_sample_vllm.sh
```

#### Diverse Beam Search
```bash
source diverse_beam_search.sh
```

#### Semantic Guided Search
```bash
source semantic_guided_search.sh
```
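The four scripts can also be run back to back with one log file per strategy. The loop below is a hypothetical sketch, not part of the repository; it assumes it is started from the repository root, where the scripts live, and that each script runs correctly when executed with `bash` instead of `source`. Running each script in its own process keeps variables set by one strategy from leaking into the next, while the child shells still inherit the activated conda environment.

```bash
#!/usr/bin/env bash
# Hypothetical batch runner (not in the repository): executes each strategy
# script in its own bash process and writes its output to <LOG_DIR>/<name>.log.
run_all_strategies() {
  local log_dir="${LOG_DIR:-logs}" script log
  local scripts=(greedy_vllm.sh temperature_sample_vllm.sh diverse_beam_search.sh semantic_guided_search.sh)
  # Only create the log directory for a real run.
  [[ "${DRY_RUN:-0}" == 1 ]] || mkdir -p "$log_dir"
  for script in "${scripts[@]}"; do
    log="${log_dir}/${script%.sh}.log"
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      echo "bash ${script} > ${log} 2>&1"   # show what would run
    else
      bash "$script" >"$log" 2>&1 \
        || echo "warning: ${script} failed, see ${log}" >&2
    fi
  done
}
```

`DRY_RUN=1 run_all_strategies` lists the four commands without running them; set `LOG_DIR` to change where logs are written.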

## Project Structure

The project contains the following main files:

- `diverse_beam_search.sh` - Evaluation script for Diverse Beam Search
- `greedy_vllm.sh` - Evaluation script for Greedy Search
- `semantic_guided_search.sh` - Evaluation script for our SemDiD method
- `temperature_sample_vllm.sh` - Evaluation script for Temperature Sampling
- `my_lm_eval/` - Evaluation framework source code
    - `lm_eval/models/semantic_search.py` - Core implementation of SemDiD algorithm

## Known Issues

vLLM does not support manually applied left-padding, so our implementation does not use left-padding.

## Acknowledgements

Thanks to all contributors who supported this project.