# AntiSlop-vLLM


AntiSlop-vLLM is a Python toolkit that improves the quality of text generated by Large Language Models (LLMs) by actively preventing "slop": common, overused, or undesirable phrases, patterns, and n-grams. It works with any OpenAI-compatible API server (such as vLLM or llama.cpp) that supports the `/v1/completions` endpoint and can return top-token logprobs.

This project is an evolution of the original antislop-sampler, adapted to work with OpenAI-compatible APIs rather than local Hugging Face `transformers` models.

## Overview

The core idea is to generate text incrementally, validate it against a set of configurable rules, and if a violation is detected, backtrack and attempt to generate a different, compliant continuation. This is achieved by:

1.  Requesting text generation from an OpenAI-compatible API in chunks or via a stream.
2.  Applying a series of validators (slop phrases, regex patterns, n-grams) to the generated text.
3.  If a validator flags an issue, the sampler uses cached logprobs (provided by the API) for the problematic token position to select an alternative token.
4.  The new sequence is re-validated. This process can repeat up to a configurable number of retries.
5.  If no valid alternative can be found, the specific violation is suppressed for that position to allow generation to proceed.
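
The steps above can be sketched in a few lines. This is a minimal, self-contained illustration under stated assumptions, not the project's `ApiAntiSlopSampler`: the candidate table stands in for API-provided logprobs, `generate` and `first_violation` are hypothetical names, selection is greedy, the backtrack target is assumed to be the token where the banned phrase begins, and the candidates at each position are fixed regardless of context (a real model's distribution changes after a token is swapped).

```python
import re
from bisect import bisect_right
from typing import Dict, List, Optional, Set, Tuple


def first_violation(tokens: List[str], banned: List[str],
                    suppressed: Set[int]) -> Optional[Tuple[int, str]]:
    """Return (token_index, phrase) for the earliest unsuppressed banned phrase."""
    text = "".join(tokens).lower()
    starts, offset = [], 0
    for tok in tokens:  # character offset at which each token begins
        starts.append(offset)
        offset += len(tok)
    best: Optional[Tuple[int, str]] = None
    for phrase in banned:
        for match in re.finditer(re.escape(phrase.lower()), text):
            # Index of the token that contains the first character of the match.
            tok_idx = bisect_right(starts, match.start()) - 1
            if tok_idx not in suppressed and (best is None or tok_idx < best[0]):
                best = (tok_idx, phrase)
    return best


def generate(candidates: List[Dict[str, float]], banned: List[str],
             max_retries: int = 20) -> str:
    """Greedy decoding over a fixed candidate table with backtracking."""
    tokens: List[str] = []
    blocked: Dict[int, Set[str]] = {}  # position -> tokens rejected there
    suppressed: Set[int] = set()       # positions where we gave up
    retries = 0
    while len(tokens) < len(candidates):
        pos = len(tokens)
        options = {t: lp for t, lp in candidates[pos].items()
                   if t not in blocked.get(pos, set())}
        if not options:
            # No alternative left: suppress violations starting here and move on.
            suppressed.add(pos)
            blocked.pop(pos, None)
            options = candidates[pos]
        tokens.append(max(options, key=options.get))
        hit = first_violation(tokens, banned, suppressed)
        if hit is None:
            continue
        start, _phrase = hit
        if retries >= max_retries:
            suppressed.add(start)
            continue
        retries += 1
        blocked.setdefault(start, set()).add(tokens[start])
        del tokens[start:]  # backtrack to the offending position
    return "".join(tokens)


# Hypothetical per-position candidates (token -> logprob).
candidates = [
    {"The": -0.1}, {" knight": -0.2}, {" felt": -0.3}, {" a": -0.1},
    {" testament": -0.2, " surge": -1.5}, {" of": -0.1}, {" hope": -0.4},
]
print(generate(candidates, ["testament"]))
```

With `["testament"]` banned, the sketch swaps in the next-best candidate at that position. Banning `"felt"`, which has no alternative in this table, exercises the suppression path and leaves the token in place.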

AntiSlop-vLLM can be used for single prompt completions or for generating entire datasets with reduced "slop." It also includes an automated pipeline (`auto_unslop.py`) for iteratively refining ban lists based on model outputs.

## Key Features

*   **OpenAI-Compatible:** Works with any API server that implements the `/v1/completions` endpoint and provides top logprobs (e.g., vLLM, TGI, Anyscale, Together.ai, Fireworks.ai, Deepinfra, Lepton, etc.).
*   **Multiple Validators:**
    *   **Slop Phrase Banning:** Ban specific phrases loaded from a JSON file.
    *   **Regex Pattern Banning:** Define complex patterns to avoid, written as regular expressions and loaded from a JSON file.
    *   **N-gram Banning:** Block specific n-grams (uni-, bi-, tri-grams, etc.), with options for stopword removal and language configuration.
*   **Backtracking Mechanism:** Intelligently resamples from API-provided logprobs to find valid alternatives when violations occur, without needing to re-query the API for each backtrack step.
*   **Single & Batch Modes:**
    *   Generate a single, "unslopped" completion for a given prompt.
    *   Process a batch of prompts (from a JSON file or Hugging Face dataset) to create a cleaned dataset.
*   **Iterative Anti-Slop Pipeline (`auto_unslop.py`):**
    *   Automates generating text and analyzing it for over-represented n-grams and phrases (leveraging techniques similar to those in slop_forensics).
    *   Updates ban lists iteratively to progressively improve output quality over multiple runs.
    *   Can generate DPO (Direct Preference Optimization) pairs from outputs of different iterations.
*   **FTPO Pair Generation:** Captures (chosen, rejected) token pairs during backtracking, useful for creating datasets for Final Token Preference Optimization.
*   **Chat Template Formatting:** Can apply Hugging Face chat templates to prompts before sending them to the completions endpoint.
*   **Refusal Detection:** Optionally detects and flags model refusals in batch mode.
*   **Configurable:** Extensive configuration via `config.yaml` and command-line arguments.
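
To make the three validator types concrete, here is a rough sketch of what each check might look like. The function names, the tiny inline stopword set, and the return shape `(character_offset, match)` are all assumptions made for illustration; the project's own validators live in `validators/` and may differ in every detail (for example, they use NLTK stopwords rather than a hard-coded set).

```python
import re
from typing import List, Optional, Set, Tuple

# Tiny illustrative stopword set (assumption; not the NLTK list).
STOPWORDS = {"a", "an", "the", "of", "to", "and", "in"}


def find_phrase(text: str, phrases: List[str]) -> Optional[Tuple[int, str]]:
    """Earliest case-insensitive occurrence of any banned phrase."""
    lowered = text.lower()
    hits = [(lowered.find(p.lower()), p) for p in phrases if p.lower() in lowered]
    return min(hits) if hits else None


def find_regex(text: str, patterns: List[str]) -> Optional[Tuple[int, str]]:
    """Earliest match of any banned regular expression."""
    hits = []
    for pattern in patterns:
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            hits.append((match.start(), pattern))
    return min(hits) if hits else None


def find_ngram(text: str, banned: Set[Tuple[str, ...]],
               remove_stopwords: bool = True) -> Optional[Tuple[int, Tuple[str, ...]]]:
    """Earliest banned n-gram over the (optionally stopword-filtered) word sequence."""
    words = [(m.group().lower(), m.start()) for m in re.finditer(r"[A-Za-z']+", text)]
    if remove_stopwords:
        words = [(w, s) for w, s in words if w not in STOPWORDS]
    sizes = sorted({len(gram) for gram in banned})
    for i in range(len(words)):
        for n in sizes:
            gram = tuple(w for w, _ in words[i:i + n])
            if gram in banned:
                return words[i][1], gram
    return None


text = "It was not a dream, but a tapestry of voices."
print(find_phrase(text, ["tapestry of"]))
print(find_regex(text, [r"\bnot\b.*?,\s*but\b"]))
print(find_ngram(text, {("tapestry", "voices")}))
```

Note how stopword removal changes what counts as adjacent: with stopwords dropped, "tapestry of voices" yields the bigram `("tapestry", "voices")`, which would not match if "of" were kept.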

## How it Works

1.  **API Interaction:** The system sends requests to the `/v1/completions` endpoint of your specified API. It's crucial that the API can return `logprobs` for the generated tokens.
2.  **Generation:** Text is generated either in discrete chunks (`request_mode: chunk`) or token-by-token via a stream (`request_mode: stream`).
3.  **Validation:** After each new piece of text is generated, it's passed through a series of active validators:
    *   `SlopPhraseValidator`: Checks against a list of exact phrases.
    *   `RegexValidator`: Matches text against a list of regex patterns.
    *   `NGramValidator`: Identifies banned n-grams in the text.
4.  **Backtracking:** If a validator detects a violation at a certain token position:
    *   The `ApiAntiSlopSampler` uses the `logprobs` (alternative token probabilities) returned by the API for that specific position.
    *   It attempts to select a different token that doesn't lead to a violation.
    *   Sampling parameters (temperature, min_p, top_p, top_k) are applied during this local resampling.
    *   The process can be forced to try harder by progressively relaxing sampling constraints (`force_backtrack: true`).
5.  **Suppression:** If, after a set number of retries, no valid alternative token can be found for a specific violation at a specific position, that particular violation instance is "ignored" or "suppressed" for that position. This allows generation to continue rather than getting stuck.
6.  **Output:** Compliant text segments are yielded. In batch mode, full generations are saved to a JSONL file.
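
The local resampling in step 4 can be illustrated as follows. This is a sketch under assumptions, not the sampler's actual code: `resample` is a hypothetical helper, the order in which the filters are applied is a guess, and probabilities are renormalized over only the cached top-N alternatives, which approximates the model's full distribution.

```python
import math
import random
from typing import Dict, Optional, Set


def resample(top_logprobs: Dict[str, float], rejected: Set[str],
             temperature: float = 1.0, top_k: Optional[int] = None,
             top_p: Optional[float] = None, min_p: Optional[float] = None,
             rng: Optional[random.Random] = None) -> Optional[str]:
    """Pick a replacement token from cached alternatives, skipping rejected ones."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    rng = rng or random.Random()
    items = [(tok, lp) for tok, lp in top_logprobs.items() if tok not in rejected]
    if not items:
        return None  # nothing left to try at this position

    # Temperature-scaled softmax over the surviving alternatives.
    scaled = [lp / temperature for _, lp in items]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    ranked = sorted(((tok, w / total) for (tok, _), w in zip(items, weights)),
                    key=lambda pair: pair[1], reverse=True)

    if top_k:
        ranked = ranked[:top_k]
    if min_p:
        # Keep tokens at least min_p times as likely as the best one.
        floor = min_p * ranked[0][1]
        ranked = [pair for pair in ranked if pair[1] >= floor]
    if top_p:
        kept, cumulative = [], 0.0
        for tok, prob in ranked:
            kept.append((tok, prob))
            cumulative += prob
            if cumulative >= top_p:
                break
        ranked = kept

    tokens, probs = zip(*ranked)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical cached alternatives for one position (token -> logprob).
cached = {" testament": -0.2, " surge": -1.5, " sense": -2.0}
print(resample(cached, rejected={" testament"}, top_k=1))
```

One plausible reading of "progressively relaxing sampling constraints" is to call such a helper again with the filters loosened or set to `None` when the filtered pool comes back empty; whether `force_backtrack` works exactly this way is not confirmed here.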

## Requirements

*   Python 3.8+
*   An OpenAI-compatible API endpoint (e.g., a running vLLM server).
*   Key Python packages (see `requirements.txt` for specific versions):
    *   `openai>=1.0.0`
    *   `pyyaml>=6.0`
    *   `tiktoken>=0.4.0`
    *   `nltk>=3.6.0`
    *   `pandas`
    *   `wordfreq`
    *   `datasets`
    *   `tqdm`
    *   `transformers>=4.40.0` (for chat templates, refusal detector, and `auto_unslop.py`)
    *   `regex`

## Installation & Setup

1.  **Clone the repository:**
    ```bash
    git clone [redacted]
    cd antislop-vllm
    ```

2.  **Install dependencies:**
    ```bash
    pip install -r requirements.txt
    ```

3.  **NLTK Data:** The scripts will attempt to download necessary NLTK resources (`punkt` for tokenization, `stopwords` for n-gram validation) if they are not found. You can also download them manually:
    ```python
    import nltk
    nltk.download('punkt')
    nltk.download('stopwords')
    ```

4.  **Configure `config.yaml`:**
    *   Copy `config-example.yaml` to `config.yaml`.
    *   Edit `config.yaml` to set your API endpoint (`api_base_url`), API key (if required), `model_name`, paths to ban lists, and default generation parameters.
    *   Example ban lists are provided in the `banlists/` directory. You should curate your own for best results.

## Usage

The primary script for generation is `main.py`. It can be run in single prompt mode or batch dataset generation mode. Configuration is primarily handled by `config.yaml`, but most settings can be overridden by command-line arguments.

### Configuration

Modify `config.yaml` for persistent settings. Key options include:

*   `api_base_url`, `api_key`, `model_name`
*   `slop_phrases_file`, `top_n_slop_phrases`
*   `regex_blocklist_file`
*   `ngram_validator` (including `banned_file`, `remove_stopwords`, `language`)
*   `generation_params` (like `max_new_tokens`, `temperature`, `top_p`, `top_k`, `min_p`, `chunk_size`, `request_mode`)
*   `prompt_template`, `system_prompt` (for dataset generation)
*   `logging_level`
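
As a rough illustration of the "config file first, command line overrides" precedence, the sketch below merges parsed arguments over a config dictionary. It is an assumption about how such merging could be done, not a description of `main.py`: the helper names are hypothetical, only a handful of flags are shown, and the dictionary stands in for what `yaml.safe_load` would return from `config.yaml`.

```python
import argparse
from copy import deepcopy
from typing import Any, Dict


def build_parser() -> argparse.ArgumentParser:
    """A small subset of flags; defaults are None so 'not given' is detectable."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--api-base-url")
    parser.add_argument("--max-new-tokens", type=int)
    parser.add_argument("--chunk-size", type=int)
    parser.add_argument("--request-mode", choices=["chunk", "stream"])
    return parser


def merge(config: Dict[str, Any], args: argparse.Namespace) -> Dict[str, Any]:
    """Return a copy of config with any explicitly passed CLI values applied."""
    merged = deepcopy(config)
    if args.api_base_url is not None:
        merged["api_base_url"] = args.api_base_url
    generation = merged.setdefault("generation_params", {})
    for key in ("max_new_tokens", "chunk_size", "request_mode"):
        value = getattr(args, key)
        if value is not None:
            generation[key] = value
    return merged


# Stand-in for the parsed contents of config.yaml.
config = {
    "api_base_url": "http://localhost:8000/v1",
    "generation_params": {"max_new_tokens": 500, "chunk_size": 20, "request_mode": "chunk"},
}
args = build_parser().parse_args(["--max-new-tokens", "1000", "--request-mode", "stream"])
print(merge(config, args))
```

Using `None` as the default for every flag is what lets the merge distinguish "not passed" from "passed with a value", so unspecified options keep their `config.yaml` settings.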

### Example 1: Single Prompt Mode

Generates a single "unslopped" output for a given prompt and prints it to the console.

```bash
python main.py \
    --api-base-url "http://localhost:8000/v1" \
    --api-key "YOUR_API_KEY_OR_XXX" \
    --model-name "Qwen/Qwen3-4B" \
    --chat-template-model-id "Qwen/Qwen3-4B" \
    --logging-level "INFO" \
    --slop-phrases-file "banlists/slop_phrases.json" \
    --top-n-slop-phrases 500 \
    --regex-blocklist-file "banlists/regex_not_x_but_y.json" \
    --ngram-banned-file "banlists/banned_ngrams.json" \
    --max-new-tokens 500 \
    --prompt "Write a short story about a brave knight and a mischievous dragon."
```
*(Adjust `--api-base-url`, `--api-key`, `--model-name`, and `--chat-template-model-id` as needed.)*

### Example 2: Batch Dataset Generation Mode

Generates multiple completions from prompts sourced from a Hugging Face dataset or a JSON file, saving the results to a JSONL file.

**1. Start your vLLM (or other OpenAI-compatible) server:**
   (This is just an example; adapt to your model and hardware.)
```bash
vllm serve unsloth/gemma-3-4b-it \
    --port 8000 \
    --max-model-len 2500 \
    --served-model-name "unsloth/gemma-3-4b-it" \
    --gpu-memory-utilization 0.95 \
    --dtype bfloat16 \
    --api-key "YOUR_API_KEY_IF_SERVER_NEEDS_ONE"
```

**2. Run the `main.py` script in batch mode:**
```bash
python main.py \
    --config config.yaml \
    --output-jsonl "results/creative_writing_generations.jsonl" \
    --input-hf-dataset "Nitral-AI/Reddit-SFW-Writing_Prompts_ShareGPT" \
    --hf-dataset-split "train" \
    --threads 40 \
    --max-prompts 100 \
    --logging-level "INFO" \
    --max-new-tokens 1000 \
    --request-mode "chunk" \
    --chunk-size 20 \
    --ftpo-pairs-jsonl "results/ftpo_pairs.jsonl"
```
*   This example uses settings from `config.yaml` but overrides some for batch processing.
*   `--input-hf-dataset`: Specifies the source of prompts.
*   `--output-jsonl`: Where the generated dataset will be saved.
*   `--threads`: Number of parallel generation workers.
*   `--max-prompts`: Limits the number of new prompts processed per run.
*   `--ftpo-pairs-jsonl`: If specified, saves (chosen, rejected) token pairs for Final Token Preference Optimization (FTPO) training.

### Iterative Anti-Slop (`auto_unslop.py`)

The `auto_unslop.py` script provides a pipeline for iteratively refining ban lists and improving generation quality. In each iteration, it:
1.  Runs `main.py` to generate a dataset using the current ban lists.
2.  Analyzes the generated text to identify over-represented n-grams and phrases (compared to a human writing profile).
3.  Updates the `banned_ngrams.json` and `banned_slop_phrases.json` files with new entries.
4.  Collects statistics on lexical diversity and repetition.

This process helps to automatically discover and mitigate model-specific "slop" over several iterations.
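
The analysis in step 2 can be approximated with a simple frequency ratio. The sketch below is an illustrative assumption, not the pipeline's actual method: `over_represented` is a hypothetical helper, the "human writing profile" is represented by a few raw baseline strings rather than a precomputed profile, and the add-one smoothing is a simplification.

```python
import re
from collections import Counter
from typing import Iterable, List


def ngram_counts(texts: Iterable[str], n: int) -> Counter:
    """Count word n-grams across a collection of texts."""
    counts: Counter = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(zip(*(words[i:] for i in range(n))))
    return counts


def over_represented(generated: List[str], baseline: List[str], n: int = 2,
                     min_count: int = 2, top: int = 10,
                     smoothing: float = 1.0) -> List[str]:
    """Rank n-grams by how much more often they occur in generated text."""
    gen = ngram_counts(generated, n)
    ref = ngram_counts(baseline, n)
    gen_total = sum(gen.values()) or 1
    ref_total = sum(ref.values()) or 1
    scored = []
    for gram, count in gen.items():
        if count < min_count:
            continue  # ignore n-grams too rare to judge
        gen_rate = count / gen_total
        ref_rate = (ref[gram] + smoothing) / (ref_total + smoothing)
        scored.append((gen_rate / ref_rate, " ".join(gram)))
    scored.sort(reverse=True)
    return [gram for _, gram in scored[:top]]


generated = ["a tapestry of light", "the tapestry of sound", "a tapestry of echoes"]
baseline = ["the cat sat on the mat", "a bowl of soup"]
print(over_represented(generated, baseline))
```

N-grams that score highest under such a ratio are natural candidates for the next iteration's ban list.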

**To run the iterative pipeline:**
```bash
python auto_unslop.py
```
Make sure to configure parameters at the top of `auto_unslop.py` (like `NUM_ITERATIONS`, `HF_DATASET_NAME`, `THREADS`, paths to ban lists, etc.) and ensure your API server is running. The script will create a timestamped experiment directory under `results/` to store outputs from each iteration.

### Example Notebook

For more detailed examples and an interactive way to explore the functionality, refer to the `example_run_antislop_vllm.ipynb` notebook in the repository.

## Key Differences from `antislop-sampler`

While `antislop-vllm` is based on the concepts of the original `antislop-sampler`, there are key differences:

*   **Target Environment:** `antislop-vllm` is designed for OpenAI-compatible *remote APIs* (like vLLM), whereas `antislop-sampler` worked directly with Hugging Face `transformers` models loaded locally.
*   **Tokenization & Logprobs:** `antislop-vllm` relies entirely on the remote API to provide tokenization (via returned token strings in logprobs) and the logprobs themselves. The original sampler handled tokenization and logit calculation locally.
*   **API Endpoint:** `antislop-vllm` uses the `/v1/completions` endpoint, not `/v1/chat/completions`, as the former typically provides the necessary per-token logprob information required for backtracking.
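
For reference, the per-token information that backtracking depends on looks roughly like this in a `/v1/completions` response requested with an integer `logprobs` value. The sketch parses a hand-written sample dictionary rather than calling a server; `alternatives_by_position` is a hypothetical helper, and a real server may include additional fields or differ in detail.

```python
from typing import Any, Dict, List, Tuple


def alternatives_by_position(response: Dict[str, Any]) -> List[Tuple[str, List[Tuple[str, float]]]]:
    """Pair each generated token with its alternatives, most likely first."""
    logprobs = response["choices"][0]["logprobs"]
    result = []
    for token, top in zip(logprobs["tokens"], logprobs["top_logprobs"]):
        # `top` maps candidate token strings to logprobs; it may be None.
        ranked = sorted((top or {}).items(), key=lambda item: item[1], reverse=True)
        result.append((token, ranked))
    return result


# Hand-written sample in the completions logprobs layout (illustrative values).
# With the `openai` package, a comparable dict could be obtained from a
# completions response via `.model_dump()`; that step is not shown here.
sample = {
    "choices": [{
        "text": " a testament",
        "logprobs": {
            "tokens": [" a", " testament"],
            "token_logprobs": [-0.1, -0.2],
            "top_logprobs": [
                {" a": -0.1, " the": -2.5},
                {" surge": -1.5, " testament": -0.2},
            ],
        },
    }]
}

for token, ranked in alternatives_by_position(sample):
    print(repr(token), ranked)
```

Because the token strings arrive as dictionary keys, the client never needs its own tokenizer to enumerate alternatives for a position.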

## Project Structure

*   `main.py`: Main script for single/batch generation.
*   `auto_unslop.py`: Iterative anti-slop and ban list refinement pipeline.
*   `config.yaml` (and `config-example.yaml`): Configuration files.
*   `api_client/`: Contains `ApiClient` for interacting with OpenAI-compatible APIs.
*   `core/`:
    *   `sampler.py`: `ApiAntiSlopSampler` - the core logic for generation, validation, and backtracking.
    *   `models.py`: Dataclasses for API results and violation info.
*   `state/`:
    *   `generation_state.py`: Manages the state of the text being generated.
*   `validators/`: Implements different validation strategies (`SlopPhraseValidator`, `RegexValidator`, `NGramValidator`).
*   `utils/`: Helper functions for configuration, string manipulation, chat templates, refusal detection, etc.
*   `banlists/`: Directory for example ban list files (JSON format). **Users should curate their own lists.**
*   `data/`: Can store auxiliary data like human writing profiles for `auto_unslop.py`.
*   `results/`: Default output directory for batch generations and `auto_unslop.py` experiments.
*   `example_run_antislop_vllm.ipynb`: Jupyter notebook with usage examples.

## Disclaimers

*   This is research-grade code: it is a work in progress and may contain bugs.
*   The effectiveness of "unslopping" heavily depends on the quality of your ban lists. The provided examples are starting points; **it's highly recommended to curate your own lists** tailored to your specific model and use case. The `auto_unslop.py` script can help in this process.
*   Ensure your API endpoint correctly implements the `/v1/completions` specification and reliably returns top logprobs for backtracking to function effectively.

## Contributing

Issues and Pull Requests are welcome!

## Citation

If you use AntiSlop-vLLM or the concepts from the original `antislop-sampler` in your research, please consider citing:

```bibtex
[redacted]
```