"""
# LLM In-Context Learning Benchmark with vLLM

This project provides a framework for benchmarking the in-context learning capabilities of Large Language Models (LLMs) on various algorithmic tasks. It leverages the `vLLM` library for high-throughput, batched inference to efficiently evaluate models against a suite of generated datasets.

The script automatically generates prompts containing a problem description, a set of training examples (input -> output), and a test input. It then calls the LLM to predict the label for the test input and saves the model's prediction along with the ground truth for later analysis.
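
The prompt layout and the comparison against ground truth described above can be sketched as follows. This is a minimal illustration, not the script's actual template. `build_prompt` and `exact_match_accuracy` are hypothetical helpers, and both the `input -> output` line format and the exact-match scoring rule are assumptions.

```python
from typing import Sequence, Tuple


def build_prompt(
    description: str,
    train_examples: Sequence[Tuple[str, str]],
    test_input: str,
) -> str:
    # Hypothetical template: problem description, a blank line,
    # one "input -> output" line per training example, then the open query.
    lines = [description, ""]
    for example_input, example_output in train_examples:
        lines.append(f"{example_input} -> {example_output}")
    # The test input has no label, so the model is asked to complete it.
    lines.append(f"{test_input} ->")
    return "\n".join(lines)


def exact_match_accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    # Assumed scoring rule: correct if prediction equals label after stripping whitespace.
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(p.strip() == y.strip() for p, y in zip(predictions, labels))
    return correct / len(labels)


prompt = build_prompt(
    "Each input is a binary string. Predict its label.",
    [("0110", "1"), ("1010", "0")],
    "1110",
)
print(prompt)
print(exact_match_accuracy(["1 ", "0"], ["1", "1"]))  # 0.5
```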

## Dependencies

A complete list of dependencies with exact versions is provided in the `environment.yml` file.

## Installation

1.  **Install vLLM and other dependencies:**
    The vLLM installation is hardware-specific. Please refer to the [official vLLM installation guide](https://docs.vllm.ai/en/latest/getting_started/installation.html) for instructions tailored to your CUDA and PyTorch versions.

    A general installation command is:
    ```bash
    pip install vllm "torch>=2.1.0"
    ```

2.  **Required Files**:
    `vllm_incontext.py` expects a `data_handler.py` file that defines the data generator classes (`BinaryDataGenerator`, `PrimeDataGenerator`, etc.). Place this file in the same directory as `vllm_incontext.py`.
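
Before the first run, a quick preflight check can confirm both installation steps. The sketch below is an illustration, not part of the project. The helper names are hypothetical, it only checks the two generator classes named above (the full list is elided as "etc."), and it uses a rough version parser instead of `packaging.version`.

```python
import importlib
import importlib.metadata
import itertools
from typing import List, Optional, Sequence, Tuple


def installed_version(package: str) -> Optional[str]:
    # Installed distribution version, or None if the package is missing.
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


def version_tuple(version: str) -> Tuple[int, ...]:
    # Rough parse of the numeric release segment, padded to three parts:
    # "2.3.1+cu121" -> (2, 3, 1), "2.1" -> (2, 1, 0).
    parts: List[int] = []
    for piece in version.split("+")[0].split("."):
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts + [0] * (3 - len(parts)))


def missing_generators(
    module_name: str = "data_handler",
    class_names: Sequence[str] = ("BinaryDataGenerator", "PrimeDataGenerator"),
) -> List[str]:
    # Expected generator classes that cannot be found in the module.
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return list(class_names)
    return [name for name in class_names if not hasattr(module, name)]


def preflight() -> List[str]:
    # Collect human-readable problems; an empty list means every check passed.
    problems: List[str] = []
    if installed_version("vllm") is None:
        problems.append("vllm is not installed")
    torch_version = installed_version("torch")
    if torch_version is None or version_tuple(torch_version) < (2, 1, 0):
        problems.append(f"torch>=2.1.0 required, found {torch_version}")
    for name in missing_generators():
        problems.append(f"data_handler.{name} not found")
    return problems


if __name__ == "__main__":
    for problem in preflight() or ["all checks passed"]:
        print(problem)
```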

## Usage

Run the script from the command line.

### Replicating the paper
We used three models: Qwen3-30B-A3B-Instruct-2507, Qwen3-Coder-30B-A3B-Instruct, and Deepseek-Coder-33B-Instruct. All other settings use the defaults defined in the script. Run the following command once per model:

```bash
python vllm_incontext.py --model <huggingface-model-id> --output-file <path-to-save-results.json>
```
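
To run all three models in one go, a small driver can call the script once per model. This is a minimal sketch, assuming the script is invoked as `vllm_incontext.py` from the current directory. The Hugging Face IDs below are assumed mappings of the model names above and should be checked on the Hub, and `output_path`, `build_commands`, and `run_all` are hypothetical helpers with an illustrative file-naming scheme.

```python
import subprocess
import sys
from typing import List, Sequence

# Assumed Hugging Face IDs for the three models named above; verify before running.
PAPER_MODELS = [
    "Qwen/Qwen3-30B-A3B-Instruct-2507",
    "Qwen/Qwen3-Coder-30B-A3B-Instruct",
    "deepseek-ai/deepseek-coder-33b-instruct",
]


def output_path(model_id: str) -> str:
    # Hypothetical naming scheme: one results file per model, named after its repo.
    return f"results_{model_id.split('/')[-1]}.json"


def build_commands(models: Sequence[str]) -> List[List[str]]:
    # Only --model and --output-file are set; every other option keeps its default.
    return [
        [sys.executable, "vllm_incontext.py", "--model", m, "--output-file", output_path(m)]
        for m in models
    ]


def run_all(models: Sequence[str] = PAPER_MODELS) -> None:
    # Runs sequentially; each call blocks until that model's run finishes,
    # and check=True stops the loop if a run exits with an error.
    for cmd in build_commands(models):
        subprocess.run(cmd, check=True)
```

Calling `run_all()` from the project directory produces one JSON file per model, for example `results_deepseek-coder-33b-instruct.json`.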

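### Custom runs

To experiment with other settings, override the defaults on the command line. The example below sets the batch size, shards the model across two GPUs with tensor parallelism, and caps the context length at 8192 tokens:
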
```bash
python vllm_incontext.py \
    --model "Qwen/Your_model" \
    --batch-size 16 \
    --tensor-parallel-size 2 \
    --output-file "qwen_32b_results.json" \
    --max-model-len 8192
```
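
The flags in this example plausibly map onto vLLM's offline `LLM` class as sketched below. This is an assumption about the script's internals, not its actual code: `chunked`, `load_model`, and `generate_all` are hypothetical names, and `--batch-size` is assumed to set how many prompts go into each `LLM.generate` call. The `LLM(model=..., tensor_parallel_size=..., max_model_len=...)` keywords and the `outputs[0].text` access on each returned `RequestOutput` are part of vLLM's public API.

```python
from typing import Any, Iterator, List, Sequence


def chunked(items: Sequence[Any], batch_size: int) -> Iterator[Sequence[Any]]:
    # Split the prompt list into consecutive batches of at most batch_size items.
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def load_model(model: str, tensor_parallel_size: int, max_model_len: int):
    # Imported lazily so the other helpers work without vLLM or a GPU.
    from vllm import LLM

    # tensor_parallel_size shards the weights across that many GPUs;
    # max_model_len caps prompt plus generated tokens per request.
    return LLM(
        model=model,
        tensor_parallel_size=tensor_parallel_size,
        max_model_len=max_model_len,
    )


def generate_all(llm: Any, prompts: Sequence[str], batch_size: int, sampling_params: Any = None) -> List[str]:
    # llm is anything exposing vLLM's LLM.generate(prompts, sampling_params) interface;
    # sampling_params=None makes vLLM fall back to its default sampling parameters.
    predictions: List[str] = []
    for batch in chunked(prompts, batch_size):
        outputs = llm.generate(list(batch), sampling_params)
        # Each RequestOutput holds one or more completions; keep the first one's text.
        predictions.extend(out.outputs[0].text for out in outputs)
    return predictions
```

With the values above, `load_model("Qwen/Your_model", tensor_parallel_size=2, max_model_len=8192)` needs two visible GPUs. vLLM already batches requests internally within a single `generate` call, so smaller chunks usually do not speed anything up; they mainly give the script more frequent points at which to log progress.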