# DRIVE: Distributional and Retrieval-Augmented Bidding

This repository contains the implementation of the **DRIVE Framework** (Distributional and Retrieval-Augmented Bidding with Value Evaluation). The framework combines offline reinforcement learning (Decision Transformer) with retrieval-augmented generation and value-based critics to optimize bidding strategies in advertising auctions.

## Directory Structure

The project is organized as follows:

```text
code/
├── bidding_train_env/     # Environment and baseline definitions
├── common/                # Shared utilities and dataloaders
├── data/                  # Data storage
│   ├── traffic/           # Traffic data (e.g., period-x.csv)
│   └── trajectory/        # Trajectory data
├── encodings_auction/     # Generated indices and embeddings (FAISS)
│   ├── encodings.index
│   ├── next_actions.npy
│   ├── trajectories.pkl
│   └── ...
├── results/               # Training logs and model checkpoints
├── saved_model/           # Saved model weights
└── run/                   # Execution scripts
    ├── train_dt_dist.py         # Phase 1: Policy Training (DT)
    ├── run_embedding.py         # Embedding/Encoder Training
    ├── train_iql_critic.py      # Phase 2: Critic Training (IQL)
    ├── train_cql_critic.py      # Phase 2: Critic Training (CQL)
    ├── construct_index.py       # Phase 3: Index Construction
    └── run_evaluate_dt_dist.py  # Evaluation & Testing
```

## Prerequisites

Ensure you have the following dependencies installed (including FAISS for retrieval):

```bash
pip install torch numpy pandas faiss-gpu  # use faiss-cpu instead on machines without a GPU
```


## Data Preparation

Before running the training scripts, download the dataset archives from the official links below, **unzip them**, and place the extracted files in the corresponding directories under `code/data/`.

### 1. Download Links

Please download the data according to the track you wish to train on (Dense or Sparse).

#### **Option A: Dense Dataset (Standard Track)**

**Traffic Data (Evaluation):**

* [Period 7-8](https://alimama-bidding-competition.oss-cn-beijing.aliyuncs.com/share/autoBidding_aigb_track_data_period_7-8.zip) | [Period 9-10](https://alimama-bidding-competition.oss-cn-beijing.aliyuncs.com/share/autoBidding_aigb_track_data_period_9-10.zip) | [Period 11-12](https://alimama-bidding-competition.oss-cn-beijing.aliyuncs.com/share/autoBidding_aigb_track_data_period_11-12.zip) | [Period 13](https://alimama-bidding-competition.oss-cn-beijing.aliyuncs.com/share/autoBidding_aigb_track_data_period_13.zip)

**Trajectory Data (Offline Training):**

* [Trajectory Data (Base)](https://alimama-bidding-competition.oss-cn-beijing.aliyuncs.com/share/autoBidding_aigb_track_data_trajectory_data.zip)
* [Extended Part 1](https://alimama-bidding-competition.oss-cn-beijing.aliyuncs.com/share/autoBidding_aigb_track_data_trajectory_data_extended_1.zip)
* [Extended Part 2](https://alimama-bidding-competition.oss-cn-beijing.aliyuncs.com/share/autoBidding_aigb_track_data_trajectory_data_extended_2.zip)

#### **Option B: Sparse Dataset (Final/Sparse Track)**

**Traffic Data (Evaluation):**

* [Final Period 7-8](https://alimama-bidding-competition.oss-cn-beijing.aliyuncs.com/share/final/autoBidding_aigb_track_final_data_period_7-8.zip) | [Final Period 9-10](https://alimama-bidding-competition.oss-cn-beijing.aliyuncs.com/share/final/autoBidding_aigb_track_final_data_period_9-10.zip) | [Final Period 11-12](https://alimama-bidding-competition.oss-cn-beijing.aliyuncs.com/share/final/autoBidding_aigb_track_final_data_period_11-12.zip) | [Final Period 13](https://alimama-bidding-competition.oss-cn-beijing.aliyuncs.com/share/final/autoBidding_aigb_track_final_data_period_13.zip)

**Trajectory Data (Offline Training):**

* [Final Trajectory Part 1](https://alimama-bidding-competition.oss-cn-beijing.aliyuncs.com/share/final/autoBidding_aigb_track_final_data_trajectory_data_1.zip)
* [Final Trajectory Part 2](https://alimama-bidding-competition.oss-cn-beijing.aliyuncs.com/share/final/autoBidding_aigb_track_final_data_trajectory_data_2.zip)
* [Final Trajectory Part 3](https://alimama-bidding-competition.oss-cn-beijing.aliyuncs.com/share/final/autoBidding_aigb_track_final_data_trajectory_data_3.zip)

### 2. File Organization

After downloading and unzipping, ensure your directory structure matches the following:

```text
code/
└── data/
    ├── traffic/        # Place unzipped "period-x.csv" files here
    │   ├── period-7.csv
    │   ├── period-8.csv
    │   └── ...
    └── trajectory/     # Place unzipped "trajectory_data.csv" files here
        ├── trajectory_data.csv
        └── ...
```

> **Note:** If the trajectory data comes in multiple parts (e.g., `extended_1`, `extended_2`), merge them or place them where the dataloader expects them (e.g., as a single `training_data_all.csv`).
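
If the parts need to be merged into one file, a minimal sketch along the following lines may help. It assumes every part is a CSV with an identical header row; the function name `merge_csv_parts` and the file names in the example call are hypothetical, and the file the dataloader actually expects should be checked in `common/`. Only the standard library is used, and rows are streamed so large parts do not have to fit in memory.

```python
import csv


def merge_csv_parts(part_paths, out_path):
    """Concatenate CSV files that share a header into one file.

    The header is written once, taken from the first non-empty part.
    A later part with a different header raises ValueError.
    Returns the number of data rows written.
    """
    header = None
    rows_written = 0
    with open(out_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in part_paths:
            with open(path, newline="") as in_file:
                reader = csv.reader(in_file)
                part_header = next(reader, None)
                if part_header is None:
                    continue  # skip empty files
                if header is None:
                    header = part_header
                    writer.writerow(header)
                elif part_header != header:
                    raise ValueError(f"header mismatch in {path}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


# Example call (hypothetical part names, run from the code/ directory):
# merge_csv_parts(
#     ["data/trajectory/part_1.csv", "data/trajectory/part_2.csv"],
#     "data/trajectory/training_data_all.csv",
# )
```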
---

## Usage Pipeline

The training and evaluation process follows the **DRIVE Framework (Algorithm 1)**. Please execute the scripts in the following order from the `code/` directory.

### Step 1: Policy Training (Phase 1)

Train the distributional Decision Transformer (DT) policy by optimizing the GMM (Gaussian mixture model) loss.

```bash
python run/train_dt_dist.py
```
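
To make the GMM loss concrete, here is a minimal sketch of the negative log-likelihood of a single scalar action under a one-dimensional Gaussian mixture. It is an illustration under stated assumptions, not the code in `train_dt_dist.py`: the function name `gmm_nll` and the scalar-action setting are assumptions, and a training script would typically compute the same quantity batched over tensors.

```python
import math


def gmm_nll(action, weights, means, stds):
    """Negative log-likelihood of a scalar action under a 1-D Gaussian mixture.

    weights: positive mixture weights that sum to 1
    means, stds: per-component mean and standard deviation (stds > 0)
    """
    log_terms = []
    for w, mu, sigma in zip(weights, means, stds):
        z = (action - mu) / sigma
        log_pdf = -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)
        log_terms.append(math.log(w) + log_pdf)
    # log-sum-exp keeps the sum stable when component densities are tiny
    m = max(log_terms)
    log_likelihood = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return -log_likelihood
```

Minimizing this value over the dataset pushes the predicted mixture to place probability mass near the logged actions; an action close to a high-weight component mean yields a lower loss than one far from every component.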

### Step 2: Encoder Training & Embedding

Train the encoder or generate the necessary embeddings for the retrieval process.

```bash
python run/run_embedding.py
```

### Step 3: Critic Training (Phase 2)

Train the value functions (critics). You can train the IQL (Implicit Q-Learning) critic, the CQL (Conservative Q-Learning) critic, or both, depending on your configuration.

**Train IQL Critic:**

```bash
python run/train_iql_critic.py
```

**Train CQL Critic:**

```bash
python run/train_cql_critic.py
```
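
For readers unfamiliar with IQL, its state-value function is usually fitted with an expectile regression loss. The sketch below shows that loss in plain Python as general background; whether `train_iql_critic.py` uses exactly this form is an assumption, and the name `expectile_loss` and the default `tau=0.7` are illustrative only.

```python
def expectile_loss(q_values, v_values, tau=0.7):
    """Mean asymmetric squared error between Q targets and V predictions.

    With tau > 0.5, cases where V underestimates Q (diff > 0) are weighted
    by tau and overestimates by (1 - tau), which pushes V toward an upper
    expectile of Q.
    """
    total = 0.0
    for q, v in zip(q_values, v_values):
        diff = q - v
        weight = tau if diff > 0 else 1.0 - tau
        total += weight * diff * diff
    return total / len(q_values)
```

With `tau=0.5` the loss reduces to half the ordinary mean squared error; larger values of `tau` make the value estimate track the better actions seen in the data.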

### Step 4: Index Construction (Phase 3)

Build the offline retrieval index. This script computes embeddings for trajectories and stores them in the FAISS index and `.npy` files under `code/encodings_auction/`.

```bash
python run/construct_index.py
```

*Output:* This will populate the `encodings_auction/` folder with `encodings.index`, `next_actions.npy`, `retrieve_rtgs.npy`, etc.
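
Conceptually, the index pairs each stored embedding with the data to return when that embedding is matched. The sketch below is a dependency-free stand-in for an exact (flat) L2 index, meant only to show that pairing; the class `BruteForceIndex` and the `(next_action, return_to_go)` payload layout are assumptions inferred from the output file names, not the format `construct_index.py` writes.

```python
import heapq
import math


class BruteForceIndex:
    """Exact nearest-neighbor store; a small stand-in for a flat L2 index."""

    def __init__(self):
        self.embeddings = []  # one vector per stored trajectory step
        self.payloads = []    # e.g. (next_action, return_to_go) per vector

    def add(self, embedding, payload):
        self.embeddings.append(list(embedding))
        self.payloads.append(payload)

    def search(self, query, k):
        """Return up to k (distance, payload) pairs, closest first."""
        scored = (
            (math.dist(query, emb), i) for i, emb in enumerate(self.embeddings)
        )
        nearest = heapq.nsmallest(k, scored)
        return [(dist, self.payloads[i]) for dist, i in nearest]
```

A linear scan like this costs one distance computation per stored vector on every query, which is why a dedicated library such as FAISS is used once the trajectory set grows large.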

### Step 5: Evaluation

Finally, evaluate the model using the retrieval-augmented strategy. This script performs the generation, retrieval, and execution phases.

```bash
python run/run_evaluate_dt_dist.py
```

---

## Algorithm Overview

The code implements the following logic:

1. **Offline Training:** Learn a distributional policy and value critics from the offline dataset.
2. **Index Construction:** Create a retrieval database that maps trajectory embeddings to actions and returns.
3. **Online/Test Inference:**
   * **Encoding:** Encode the current state.
   * **Generation:** Sample candidate actions from the GMM policy.
   * **Retrieval:** Query neighbors from the retrieval database to find high-return historical actions.
   * **Execution:** Use the critic to select the best action from the combined pool of generated and retrieved candidates.
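
The execution step can be sketched as a critic-guided argmax over the pooled candidates. This is a minimal illustration, assuming the critic is exposed as a callable `critic(state, action)` that returns a scalar estimate; the function name `select_action` and that interface are hypothetical and may not match `run_evaluate_dt_dist.py`.

```python
def select_action(state, generated, retrieved, critic):
    """Return the candidate action with the highest critic score.

    generated: actions sampled from the policy
    retrieved: actions taken from nearest-neighbor lookups
    critic: callable (state, action) -> estimated value
    """
    candidates = list(generated) + list(retrieved)
    if not candidates:
        raise ValueError("no candidate actions to choose from")
    return max(candidates, key=lambda action: critic(state, action))
```

For example, with a toy critic that prefers actions near 2.0, `select_action(None, [0.5, 1.0], [1.8, 3.0], lambda s, a: -(a - 2.0) ** 2)` picks the retrieved candidate `1.8`, showing how a retrieved action can win over every generated one.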