# Peak-Return Greedy Slicing (PRGS)

This repository provides the official implementation of **Peak-Return Greedy Slicing (PRGS)**,  
a framework designed to enhance the stitching ability of Transformer-based offline reinforcement learning (offline RL) methods.  

PRGS introduces a timestep-level subtrajectory slicing mechanism that improves the ability of Transformer-based methods to compose high-quality behaviors from suboptimal trajectories.

---

## Overview

- **MMD-based Return Estimator**  
  Learns a distributional return representation for each state–action pair using a particle-based estimator.

- **Greedy Subtrajectory Slicing**  
  Computes aligned optimistic returns and recursively selects high-return subtrajectories for training.

- **Adaptive History Truncation**  
  Discards irrelevant history during evaluation to ensure consistency with training-time slicing.
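
As a rough illustration of the idea behind the MMD-based return estimator, the sketch below computes a squared maximum mean discrepancy (MMD) between two sets of scalar return particles using a Gaussian kernel. It is a minimal, dependency-free example and not the repository's implementation: the function names, the `bandwidth` parameter, and the example particle values are all assumptions made for illustration.

```python
import math

# Hypothetical helper names; the repository's estimator may differ.

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalar return particles."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(predicted, target, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between particle sets."""
    return (
        mean_kernel(predicted, predicted, bandwidth)
        + mean_kernel(target, target, bandwidth)
        - 2.0 * mean_kernel(predicted, target, bandwidth)
    )


# Made-up particles standing in for predicted and target return samples.
predicted_particles = [0.0, 1.0, 2.0]
target_particles = [4.0, 5.0, 6.0]

loss_far = mmd_squared(predicted_particles, target_particles)
loss_same = mmd_squared(predicted_particles, predicted_particles)
```

In a training loop, a quantity like `loss_far` would typically be minimized so that the predicted particles move toward the target return distribution; identical particle sets give a discrepancy of zero.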

---

## Benchmarks

We evaluate PRGS on the following benchmarks:

* **D4RL**: MuJoCo locomotion (Hopper, Walker2d, HalfCheetah), AntMaze, Kitchen, Adroit, and Maze2D tasks
* **AuctionNet**: Large-scale ad bidding logs with budget constraints
* **BabyAI**: Gridworld instruction-following tasks with compositional instructions
