## Overview

Flow matching shows great promise in offline reinforcement learning (RL), yet optimizing these iterative policies with Backpropagation Through Time (BPTT) is unstable. Prevailing approaches circumvent this by distilling multi-step flows into single-step approximations, which may limit the benefits of iterative refinement. To avoid this trade-off, we propose **Direct Flow Q-Learning (DFQL)**, a streamlined framework that optimizes flow matching policies without BPTT or distillation. DFQL derives a surrogate objective that injects terminal Q-value gradients directly into the velocity field at each step as a guidance term, which keeps optimization stable while preserving the expressive capacity of the iterative policy. Across 73 challenging tasks in OGBench and D4RL, DFQL achieves state-of-the-art results. DFQL also extends to the offline-to-online setting, delivering substantial performance gains without further modification.
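To make the terms above concrete, here is a minimal JAX sketch of the two ingredients the overview mentions: multi-step (Euler) sampling from a velocity field, and the gradient of Q with respect to the terminal action. It is not the DFQL implementation and does not reproduce the surrogate objective. `toy_velocity`, `toy_q`, and `sample_action` are hypothetical names, and both "networks" are closed-form stand-ins chosen only so the example runs.

```python
import jax
import jax.numpy as jnp


def toy_velocity(state, x, t):
    # Hypothetical stand-in for a learned velocity network v(s, x, t):
    # it pulls x toward the first action_dim entries of the state.
    return state[: x.shape[-1]] - x


def toy_q(state, action):
    # Hypothetical stand-in for a learned critic Q(s, a), maximized at a = 0.5.
    return -jnp.sum((action - 0.5) ** 2)


def sample_action(velocity_fn, state, noise, num_steps=10):
    # Euler integration of dx/dt = v(s, x, t) from t = 0 (noise) to t = 1 (action).
    dt = 1.0 / num_steps
    x = noise
    for k in range(num_steps):
        x = x + dt * velocity_fn(state, x, k * dt)
    return x


key = jax.random.PRNGKey(0)
state = jnp.array([0.2, -0.1, 0.4, 0.0])
noise = jax.random.normal(key, (2,))

action = sample_action(toy_velocity, state, noise)

# Gradient of Q with respect to the terminal action only. stop_gradient makes
# explicit that nothing is propagated back through the sampling loop.
q_grad = jax.grad(toy_q, argnums=1)(state, jax.lax.stop_gradient(action))
print(action.shape, q_grad.shape)
```

BPTT would instead differentiate `toy_q(state, sample_action(...))` with respect to the policy parameters, unrolling through every Euler step. The sketch only shows that the terminal gradient can be computed without that unrolling.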

## Installation

DFQL requires Python 3.9+ and is based on JAX. The main dependencies are:
- `jax >= 0.4.26`
- `ogbench == 1.1.0`
- `gymnasium == 0.29.1`

To install all dependencies, run:
```bash
pip install -r requirements.txt
```
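After installation, a quick check like the one below can confirm that the installed versions match the pins listed above. This is a sketch using only the standard library. `parse`, `satisfies`, and `check` are hypothetical helpers, and the version comparison is deliberately simplified rather than a full PEP 440 implementation.

```python
import re
from importlib import metadata

# Pins copied from the dependency list above: (package, operator, version).
REQUIRED = [
    ("jax", ">=", "0.4.26"),
    ("ogbench", "==", "1.1.0"),
    ("gymnasium", "==", "0.29.1"),
]


def parse(version):
    # Keep only the leading numeric release, e.g. "0.4.26.dev1" -> (0, 4, 26).
    release = re.match(r"\d+(\.\d+)*", version)
    return tuple(int(p) for p in release.group(0).split(".")) if release else ()


def satisfies(installed, op, required):
    if op == ">=":
        return parse(installed) >= parse(required)
    return parse(installed) == parse(required)


def check(requirements=REQUIRED):
    # Map each package name to a short human-readable status string.
    report = {}
    for name, op, required in requirements:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        detail = "ok" if satisfies(installed, op, required) else f"expected {op} {required}"
        report[name] = f"{installed} ({detail})"
    return report


if __name__ == "__main__":
    for name, status in check().items():
        print(f"{name}: {status}")
```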

> **Note**: To use D4RL environments, you also need to set up MuJoCo 2.1.0.

## Usage

Here are some example commands:

```bash
# DFQL on OGBench antsoccer-arena (offline RL)
python main.py --env_name=antsoccer-arena-navigate-singletask-v0 --agent.discount=0.995 --agent.alpha_DFQL=10

# DFQL on OGBench visual-puzzle-3x3 (offline RL)
python main.py --env_name=visual-puzzle-3x3-play-singletask-task1-v0 --offline_steps=500000 --agent.alpha_DFQL=1 --agent.encoder=impala_small --p_aug=0.5 --frame_stack=3

# DFQL on OGBench scene (offline-to-online RL)
python main.py --env_name=scene-play-singletask-v0 --online_steps=1000000 --agent.alpha_DFQL=1
```
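The examples above use different `--agent.alpha_DFQL` values per task, so it can be convenient to generate one command per candidate value. The sketch below only builds and prints command lines from flags that appear in the examples. `build_command` is a hypothetical helper, and the alpha values are illustrative, not recommended settings.

```python
import shlex


def build_command(env_name, flags):
    # Flags are passed through verbatim as --name=value, as in the examples above.
    args = ["python", "main.py", f"--env_name={env_name}"]
    args += [f"--{name}={value}" for name, value in flags.items()]
    return args


commands = [
    build_command(
        "antsoccer-arena-navigate-singletask-v0",
        {"agent.discount": 0.995, "agent.alpha_DFQL": alpha},
    )
    for alpha in (1, 10, 100)  # illustrative values only
]

for args in commands:
    print(shlex.join(args))
    # To launch instead of printing: subprocess.run(args, check=True)
```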