# MARFT: Multi-Agent Reinforcement Fine-Tuning

MARFT stands for **Multi-Agent Reinforcement Fine-Tuning**. This codebase implements a reinforcement fine-tuning framework for large language model (LLM)-based multi-agent systems, aimed at general agentic tasks.

## Table of Contents
- [About](#about)
- [Features](#features)
- [Getting Started](#getting-started)
- [Environment Extension](#environment-extension)
- [Multi-Adapter](#multi-adapter)
- [Agent-by-Agent Training](#agent-by-agent-training)
- [Resume Training](#resume-training)

## About
This codebase is designed to help researchers get started with reinforcement learning for multi-agent systems. It provides a framework for **MARFT** that supports both **action-level optimization** and **token-level optimization**. New agentic tasks can be added through user-defined environments.

## Features
- **Action and Token Optimization**: Supports fine-grained optimization at both action and token levels.
- **Environment Extension**: Simple tools to implement new task environments.
- **Multi-Adapter Support**: Agents share a base model but are equipped with unique LoRA adapters.
- **Agent-by-Agent Training**: Enables sequential training of agents with parameter freezing.
- **Resume Training**: Continue training from saved checkpoints with support for critic recovery.

## Getting Started

### Installation
1. Create a virtual environment:
   ```bash
   conda create -n marft
   conda activate marft
   ```

2. Install dependencies:
   ```bash
   cd MARFT
   pip install -r requirements.txt
   ```

   **Note**: You may need to adjust package versions to match your CUDA version.

## Environment Extension
To create a custom environment for your specific agentic task:
1. Navigate to `marft/envs` and create a folder for your environment.
2. Create a Python file (e.g., `env_name.py`) and implement the necessary environment components:
   - `__init__`: Initialize the environment.
   - `reset`: Reset the environment state.
   - `step`: Define how the environment responds to an action.
   - `transition`: Define state transitions.
3. Create a corresponding `runner` in `runner/shared` and the training script in `scripts`.

**Example**:
   ```python
   class CustomEnv:
       def __init__(self):
           # Initialize your environment
           pass

       def reset(self):
           # Reset the environment state
           pass

       def step(self, action):
           # Define how the environment responds to actions
           pass

       def transition(self, state):
           # Define state transitions
           pass
   ```
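
To make the skeleton above more concrete, here is a minimal toy environment that fills in the four methods. It is only a sketch: the task (reaching a target sum), the `(observation, reward, done)` return value of `step`, and the way `step` delegates to `transition` are all assumptions made for illustration, not the interface the MARFT runners require.

```python
class CountingEnv:
    """Hypothetical toy task: reach a target sum by emitting integers."""

    def __init__(self, target=3, max_steps=5):
        self.target = target
        self.max_steps = max_steps
        self.state = 0
        self.steps = 0

    def reset(self):
        # Start a fresh episode and return the initial observation.
        self.state = 0
        self.steps = 0
        return self.state

    def transition(self, state):
        # Commit the new state and advance the step counter.
        self.state = state
        self.steps += 1

    def step(self, action):
        # Apply the action, then compute the reward and the termination flag.
        self.transition(self.state + action)
        solved = self.state == self.target
        done = solved or self.steps >= self.max_steps
        reward = 1.0 if solved else 0.0
        return self.state, reward, done


env = CountingEnv(target=3)
obs = env.reset()
obs, reward, done = env.step(1)  # running total is 1, episode continues
obs, reward, done = env.step(2)  # running total is 3, episode ends with reward 1.0
```

A real task environment would replace the integer state with prompts and agent outputs, but the division of work between `reset`, `step`, and `transition` can stay the same.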

## Multi-Adapter
The framework supports a multi-agent system (MAS) in which all agents share one base model and each agent uses its own **LoRA (Low-Rank Adaptation)** adapter. This lets agents specialize in different tasks while keeping a shared foundation. Adapters can also be loaded from checkpoints so that training can be resumed.
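
The idea of one shared base model with per-agent adapters can be illustrated with a small pure-Python sketch. It does not use the framework's code or any LoRA library; the names `BASE_W`, `ADAPTERS`, `effective_weight`, and `forward`, and the tiny 2x2 weights, are hypothetical. Each agent's effective weight is the frozen base weight plus a low-rank product `B @ A` that belongs to that agent only.

```python
def matmul(a, b):
    # Multiply two matrices given as lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    # Element-wise sum of two matrices of the same shape.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Shared, frozen base weight (2x2). Every agent reads it, none modifies it.
BASE_W = [[1.0, 0.0], [0.0, 1.0]]

# One rank-1 adapter per agent: B is 2x1 and A is 1x2, so B @ A is 2x2.
ADAPTERS = {
    "agent_0": {"B": [[1.0], [0.0]], "A": [[0.0, 0.5]]},
    "agent_1": {"B": [[0.0], [1.0]], "A": [[2.0, 0.0]]},
}


def effective_weight(agent):
    # W_eff = W_base + B @ A, computed without touching BASE_W.
    adapter = ADAPTERS[agent]
    return add(BASE_W, matmul(adapter["B"], adapter["A"]))


def forward(agent, x):
    # Compute y = W_eff @ x for a vector x given as a flat list.
    return [sum(w * xi for w, xi in zip(row, x)) for row in effective_weight(agent)]


print(forward("agent_0", [1.0, 2.0]))  # [2.0, 2.0]
print(forward("agent_1", [1.0, 2.0]))  # [1.0, 4.0]
```

The same input produces different outputs per agent while `BASE_W` stays unchanged, which is why only the small adapter matrices need to be stored per agent.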

## Agent-by-Agent Training
The codebase supports **agent-by-agent training**, where a single agent is trained while others are frozen. This is controlled by the `--agent_iteration_interval` argument, which defines the training interval for each agent.
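
One plausible reading of this schedule is a round-robin rotation: each agent is trained for a fixed number of iterations, then frozen while the next agent takes its turn. The sketch below assumes that reading. The helper names `active_agent` and `trainable_mask` are hypothetical, and the rotation order is an assumption rather than the codebase's confirmed behavior.

```python
def active_agent(iteration, num_agents, agent_iteration_interval):
    # Index of the agent being trained at this iteration (round-robin assumption).
    return (iteration // agent_iteration_interval) % num_agents


def trainable_mask(iteration, num_agents, agent_iteration_interval):
    # True for the agent whose parameters are updated, False for frozen agents.
    active = active_agent(iteration, num_agents, agent_iteration_interval)
    return [i == active for i in range(num_agents)]


# With 3 agents and an interval of 2, the active agent changes every 2 iterations.
schedule = [active_agent(it, 3, 2) for it in range(7)]
print(schedule)  # [0, 0, 1, 1, 2, 2, 0]
```

In a training loop, a mask like this would decide which adapter's parameters receive gradient updates in a given iteration.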

## Resume Training
LLM training is unstable: a run can collapse when the model starts generating unusual tokens, and this is expected. Resume training lets you restart from a saved checkpoint when the performance of the LaMAS (LLM-based multi-agent system) starts to collapse. To use it, pass the `--load_path` argument. The directory it points to should contain multiple folders, each holding the parameters and configuration of a LoRA adapter, together with a critic model file `critic.pth`, which is loaded automatically.
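
Before launching a resumed run, it can help to check that the checkpoint directory has the expected layout. The following is a minimal sketch using only the standard library. The helper `inspect_checkpoint` and the folder names `agent_0` and `agent_1` are hypothetical; only the `critic.pth` file name comes from the description above.

```python
import tempfile
from pathlib import Path


def inspect_checkpoint(load_path):
    # Summarize a checkpoint directory: adapter subfolders and critic presence.
    root = Path(load_path)
    if not root.is_dir():
        raise FileNotFoundError(f"checkpoint directory not found: {root}")
    adapter_dirs = sorted(p.name for p in root.iterdir() if p.is_dir())
    has_critic = (root / "critic.pth").is_file()
    return {"adapter_dirs": adapter_dirs, "has_critic": has_critic}


# Demo on a throwaway directory that mimics the described layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("agent_0", "agent_1"):  # hypothetical adapter folder names
        (Path(tmp) / name).mkdir()
    (Path(tmp) / "critic.pth").touch()
    summary = inspect_checkpoint(tmp)

print(summary)
```

If `has_critic` is false or `adapter_dirs` is empty, the directory is probably not a complete checkpoint for `--load_path`.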