# Lifelong Autonomous Improvement of Navigation Foundation Models in the Wild

This code supplement contains the code used to train the models and run the experiments described in the main paper, so that the results can be replicated. There are 3 repositories included. Our contributions are in multinav-rl, jaxrl_minimal, and create_ws, as well as a deployment script added to visualnav-transformer. This project depends on 4 other publicly available repositories: [agentlace](https://github.com/youliangtan/agentlace), [dlimp](https://github.com/kvablack/dlimp), [oxe_envlogger](https://github.com/rail-berkeley/oxe_envlogger), and [visualnav-transformer](https://github.com/robodhruv/visualnav-transformer). Here are the roles of each repo:

- **multinav-rl**: implementation of policy training and fine tuning, data manipulation, and the autonomous robot system
- **jaxrl_minimal**: implementation of reinforcement learning agents
- **create_ws**: implementation of the robot action server and state machine, as used by the robot


- **agentlace**: distributed machine learning structure, allowing us to collect data and train on separate machines
- **dlimp**: data loading capabilities
- **oxe_envlogger**: simplifying data collection for RL, used by agentlace
- **visualnav-transformer**: baseline navigation model used for comparison

## Environments
Two environments were used for the experiments in this paper. To train and run our models, we used an environment with the packages listed in `./multinav-rl/multinav/utils/requirements_jaxrl.txt`. To deploy ViNT, the baseline model we use for comparison, we used an environment with the packages listed in `./multinav-rl/multinav/utils/requirements_vint.txt`.

## Dataset
A small snippet of the dataset collected during one of our fine tuning runs is included at `./dataset`. It contains 7 trajectories. To load it, run the following lines of code:

```
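import tensorflow_datasets as tfds
from dlimp.dataset import DLataset

# Point the TFDS builder at the saved dataset (name:version) and its parent directory.
dataset_builder = tfds.builder("dataset:0.0.1", data_dir="./data")
# Wrap the RLDS-format dataset so that each element is one full trajectory.
dataset = DLataset.from_rlds(dataset_builder)
# Fetch the first trajectory.
traj = next(dataset.iterator())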
```

This loads the dataset at the trajectory level, so each sample is an entire trajectory, from its start until the robot reaches the goal, crashes, or times out.
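
Because each sample is a whole trajectory, transition-level code has to slice it into per-step tuples. The sketch below illustrates the idea on a plain Python dict. The keys `observations` and `actions` and the helper `to_transitions` are hypothetical stand-ins, not the dataset's actual schema, so inspect the loaded `traj` for the real field names.

```python
def to_transitions(traj):
    """Split one trajectory into (obs, action, next_obs, is_last) tuples.

    `traj` is a dict of equal-length lists; the key names are illustrative only.
    """
    obs = traj["observations"]
    actions = traj["actions"]
    transitions = []
    for t in range(len(obs) - 1):
        # The final transition is the one that lands on the last observation.
        is_last = t == len(obs) - 2
        transitions.append((obs[t], actions[t], obs[t + 1], is_last))
    return transitions


# Toy trajectory with four steps; strings stand in for images and actions.
toy_traj = {
    "observations": ["o0", "o1", "o2", "o3"],
    "actions": ["a0", "a1", "a2", "a3"],
}
transitions = to_transitions(toy_traj)
print(len(transitions), transitions[-1])
```

A trajectory with N observations yields N - 1 transitions, and only the last one is flagged as terminal.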

# Train Offline
To train a CQL or BC model offline, run this script, filling in the appropriate information.

```
python ./multinav-rl/multinav/deploy/train/train_offline.py \
    --data_mix [offline data mixture to use] \
    --data_dir [path where to load training data] \
    --model_config ./multinav-rl/multinav/deploy/train/model_config.py:[model type] \
    --checkpoint_interval [checkpoint interval] \
    --checkpoint_save_dir [path where to save checkpoints] \
    --data_config.reward_type [dense / sparse]
``` 

More model specifications can be added on the command line as well. For example, to train a CQL model where the critic can use proprio, add `--model_config.agent_config.critic_use_proprio`.
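
Dotted flags such as `--model_config.agent_config.critic_use_proprio` address a single field of a nested configuration. The sketch below shows that idea on a plain nested dict. The helper `apply_override` and the example default value are hypothetical; this does not reproduce the repository's actual flag parsing, and the real fields live in `model_config.py`.

```python
import copy


def apply_override(config, dotted_key, value):
    """Return a copy of `config` with the field at `dotted_key` set to `value`."""
    updated = copy.deepcopy(config)
    node = updated
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        node = node[name]  # raises KeyError for an unknown section
    if leaf not in node:
        raise KeyError(f"unknown config field: {dotted_key}")
    node[leaf] = value
    return updated


# Hypothetical default config, used only for illustration.
base_config = {"agent_config": {"critic_use_proprio": False}}
new_config = apply_override(base_config, "agent_config.critic_use_proprio", True)
print(new_config)
```

Rejecting unknown fields, rather than silently creating them, catches misspelled flag names early.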

# Robot Action Server
For this paper, we used an iRobot Create 3 with an added camera and lidar. The script at `./create_ws/src/deployment/launch/nav_fallback.sh` launches all the required sensors and the robot action server. To use it, you first need to collect a map so that the robot can localize.

# Goal Loop Collection
To use image goals, you need to collect a goal loop. Once the robot action server is running, run the recorder script, filling in the appropriate information.

```
python ./multinav-rl/multinav/deploy/robot/recorder.py \
    --data_save_dir [path to save data] \
    --max_time [maximum # of seconds to record for] \
    --server_ip [robot action server IP address] \
    --handle_crash False
```

Then, teleoperate the robot along your goal path. When you are done, kill the script. This generates TFDS records of your trajectory. To convert them to the `.npz` format expected for model deployment and online fine-tuning, adjust the variables set in `./multinav-rl/multinav/data/tfds_to_pkl_npz.py` to match where your recording was saved and where you want to save the `.npz`. Then run `python ./multinav-rl/multinav/data/tfds_to_pkl_npz.py`.
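
One way to think about a goal loop is as an ordered list of goal images that is visited cyclically. The sketch below illustrates only that idea. The `GoalLoop` class is hypothetical, the strings stand in for recorded images, and the assumption that goals advance one at a time and wrap around is ours, not something the deployment scripts are confirmed to do.

```python
class GoalLoop:
    """Ordered goals visited cyclically; wraps to the first after the last."""

    def __init__(self, goals):
        if not goals:
            raise ValueError("a goal loop needs at least one goal")
        self.goals = list(goals)
        self.index = 0

    def current(self):
        """Return the goal the robot is currently heading for."""
        return self.goals[self.index]

    def advance(self):
        """Move to the next goal, e.g. after the current one is reached."""
        self.index = (self.index + 1) % len(self.goals)
        return self.current()


# Strings stand in for goal images recorded along the teleoperated path.
loop = GoalLoop(["goal_0", "goal_1", "goal_2"])
visited = [loop.current()] + [loop.advance() for _ in range(3)]
print(visited)
```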

# Deploy Model

To deploy a model with the image goal task, run this script, filling in the appropriate information. The robot action server must also be running at this time. 

```
python ./multinav-rl/multinav/deploy/robot/model_deployment.py \
    --data_save_dir [where to save data collected during deployment] \
    --action_type [model type: gc_bc, gc_cql] \
    --max_time [how many seconds to deploy for] \
    --server_ip [robot action server IP address] \
    --goal_dir [path to .npz goal loop] \
    --checkpoint_load_dir [where to load model checkpoint] \
    --checkpoint_load_step [checkpoint step to load]
``` 

## Deploying ViNT
To deploy a ViNT model with the same image goal setup, run this script, filling in the appropriate information. 

```
python ./visualnav-transformer/vint_model_deployment.py \
    --max_time [how many seconds to deploy for] \
    --server_ip [robot action server IP address] \
    --goal_dir [path to .npz goal loop]
``` 


# Fine Tune
To fine tune, the robot action server must be running so that the fine tuning actor can send actions and receive observations. The fine tuning actor runs inference, sends incoming observations to the fine tuning trainer, and periodically fetches new model weights from it. The fine tuning trainer receives the new online data and trains the model.
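
The data flow between the two processes can be sketched in a single process as follows. In the real system the actor and trainer run on separate machines and communicate through agentlace; the `Trainer` and `Actor` classes, the `sync_every` parameter, and the integer "weights version" below are all hypothetical simplifications used to illustrate the loop, not the repository's API.

```python
import queue


class Trainer:
    """Receives online data and produces new model weights."""

    def __init__(self):
        self.inbox = queue.Queue()
        self.weights_version = 0

    def train_step(self):
        """Drain the inbox; any newly received data bumps the weights version."""
        received = 0
        while not self.inbox.empty():
            self.inbox.get()
            received += 1
        if received:
            self.weights_version += 1
        return received


class Actor:
    """Acts with its local weights, forwards observations, syncs periodically."""

    def __init__(self, trainer, sync_every):
        self.trainer = trainer
        self.sync_every = sync_every
        self.weights_version = trainer.weights_version
        self.steps = 0

    def step(self, observation):
        self.trainer.inbox.put(observation)  # send the observation to the trainer
        action = ("action_for", observation, self.weights_version)  # placeholder inference
        self.steps += 1
        if self.steps % self.sync_every == 0:
            self.weights_version = self.trainer.weights_version  # pull newer weights
        return action


trainer = Trainer()
actor = Actor(trainer, sync_every=2)
for t in range(4):
    actor.step(f"obs_{t}")
    trainer.train_step()
print(actor.weights_version, trainer.weights_version)
```

Because the actor only syncs every few steps, it typically acts with weights that lag slightly behind the trainer's latest version.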

## Fine Tuning Trainer
To launch the fine tuning trainer, run this script, filling in the appropriate information. 

```
XLA_PYTHON_CLIENT_MEM_FRACTION=0.6 python ./multinav-rl/multinav/deploy/train/train_learner.py \
    --wait_data [how many pieces of data should be collected before training begins] \
    --data_mix [offline data mix to mix in] \
    --data_dir [path where to load offline training data] \
    --data_save_dir [path to save data] \
    --wandb_name [weights and biases run name] \
    --checkpoint_load_dir [where to load model checkpoint] \
    --checkpoint_load_step [checkpoint step to load] \
    --checkpoint_save_dir [where to save new model checkpoints] \
    --checkpoint_interval [how frequently to save model checkpoints] \
    --model_config ./multinav-rl/multinav/deploy/train/model_config.py:[model type] \
    --offline_data_config.reward_type [dense / sparse] \
    --online_data_config.reward_type [dense / sparse]
```

More model specifications can be added on the command line as well. For example, to train a CQL model where the critic can use proprio, add `--model_config.agent_config.critic_use_proprio`.
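
The `--wait_data` and `--data_mix` flags suggest that the trainer holds off until enough online data has arrived and then trains on a mix of offline and online samples. The sketch below illustrates that pattern under our own assumptions. The helpers `ready_to_train` and `sample_batch`, and the `online_fraction` parameter with its value of 0.5, are hypothetical and are not taken from `train_learner.py`.

```python
import random


def ready_to_train(online_data, wait_data):
    """Training starts only once `wait_data` online samples have been collected."""
    return len(online_data) >= wait_data


def sample_batch(offline_data, online_data, batch_size, online_fraction, rng):
    """Draw a batch with a fixed share of online samples (sampled with replacement)."""
    n_online = int(batch_size * online_fraction)
    n_offline = batch_size - n_online
    batch = [rng.choice(online_data) for _ in range(n_online)]
    batch += [rng.choice(offline_data) for _ in range(n_offline)]
    return batch


rng = random.Random(0)
offline = [("offline", i) for i in range(100)]
online = [("online", i) for i in range(5)]
ready_before = ready_to_train(online, wait_data=10)  # too little online data so far

online += [("online", i) for i in range(5, 10)]
ready_after = ready_to_train(online, wait_data=10)
batch = sample_batch(offline, online, batch_size=8, online_fraction=0.5, rng=rng)
n_online_in_batch = sum(1 for source, _ in batch if source == "online")
print(ready_before, ready_after, n_online_in_batch, len(batch))
```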

## Fine Tuning Actor
To launch the fine tuning actor, run this script, filling in the appropriate information. 

```
XLA_PYTHON_CLIENT_PREALLOCATE=false python ./multinav-rl/multinav/deploy/robot/train_actor.py \
    --trainer_ip [trainer IP address] \
    --robot_ip [robot action server IP address] \
    --goal_dir [path to .npz goal loop] \
    --seed [random seed to use]
```