# Code for "Fine-Tuning Offline Policies with Optimistic Action Selection"

## O3F for LAPO
`main_d4rl.py` runs the offline phase. It is based on the original LAPO implementation
(https://github.com/pcchenxi/LAPO-offlienRL).
Here is an example command for offline training on halfcheetah-medium-v2:
`python -m main_d4rl --ExpID 0000 --env_name=halfcheetah-medium-v2
--std_architecture=linear_layer_stop_grad --exp_name=offline_halfcheetah-medium-v2
--num_q_function=5 --gradient_clipping=10 --num_actions_to_sample=100 --no_noise --not_train_v_func
--seed=0`

`main_d4rl_finetune.py` runs the online fine-tuning phase.
Here is an example command for online fine-tuning on halfcheetah-medium-v2 once the offline
phase has completed (`--load_path` points to the output directory of the offline run):
`python -m main_d4rl_finetune --ExpID 0000 --env_name halfcheetah-medium-v2
--deterministic_actions=1 --std_architecture=linear_layer_stop_grad
--exp_name=fine_tuning_halfcheetah-medium-v2
--load_path=./results/Exp0000/halfcheetah-medium-v2/offline_halfcheetah-medium-v2/seed_0/
--num_actions_to_sample=100 --no_noise --action_space_noise=0.2 --num_q_functions=5
--gradient_clipping=10 --not_train_v_func --optimism_parameter=0 --critic_training_iterations=20 --save_freq=5000 --seed=0`
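
The `--load_path` value in the example above appears to be assembled from the arguments of the offline run (`--ExpID`, `--env_name`, `--exp_name`, `--seed`). The sketch below shows that mapping as inferred from the two example commands only. The helper name `offline_run_dir` and the `root` argument are hypothetical and are not part of the repository, so check the result against the directory the offline script actually creates.

```python
def offline_run_dir(exp_id, env_name, exp_name, seed, root="./results"):
    """Build the directory passed to --load_path (hypothetical helper).

    Assumed layout, inferred from the example commands:
    <root>/Exp<ExpID>/<env_name>/<exp_name>/seed_<seed>/
    """
    return f"{root}/Exp{exp_id}/{env_name}/{exp_name}/seed_{seed}/"


# Arguments copied from the offline example command.
load_path = offline_run_dir(
    exp_id="0000",
    env_name="halfcheetah-medium-v2",
    exp_name="offline_halfcheetah-medium-v2",
    seed=0,
)
print(load_path)
```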

The file `algos_vae_finetune.py` contains the logic for the action-selection mechanism. The function
`select_action` takes the parameters `num_actions_to_sample` and `action_space_noise`, creates the
candidate actions, and evaluates them with the critic ensemble.
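
To make the idea concrete, here is a minimal, framework-free sketch of candidate-based action selection with a critic ensemble. It is not the code in `algos_vae_finetune.py`. Several details are assumptions made for illustration: the candidates are the policy action plus Gaussian noise with standard deviation `action_space_noise`, actions are clipped to [-1, 1], and each candidate is scored by the ensemble mean plus `optimism_parameter` times the ensemble standard deviation. The signature and the toy critics are hypothetical as well.

```python
import random
import statistics


def select_action(base_action, critics, num_actions_to_sample=100,
                  action_space_noise=0.2, optimism_parameter=0.0, rng=None):
    """Return the candidate action with the highest ensemble score (sketch).

    base_action: list of floats proposed by the policy (assumed in [-1, 1]).
    critics: list of callables q(action) -> float standing in for the ensemble.
    """
    rng = rng or random.Random()
    # The first candidate is the unperturbed policy action.
    candidates = [list(base_action)]
    # Assumption: remaining candidates add Gaussian noise and are clipped.
    for _ in range(num_actions_to_sample - 1):
        noisy = [a + rng.gauss(0.0, action_space_noise) for a in base_action]
        candidates.append([max(-1.0, min(1.0, a)) for a in noisy])

    def score(action):
        values = [q(action) for q in critics]
        # Assumption: ensemble mean plus a bonus for ensemble disagreement.
        bonus = optimism_parameter * statistics.pstdev(values)
        return statistics.fmean(values) + bonus

    return max(candidates, key=score)


# Toy usage: two hand-written critics on a one-dimensional action.
toy_critics = [lambda a: -(a[0] - 0.5) ** 2, lambda a: -(a[0] - 0.3) ** 2]
best = select_action([0.0], toy_critics, rng=random.Random(0))
```

In this sketch, `optimism_parameter=0` reduces the score to the plain ensemble mean, and because the unperturbed policy action is always among the candidates, the selected action never scores lower than it.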


## O3F for IQL

`train_finetune.py` runs both the offline phase and the online fine-tuning phase. It is based on the
original IQL implementation (https://github.com/ikostrikov/implicit_q_learning).
Here is an example command that trains O3F (IQL) offline and then fine-tunes it online:
`python -m train_finetune --env_name=halfcheetah-medium-v2 --seed=0 --max_steps=500000
--config=configs/mujoco_finetune_config.py --num_actions_to_sample=100 --fixed_action_noise=0.2
--critic_ensemble_size=2 --optimism_parameter=0`

The file `policy.py` contains the logic for the action-selection mechanism. The function
`_sample_actions` takes the parameters `num_actions_to_sample` and `fixed_action_noise`.
