# ERPV: Enhancing Visual Reinforcement Learning with Partially Reliable Knowledge from VLMs

Visual Reinforcement Learning (VRL) aims to learn an optimal control policy from scratch, a process that is particularly challenging in complex tasks because of low exploration efficiency. Large-scale vision-language models (VLMs), which carry prior common-sense knowledge that can be used to infer decisions, offer a promising way to enhance VRL performance. This paper examines how to transfer knowledge from VLMs to VRL efficiently in order to improve exploration efficiency. We find that VLMs are only partially reliable: VLM-inferred actions may be inconsistent in certain environmental states. Furthermore, integrating VLM knowledge into VRL introduces a substantial obstacle: knowledge misalignment causes excessive exploration during policy optimization, which significantly hinders convergence. To address these issues, we propose ERPV, a method that enhances VRL with partially reliable knowledge from VLMs. Unlike existing methods, ERPV introduces two novel modules. First, a Value-aware Policy Guidance module estimates the reliability of VLMs in different states and adaptively selects reliable VLM-inferred actions to guide policy learning. Second, a VLMs-guided Entropy Regularization module mitigates over-exploration by comparing the confidence of the VRL policy with that of the VLM-inferred actions. Extensive experiments demonstrate that, compared with state-of-the-art methods, ERPV achieves competitive performance in both policy effectiveness and sample efficiency.



## Requirements

```
conda env create -f conda_env.yaml
```

After installation finishes, activate the environment with:

```
source activate py38
```



## Carla

Download the CARLA packages for versions 0.9.6 and 0.9.13, then follow the official installation instructions.

For `highway` in CARLA 0.9.6, run the following in a terminal to start CARLA:

```
cd CARLA_0.9.6
CUDA_VISIBLE_DEVICES=gpu_id ./CarlaUE4.sh --world-port=port
```

Replace `gpu_id` and `port` with the GPU index and port you want to use.

For `ghost_static` in CARLA 0.9.13, run the following in a terminal to start CARLA, again replacing `gpu_id` and `port`:

```
cd CARLA_0.9.13
./CarlaUE4.sh -opengl -RenderOffScreen -carla-rpc-port=port -graphicsadapter=gpu_id
```
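Before starting training, it can help to confirm that the CARLA server is actually listening on the port you chose. The following is a minimal sketch using only the Python standard library; `is_port_open` is a hypothetical helper that is not part of this repository, and a successful TCP connection only shows that something is listening on the port, not that the simulator has finished loading.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    Hypothetical helper (not part of this repository).
    """
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 2000 is CARLA's default RPC port; replace it with the port you passed at launch.
    port = 2000
    state = "reachable" if is_port_open("localhost", port) else "not reachable"
    print(f"CARLA port {port} is {state}")
```

If the port is reported as not reachable, check that the server process is still running and that the port matches the one given on the command line.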



## An Example for Training and Evaluation

1. Start CARLA.

2. Run:

   ```
   bash run_local_carla.sh
   ```

   where you can modify the following key parameters:

   ```
   export SCENARIOS=highway
   export TASK=HighwayLimit
   export SAVEDIR=./save
   mkdir -p ${SAVEDIR}
   export CRITIC_PATH=/home/xxxx/carla_HighwayLimit_pretrain_Qwen_2B_loss_seed:1_2025-02-24-08-12-24/model
   ```

`CRITIC_PATH` is the path to the pretrained `.pt` file for the VLM critic network. You can use the pretrained file we provide in `pretrained_critic/carla/carla_HighwayLimit_pretrain_Qwen_7B`, or run `pretrain_vlm_critic_carla.py` to produce one yourself. `pretrain_vlm_critic_carla.py` uses the same configuration as `run_local_carla.sh`.
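A wrong `CRITIC_PATH` is easy to miss until training is already underway, so a quick check beforehand may save time. The sketch below is an illustration only: `find_checkpoints` is a hypothetical helper, not part of this repository, and because this README does not specify whether `CRITIC_PATH` should point at a single `.pt` file or at a directory containing one, the helper accepts both.

```python
from pathlib import Path
from typing import List


def find_checkpoints(critic_path: str) -> List[Path]:
    """Return the .pt files found at `critic_path`.

    Hypothetical helper (not part of this repository). Accepts either a
    single .pt file or a directory, which is searched recursively.
    """
    path = Path(critic_path).expanduser()
    if path.is_file():
        return [path] if path.suffix == ".pt" else []
    if path.is_dir():
        return sorted(path.rglob("*.pt"))
    # The path does not exist.
    return []


if __name__ == "__main__":
    import os

    critic_path = os.environ.get("CRITIC_PATH", "")
    checkpoints = find_checkpoints(critic_path) if critic_path else []
    if checkpoints:
        for ckpt in checkpoints:
            print(f"found checkpoint: {ckpt}")
    else:
        print(f"no .pt file found at CRITIC_PATH={critic_path!r}")
```

Run it in the same shell where `CRITIC_PATH` is exported; an empty result suggests the path or the pretraining output location needs another look.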
