# Source Code for Semantic Skill Extraction via Vision-Language Model Guidance for Efficient Reinforcement Learning

In this work, we introduce VanTA (Vision-language model guided Temporal Abstraction), a framework for improving skill extraction from offline trajectory datasets in reinforcement learning. Temporal abstraction is essential in hierarchical reinforcement learning because it breaks complex decision-making problems into manageable subtasks. Traditional methods often rely on unsupervised objectives or require substantial human intervention, and they tend to produce fragmented skill segments that lack semantic meaning.

To address this, VanTA leverages the powerful knowledge representation capabilities of pretrained Vision-Language Models (VLMs) to guide the skill extraction process, enhancing the semantic alignment and interpretability of extracted skills. Our approach mitigates the challenges posed by sparse rewards and long-horizon tasks by integrating VLM guidance into the temporal segmentation process, reducing the need for human labor and improving learning efficiency.


## Getting started
To install the required dependencies, run the following command:
```bash
pip install -r requirements.txt
```
To run the skill extraction module, use the command:
```bash
python -u skill_learning/main.py 
```
To run offline RL conditioned on the extracted skills, use the command:
```bash
python -u offline/skill_condition/run_discrete.py
```
For continuous control tasks, use this command instead:
```bash
python -u offline/skill_condition/run_continuous.py
```
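
The two stages above can be chained in a small shell wrapper. The following is a minimal sketch and is not part of the repository: the function names `skill_condition_script` and `run_pipeline` are hypothetical. It assumes the commands are run from the repository root with their default arguments, and that the offline RL scripts can locate the extracted skills without extra flags.

```bash
# Hypothetical helper (not shipped with the repository):
# map a mode name to the matching offline RL entry point.
skill_condition_script() {
  case "$1" in
    discrete)   echo "offline/skill_condition/run_discrete.py" ;;
    continuous) echo "offline/skill_condition/run_continuous.py" ;;
    *)
      echo "unknown mode: $1 (expected 'discrete' or 'continuous')" >&2
      return 1
      ;;
  esac
}

# Hypothetical helper: run skill extraction, then skill-conditioned offline RL.
# The second stage starts only if the first one exits successfully.
run_pipeline() {
  local script
  script="$(skill_condition_script "${1:-discrete}")" || return 1
  python -u skill_learning/main.py && python -u "$script"
}
```

After sourcing these definitions, `run_pipeline continuous` would run skill extraction followed by `run_continuous.py`; calling `run_pipeline` with no argument defaults to the discrete script.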

## Built upon OfflineRL-Kit

VanTA is built upon OfflineRL-Kit. Please refer to https://github.com/yihaosun1124/OfflineRL-Kit for complete information.