# 🧠 Project Title: Towards Reliable Code-as-Policies: A Neuro-Symbolic Framework for Embodied Task Plannin

This repository contains the code and simulation setup for our NeurIPS 2025 paper, implementing reliable task planning and execution using neuro-symbolic validation over RLBench environments.


## 📦 Environment Setup

We recommend using Python 3.8+ and a separate virtual environment for clean dependency management.

### 🔹 Step 1: Create Python Environment
conda create -n nesyro_env python=3.8
conda activate nesyro_env

### 🔹 Step 2: Install Dependencies
pip install -r requirements.txt

### 🔹 Step 3: Install PyRep and CoppeliaSim
git clone https://github.com/stepjam/PyRep.git
cd PyRep
bash install.sh
export COPPELIASIM_ROOT=$PWD/CoppeliaSim
echo "export COPPELIASIM_ROOT=$PWD/CoppeliaSim" >> ~/.bashrc
source ~/.bashrc

### 🔹 Step 4: Install RLBench
cd ..
git clone https://github.com/stepjam/RLBench.git
cd RLBench
pip install -e .

### 🔹 Step 5: Install Downward (PDDL Solver)
git clone https://github.com/aibasel/downward.git
cd downward
python3 build.py -j2  # or ./build.py -j2

### 🔹 Step 6: Configure Required Variables in Code

Before running the main pipeline, make sure to **modify the following placeholders directly in the code**:

- Replace `INPUT_YOUR_PATH` with the **full path** to your Downward solver directory.
- Replace `API_KEY` with your valid **API key**.

### 🔹 Step 7: Install and Launch vLLM Server

We use [vLLM](https://github.com/vllm-project/vllm) for efficient LLM inference.

pip install vllm
vllm serve "meta-llama/Llama-3.2-3B-Instruct" --max-model-len 64000


#### 🧭 Plan Execution Overview

Our framework follows a 4-step pipeline to generate and validate executable robot code grounded in task specifications and observations.

#### 🧩 Pipeline Steps by Type

| Task Type (`--type`) | Pipeline Steps                                   | Description                              |
|----------------------|--------------------------------------------------|------------------------------------------|
| 1                    | `spec → code`                                    | Only generate symbolic spec and code     |
| 2                    | `spec → code → verify`                           | Add symbolic verification                |
| 3                    | `spec → code → verify → validate`                | Full neuro-symbolic validation           |
| 4                    | `spec → code → validate` *(CSC only)*            | Skip symbolic verification, CSC only     |
| 5                    | `spec → code → validate` *(LC only)*             | Skip symbolic verification, LC only      |
| 6                    | `spec → code → codesift`                         | Use CodeSift for validation              |
| 7                    | `baseline`                                       | Run baseline skill-freezing experiment   |

Each step performs the following:

- **spec**: Generate symbolic executable task specification (`exe_spec_*.json`)
- **code**: Generate LLM-based executable Python code (`exe_code_*.py`)
- **verify**: Symbolic verification to match code with spec
- **validate**: Execution-time skill-by-skill confidence check using CSC & LC feedback
- **codesift**: Apply CodeSift language model validation
- **baseline**: Partial code freezing and regeneration for ablation study

---

#### 🔍 Argument Details

- `--env`: Target environment(s). Choose from:
  - `rlbench`
  - `realworld`

- `--type`: Task pipeline type  
  *(See the table above for available pipeline types)*

- `--obs`: Observation variant to use:
  - `1` → high + low
  - `2` → high + low + oracle
  - `3` → high + middle + low + oracle
  - `4` → oracle only

> 🔸 Logs are stored in: `/NeSyRo/log/`  
> 🔸 Each run will create a timestamped `.txt` log file

#### 🛠 Example Command

python main.py --env rlbench realworld --type 3 --obs 3