
# Installation

1. Please install the requirements.
2. For Semantic-SAM, please follow the instructions in its GitHub repository. Also download the SwinL checkpoint from that repository and place it inside the `models` folder.
3. For the CLIP model, please download the checkpoint of the version you want and put it inside the `models` folder. (We used EVA02-L-14-336.)
4. To evaluate the 3D semantic segmentation framework on the Replica dataset, please download the dataset from the NICE-SLAM GitHub repository. Put the `cam_params.json` file from the original dataset inside the `dataset` folder.
5. To generate the semantic embeddings, please run `embedding_generation.py` for a specific scene of one of the datasets. This produces two files: `scene_points_to_ids.csv` and `scene_ids_to_embeddings.json`.

6. To generate the labels for 3D semantic segmentation, please run `label_generation.py`.
7. To evaluate the 3D semantic segmentation results, please run `evaluation_3d_seg.py`. To get the final overall metrics, please run `results_calc_3d_seg.py`. (The ground-truth point-label files are generated from the `.ply` mesh files and the ground-truth labels of the original datasets.)
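
As a rough illustration of what a per-point evaluation in step 7 might compute, here is a minimal sketch of per-class IoU and mean IoU over predicted and ground-truth point labels. It assumes the metric of interest is mean IoU and that labels are available as two equal-length lists; the function names `per_class_iou` and `mean_iou` and the toy labels are hypothetical and are not taken from `evaluation_3d_seg.py`.

```python
from collections import Counter


def per_class_iou(pred_labels, gt_labels):
    """Return {class: IoU} for every class seen in predictions or ground truth.

    Assumption: both sequences hold one label per point, in the same point order.
    """
    if len(pred_labels) != len(gt_labels):
        raise ValueError("predictions and ground truth must cover the same points")

    pred_count = Counter(pred_labels)
    gt_count = Counter(gt_labels)
    intersection = Counter(
        gt for pred, gt in zip(pred_labels, gt_labels) if pred == gt
    )

    ious = {}
    for cls in set(pred_count) | set(gt_count):
        # Union is never zero: cls occurs in at least one of the two sequences.
        union = pred_count[cls] + gt_count[cls] - intersection[cls]
        ious[cls] = intersection[cls] / union
    return ious


def mean_iou(ious):
    """Average the per-class IoU values; returns None when there are no classes."""
    if not ious:
        return None
    return sum(ious.values()) / len(ious)


# Toy labels, made up purely for illustration.
pred = ["wall", "wall", "chair", "floor"]
gt = ["wall", "chair", "chair", "floor"]

ious = per_class_iou(pred, gt)
print(ious)
print(mean_iou(ious))
```

Note that this sketch averages over every class present in either the predictions or the ground truth; an evaluation restricted to ground-truth classes would give different numbers.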


# Object Retrieval Pipeline

This project provides an **object retrieval and grounding system** for 3D scenes.  
It processes **SR3D+ dataset** scenes, retrieves object instances, analyzes their views, and evaluates performance using **LLMs, CLIP, and VLMs**.

---


## 🚀 Usage

Run the pipeline once for each scene.

### Arguments
- `scene_id` (str) – Scene identifier, e.g. `scene0011_00`.

#### Options
- `--api_key <key>` : API key for LLM/VLM models (https://openrouter.ai/).
- `--output_base <path>` : Base output directory.
- `--dataset_root <path>` : Root path of the ScanNet dataset.
- `--clip_state_dict_path <path>` : Path to CLIP model weights (EVA02).
- `--vlm_model_name <name>` : Vision-language model name.
- `--clip_model_name <name>` : CLIP model name.
- `--bbq_dataset <path>` : Path to the BBQ subset of the SR3D+ dataset (https://github.com/linukc/BeyondBareQueries/tree/main/evaluation/object_grounding/data/scannet/sr3d%2B_all_queries).
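
For orientation, the argument list above could be wired up with `argparse` roughly as follows. This is only a sketch: the flag names come from this README, but the defaults (all `None`), the help strings, and the example paths are assumptions and may differ from the actual entry-point script.

```python
import argparse


def build_parser():
    """Build a parser mirroring the arguments documented above (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Object retrieval pipeline (CLI sketch)"
    )
    # Positional argument
    parser.add_argument("scene_id", type=str,
                        help="Scene identifier, e.g. scene0011_00")
    # Options; defaulting everything to None is an assumption of this sketch
    parser.add_argument("--api_key", type=str, default=None,
                        help="API key for LLM/VLM models")
    parser.add_argument("--output_base", type=str, default=None,
                        help="Base output directory")
    parser.add_argument("--dataset_root", type=str, default=None,
                        help="Root path of the ScanNet dataset")
    parser.add_argument("--clip_state_dict_path", type=str, default=None,
                        help="Path to CLIP model weights (EVA02)")
    parser.add_argument("--vlm_model_name", type=str, default=None,
                        help="Vision-language model name")
    parser.add_argument("--clip_model_name", type=str, default=None,
                        help="CLIP model name")
    parser.add_argument("--bbq_dataset", type=str, default=None,
                        help="Path to the BBQ subset of the SR3D+ dataset")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
# The paths below are placeholders.
args = build_parser().parse_args(
    ["scene0011_00", "--dataset_root", "/data/scannet", "--output_base", "./outputs"]
)
print(args.scene_id, args.dataset_root, args.output_base)
```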

---

## 📊 Outputs

- **Images**: Object crops with bounding boxes.  
- **Grids**: Numbered view grids for orientation analysis.  
- **JSON Results** (`final.json`):  
  - `true_target`: Ground-truth objects.  
  - `selected_index`: Chosen object index and name.  
  - `iou`: IoU scores for retrieval.  
  - `is_easy`, `is_view_dep`: Dataset difficulty flags.  
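
To aggregate results across queries, something like the following sketch may be useful. It assumes each query yields one record with a single numeric `iou` plus boolean `is_easy` and `is_view_dep` fields; the real layout of `final.json` may differ, and the 0.25 and 0.5 thresholds, the helper names `accuracy_at` and `summarize`, and the sample records are all illustrative assumptions rather than values from this project.

```python
def accuracy_at(records, threshold):
    """Fraction of records whose IoU meets the threshold; None for an empty subset."""
    if not records:
        return None
    hits = sum(1 for record in records if record["iou"] >= threshold)
    return hits / len(records)


def summarize(records, thresholds=(0.25, 0.5)):
    """Report accuracy per threshold, overall and per difficulty subset."""
    subsets = {
        "overall": records,
        "easy": [r for r in records if r["is_easy"]],
        "hard": [r for r in records if not r["is_easy"]],
        "view_dep": [r for r in records if r["is_view_dep"]],
        "view_indep": [r for r in records if not r["is_view_dep"]],
    }
    return {
        name: {t: accuracy_at(subset, t) for t in thresholds}
        for name, subset in subsets.items()
    }


# Made-up records for illustration only; these are not real results.
records = [
    {"iou": 0.62, "is_easy": True, "is_view_dep": False},
    {"iou": 0.31, "is_easy": True, "is_view_dep": True},
    {"iou": 0.10, "is_easy": False, "is_view_dep": False},
    {"iou": 0.55, "is_easy": False, "is_view_dep": True},
]

summary = summarize(records)
for name, scores in summary.items():
    print(name, scores)
```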



