#### Readme

An overview of our paper is available on our project website: https://owmm-vlm-project.github.io.

Our training data and models are available at this link: https://huggingface.co/OWMM-VLM-Project.

Our code can be accessed in the following repository: https://github.com/owmm-vlm-project/OWMM-Agent.

This supplementary material contains samples of our training data and of our test results from both simulator and real-world evaluation. It includes the following demos:

* training_dataset_demo

  This folder contains an example episode used for model training. The sample includes 14 training instances with corresponding images. The `scene_graph.gz` file is the config for this episode.

* sim_test_demo

  This folder contains demos of single-step evaluation and episodic evaluation in the simulator.

  For single-step evaluation, it contains the following outputs:
  
  - `gpt_output.jsonl`: Outputs from GPT-4o.
  - `our_models_output_8B.jsonl` & `our_models_output_38B.jsonl`: Predictions from our OWMM-VLM-8B and OWMM-VLM-38B models.
  - `pivot_agent.jsonl` & `robopoint_agent.jsonl`: Agent predictions for PIVOT and RoboPoint using GPT-4o.
  - `pivot.jsonl` & `robopoint.jsonl`: Single-image evaluation results for PIVOT and RoboPoint.
  - `ground_truth.jsonl`: Ground truth annotations generated by our data collection pipeline.
  
  For episodic evaluation, it contains a demo video, `sim_episodic_evaluation_demo.mp4`.
  
* real_test_demo

  This folder contains demos of single-step evaluation and episodic evaluation in the real world.

  For single-step evaluation, it contains the following outputs:

  - `intput_images`: The input images given to the models.
  - `OWMM_VLM_8B_annotation` & `OWMM_VLM_38B_annotation`: These folders contain the queries and outputs of our models in `OWMM_VLM_8B_output.jsonl` and `OWMM_VLM_38B_output.jsonl`. Each `annotated_{i}.png` is the image annotated with the model's output.
  - `pivot_annotation` & `robopoint_annotation`: These folders have a structure similar to `OWMM_VLM_8B_annotation` and `OWMM_VLM_38B_annotation`, and contain the outputs of the PIVOT agent and the RoboPoint agent.

  For episodic evaluation, it contains a demo video, `real_episodic_evaluation_demo.mp4`.
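
The `.jsonl` files listed above can be inspected with a few lines of Python. The following is a minimal sketch, not part of the released code: the helper name `load_jsonl` is hypothetical, the script assumes it is run from the directory that contains `sim_test_demo` and that the files sit directly inside that folder, and it makes no assumption about the field names in each record, so it only prints whatever keys it finds.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list of parsed records, skipping blank lines."""
    records = []
    with Path(path).open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # Assumed location: the current directory contains the sim_test_demo folder.
    demo_dir = Path("sim_test_demo")
    for name in ("ground_truth.jsonl", "gpt_output.jsonl", "our_models_output_8B.jsonl"):
        file_path = demo_dir / name
        if not file_path.exists():
            print(f"{name}: not found at {file_path}")
            continue
        records = load_jsonl(file_path)
        print(f"{name}: {len(records)} records")
        # The record schema is not documented here, so just list the keys present.
        if records and isinstance(records[0], dict):
            print("  fields of first record:", sorted(records[0].keys()))
```

The same helper should work for the `OWMM_VLM_8B_output.jsonl` and `OWMM_VLM_38B_output.jsonl` files in `real_test_demo` once the path is adjusted.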