# Supplementary Materials

**Paper Title:** Polysemous Language Gaussian Splatting via Matching-based Mask Lifting  
**Method Name:** MUSplat  
**Submission ID:** 2414

***

## Overview

This document indexes the supplementary materials accompanying our submission. These materials support the claims made in our paper with the full source code for our method, **MUSplat**, and a video demonstrating its capabilities. We hope these resources will facilitate the review process by enabling thorough verification and reproduction of our work.

## Contents at a Glance

This supplement consists of two primary components:

1.  **`code/`**: Contains the source code, setup instructions, and scripts necessary to reproduce the experiments presented in our paper.
2.  **`Demo Video.mp4`**: A high-definition video showcasing the qualitative results and interactive capabilities of **MUSplat**.

***

## 1. `code/` — Source Code

**Purpose:**
* To provide a complete, working implementation of **MUSplat** for verification.
* To enable full reproduction of all experimental results reported in our manuscript, including the main figures and ablation studies.

**Getting Started:**
For detailed instructions on environment setup, data preparation, and the commands needed to run the pipeline and reproduce the experiments, please refer to the **`README.md`** file in the `code/` directory.

***

## 2. `Demo Video.mp4` — Demonstration Video

**Purpose:**
* To demonstrate, in motion, the qualitative performance of **MUSplat** on the task of open-vocabulary 3D object selection.
* To supplement the static figures in the paper by showcasing the method's precision and robustness across diverse and challenging 3D scenes.

**Video Highlights:**
The video is dedicated to the open-vocabulary 3D object selection task, presenting results on two distinct benchmark datasets to showcase the method's versatility and accuracy.

* **Performance on the LERF Dataset:**
    * Demonstrates the model's ability to interpret complex, fine-grained, and compositional language queries within visually rich, everyday scenes.
    * Includes examples of selecting specific items from a cluttered table (e.g., "chashu") and identifying small parts of larger objects (e.g., "dall-e brand"), highlighting the precision of the method.

* **Performance on the GraspNet Dataset:**
    * Highlights the model's robustness in highly cluttered environments where numerous objects are closely packed, adjacent, or partially occluding one another.
    * Showcases the sharp, accurate segmentation boundaries that **MUSplat** achieves, effectively disambiguating between objects in these challenging configurations.

**Technical Specifications:**
* **Resolution:** 1920x1080 (1080p)
* **Duration:** 1:57
* **Codecs:** H.264 (Video), AAC (Audio)

**Playback Recommendation:**
The video uses a standard MP4 container and should play correctly on any modern operating system. For the best viewing experience, we recommend a media player such as VLC, IINA, or the default player on Windows or macOS.