PUO-Bench: A Panel Understanding and Operation Benchmark with A Privacy-Preserving Framework

Wei Lin; Yiwei Zhou; Junkai Zhang; Rui Shao; Zhiyuan Zhao; Junyu Gao; Antoni B. Chan; Xuelong Li

Published: 18 Sept 2025, Last Modified: 30 Oct 2025, NeurIPS 2025 Datasets and Benchmarks Track poster, CC BY 4.0

Keywords: panel understanding and operation

TL;DR: The Panel Understanding and Operation (PUO) benchmark

Abstract: Recent advancements in Vision-Language Models (VLMs) have enabled GUI agents to leverage visual features for interface understanding and operation in the digital world. However, limited research has addressed interpreting and interacting with control panels in real-world settings. To bridge this gap, we propose the Panel Understanding and Operation (PUO) benchmark, comprising annotated panel images from appliances and associated vision-language instruction pairs. Experimental results on the benchmark demonstrate significant performance disparities between zero-shot and fine-tuned VLMs, revealing the lack of PUO-specific capabilities in existing language models. Furthermore, we introduce a Privacy-Preserving Framework (PPF) to address privacy concerns in cloud-based panel parsing and reasoning. PPF employs a dual-stage architecture, performing panel understanding on edge devices while delegating complex reasoning to cloud-based LLMs. Although this design introduces a performance trade-off due to edge model limitations, it eliminates the transmission of raw visual data, thereby mitigating privacy risks. Overall, this work provides foundational resources and methodologies for advancing interactive human-machine systems and robotics in panel-centric applications.
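The dual-stage split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `PanelElement` schema, the text serialization, and the names `describe_panel`, `run_pipeline`, `fake_edge_parser`, and `fake_cloud_reasoner` are all hypothetical, and the two stages are stubbed with plain functions. The only point it shows is the data flow: the image is consumed on the edge, and the cloud stage receives text only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class PanelElement:
    """One control found on a panel (hypothetical schema, not from the paper)."""
    label: str                       # e.g. "Start"
    kind: str                        # e.g. "button" or "knob"
    box: Tuple[int, int, int, int]   # (x0, y0, x1, y1) in pixels


def describe_panel(elements: List[PanelElement]) -> str:
    """Serialize the edge stage's output as plain text; no pixels are included."""
    return "\n".join(
        f"{i}: {e.kind} '{e.label}' at {e.box}" for i, e in enumerate(elements)
    )


def run_pipeline(
    image: bytes,
    instruction: str,
    edge_parser: Callable[[bytes], List[PanelElement]],
    cloud_reasoner: Callable[[str], str],
) -> str:
    """Stage 1 runs on the device; stage 2 only ever sees the text prompt."""
    elements = edge_parser(image)  # raw image stays on the edge device
    prompt = (
        f"Panel elements:\n{describe_panel(elements)}\n"
        f"Instruction: {instruction}"
    )
    return cloud_reasoner(prompt)  # text-only payload sent to the cloud


def fake_edge_parser(image: bytes) -> List[PanelElement]:
    """Stand-in for an on-device panel-understanding model (assumption)."""
    return [
        PanelElement("Start", "button", (10, 10, 50, 30)),
        PanelElement("Temp", "knob", (60, 10, 100, 50)),
    ]


def fake_cloud_reasoner(prompt: str) -> str:
    """Stand-in for a cloud LLM call (assumption); returns a fixed action."""
    return "press element 0"


if __name__ == "__main__":
    print(run_pipeline(b"raw-image-bytes", "Press Start",
                       fake_edge_parser, fake_cloud_reasoner))
```

In a sketch like this, the quality of the final action is bounded by what the edge stage extracts, which mirrors the performance trade-off the abstract attributes to edge model limitations.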

Croissant File: json

Dataset URL: https://huggingface.co/datasets/Tele-AI-MAIL/Panel-Understanding-and-Operation

Supplementary Material: zip

Primary Area: Datasets & Benchmarks for applications in language modeling and vision language modeling

Submission Number: 1551
