Hidden Meanings in Plain Sight: RebusBench for Evaluating Cognitive Visual Reasoning
Keywords: Visual Reasoning, Rebus Puzzles, Cognitive AI, Benchmark, LVLMs
TL;DR: We present **RebusBench**, a benchmark of 1,164 rebus puzzles that reveals a fundamental gap in the abstract "System 2" visual reasoning capabilities of state-of-the-art LVLMs.
Abstract: Large Vision-Language Models (LVLMs) have achieved remarkable proficiency in explicit visual recognition, effectively describing what is directly visible in an image. However, a critical cognitive gap emerges when the visual input serves only as a clue rather than the answer. We identify that current models struggle with the complex, multi-step reasoning required to solve problems where information is not explicitly depicted. Successfully solving a rebus puzzle requires a distinct cognitive workflow: the model must extract visual and textual attributes, retrieve linguistic prior knowledge (such as idioms), and perform abstract mapping to synthesize these elements into a meaning that exists outside the pixel space. To evaluate this neurosymbolic capability, we introduce RebusBench, a benchmark of 1,164 puzzles designed to test this specific integration of perception and knowledge. Our evaluation of state-of-the-art models (including Qwen, InternVL, and LLaVA) shows a severe deficiency: performance saturates below 10% Exact Match and 20% semantic accuracy, with no significant improvement observed from model scaling or In-Context Learning (ICL). These findings suggest that while models possess the necessary visual and linguistic components, they lack the cognitive reasoning glue to connect them.
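The abstract reports two metrics, Exact Match and semantic accuracy, without defining them. A minimal sketch of plausible scoring functions follows; the normalization rule and the token-overlap F1 used as a stand-in for semantic accuracy are assumptions, not the paper's actual protocol:

```python
import re

def normalize(text: str) -> str:
    """Lowercase and strip punctuation and surrounding whitespace (assumed normalization)."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def exact_match(pred: str, gold: str) -> bool:
    """Exact Match: normalized strings must be identical."""
    return normalize(pred) == normalize(gold)

def token_f1(pred: str, gold: str) -> float:
    """Token-overlap F1, a common soft-match proxy; the paper's semantic metric may differ."""
    p, g = normalize(pred).split(), normalize(gold).split()
    if not p or not g:
        return float(p == g)
    common = sum(min(p.count(t), g.count(t)) for t in set(p))
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)

# Hypothetical rebus answer: "piece of cake"
print(exact_match("Piece of cake!", "piece of cake"))        # True
print(round(token_f1("a piece of cake", "piece of cake"), 2))  # 0.86
```

A soft metric like this credits near-miss answers (extra articles, minor rewording) that Exact Match rejects, which is why the abstract's two numbers diverge.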
Paper Type: New Short Paper
Supplementary Material: zip
Submission Number: 91