Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning

Fan Shi; Bin Li; Xiangyang Xue

Published: 01 May 2025, Last Modified: 23 Jul 2025, ICML 2025 poster, CC BY 4.0

TL;DR: A unified conditional generative solver for different abstract visual reasoning tasks

Abstract: Abstract visual reasoning (AVR) enables humans to quickly discover abstract rules and generalize them to new scenarios. Designing intelligent systems with human-like AVR abilities has long been a topic of interest in the artificial intelligence community. Deep AVR solvers have recently achieved remarkable success on various AVR tasks. However, they usually rely on task-specific designs or parameters. In such a paradigm, solving a new task often means retraining the model and sometimes adjusting its architecture, which increases the cost of solving AVR problems. In contrast to task-specific approaches, this paper proposes a novel Unified Conditional Generative Solver (UCGS) that addresses multiple AVR tasks within a unified framework. First, we prove that some well-known AVR tasks can be reformulated as the problem of estimating the predictability of target images in problem panels. Then, we illustrate that, under the proposed framework, a single trained conditional generative model can solve various AVR tasks. The experiments show that after a single round of multi-task training, UCGS demonstrates abstract reasoning ability across various AVR tasks. Notably, UCGS exhibits zero-shot reasoning ability: at test time, it can perform abstract reasoning on problems from AVR tasks it did not see during training.
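One way to read the reformulation above is that a conditional generative model scores how predictable a target image is given the other panels, and each task then becomes a simple decision over those scores. The Python sketch below illustrates that reading on toy numeric "panels". It is not the UCGS implementation: `gaussian_predictability` is a hypothetical stand-in for a trained conditional generative model (it predicts the target as the mean of the context panels and returns a Gaussian log-density), and treating odd-one-out as "pick the least predictable panel" is an assumption made for illustration, not a detail stated in the abstract.

```python
import math
from typing import Callable, Sequence

Panel = Sequence[float]
# A scorer maps (target, context panels) to a log-likelihood-style predictability.
Scorer = Callable[[Panel, Sequence[Panel]], float]


def gaussian_predictability(target: Panel, context: Sequence[Panel], sigma: float = 1.0) -> float:
    """HYPOTHETICAL stand-in for a conditional generative model.

    Predicts the target as the element-wise mean of the context panels and
    returns the Gaussian log-density of the target under that prediction.
    """
    dim = len(target)
    mean = [sum(p[i] for p in context) / len(context) for i in range(dim)]
    sq_err = sum((t - m) ** 2 for t, m in zip(target, mean))
    return -0.5 * sq_err / sigma**2 - 0.5 * dim * math.log(2 * math.pi * sigma**2)


def select_answer(context: Sequence[Panel], candidates: Sequence[Panel], score: Scorer) -> int:
    """Answer-selection tasks (e.g. matrix completion): pick the candidate
    that is most predictable from the context panels."""
    scores = [score(c, context) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


def odd_one_out(panels: Sequence[Panel], score: Scorer) -> int:
    """Odd-one-out tasks (assumed formulation): pick the panel that is
    least predictable from all the remaining panels."""
    scores = [
        score(p, [q for j, q in enumerate(panels) if j != i])
        for i, p in enumerate(panels)
    ]
    return min(range(len(panels)), key=scores.__getitem__)


if __name__ == "__main__":
    context = [[1.0, 2.0], [1.2, 2.1], [0.9, 1.9]]
    candidates = [[5.0, 5.0], [1.0, 2.0], [-3.0, 0.0]]
    print(select_answer(context, candidates, gaussian_predictability))  # 1

    panels = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [4.0, -2.0]]
    print(odd_one_out(panels, gaussian_predictability))  # 3
```

Both task types call the same `score` function, so swapping in a different conditional model requires no task-specific code; this mirrors, in a very simplified form, the idea of one solver serving several AVR tasks.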

Lay Summary: Humans can solve visual puzzles by understanding abstract rules, for example, finding the odd image out or filling in a missing piece of a pattern. Such puzzles are known as abstract visual reasoning (AVR) tasks. Teaching computers to solve AVR tasks is a key step toward developing AI systems that reason more like humans. Most existing AI models are trained to handle just one type of AVR task. When facing a new task, they often need to be retrained or redesigned from scratch, which can be time-consuming and expensive. In this work, we propose a framework that handles multiple AVR tasks, such as Raven's Progressive Matrices and visual analogy problems, with a single conditional generative model. Instead of requiring a new model for each new task, our framework can generalize its abstract reasoning ability to unseen kinds of AVR tasks, as our experiments show. This paper provides a unified perspective on solving AVR tasks and opens up new possibilities for developing more flexible, broadly intelligent AI systems that can adapt to a wide variety of AVR tasks.

Primary Area: Deep Learning->Generative Models and Autoencoders

Keywords: Abstract Visual Reasoning, Conditional Generative Model

Submission Number: 12530
