Keywords: Text-to-image generation, Multi-agent, Image editing, Multimodality
TL;DR: We propose Mac-Tiger, a multi-agent cooperative framework leveraging MLLMs to iteratively refine text-to-image generation, improving semantic consistency and visual coherence in complex compositional tasks.
Abstract: Recent advancements in text-to-image (T2I) generation have significantly improved image fidelity and alignment with textual prompts, yet challenges remain in addressing complex compositional requirements, such as attribute binding, spatial relationships, and numerical precision. To tackle these issues, this paper introduces Mac-Tiger, a novel multi-agent cooperative framework that leverages multimodal large language models (MLLMs) to optimize T2I generation through iterative refinement. Unlike traditional single-agent approaches, Mac-Tiger employs a tri-agent system—comprising Reviewer, Challenger, and Refiner roles—that collaboratively evaluates and refines prompts based on dynamically generated feedback and multimodal analysis. Key innovations include integrating advanced modules for perception, memory, and cooperative planning to facilitate adaptive prompt optimization. Experiments on benchmarks like T2I-CompBench and MagicBrush demonstrate Mac-Tiger’s superior performance in generating semantically consistent and visually coherent images, particularly in scenarios involving intricate object interactions and detailed edits. This work underscores the potential of multi-agent systems to address long-standing limitations in T2I generation, paving the way for more robust and context-aware generative models.
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 18695