Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

Wei Pang; Kevin Qinghong Lin; Xiangru Jian; Xi He; Philip Torr

Published: 18 Sept 2025, Last Modified: 16 Jan 2026

Venue: NeurIPS 2025 Datasets and Benchmarks Track poster

License: CC BY-NC 4.0

Keywords: Multimodal Generation, Benchmark, Long-Context Generation, Evaluation, Poster Generation, Multi-Agent Systems

Abstract: Academic poster generation is a crucial yet challenging task in scientific communication, requiring the compression of long-context interleaved documents into a single, visually coherent page. To address this challenge, we introduce Paper2Poster, the first benchmark and metric suite for poster generation, which pairs recent conference papers with author-designed posters and evaluates outputs on (i) Visual Quality: semantic alignment with human posters; (ii) Textual Coherence: language fluency; (iii) Holistic Assessment: six fine-grained aesthetic and informational criteria scored by a VLM-as-judge; and notably (iv) PaperQuiz: the poster's ability to convey core paper content, as measured by VLMs answering generated quizzes. Building on this benchmark, we propose PosterAgent, a top-down, visual-in-the-loop multi-agent pipeline: (a) the Parser distills the paper into a structured asset library; (b) the Planner aligns text–visual pairs into a binary-tree layout that preserves reading order and spatial balance; and (c) the Painter–Commenter loop refines each panel by executing rendering code and using VLM feedback to eliminate overflow and ensure alignment. In our comprehensive evaluation, we find that GPT-4o outputs, though visually appealing at first glance, often exhibit noisy text and poor PaperQuiz scores. We also find that reader engagement is the primary aesthetic bottleneck, as human-designed posters rely largely on visual semantics to convey meaning. Our fully open-source Paper2Poster pipeline outperforms GPT-4o–based systems across nearly all metrics while consuming 87% fewer tokens. These findings chart clear directions for the next generation of fully automated poster-generation models.
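
To make the "binary-tree layout" idea concrete, the following is a minimal sketch of how an ordered list of panels might be recursively split into rectangles. It is not the authors' Planner: the `Panel` class, the `weight` field (a stand-in for how much content a panel holds), the weight-balancing split rule, and the alternating split direction are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class Panel:
    name: str
    weight: float  # hypothetical: relative amount of content, must be > 0


def layout(panels: List[Panel], box: Box, split_columns: bool = True) -> List[Tuple[str, Box]]:
    """Recursively split `box` among `panels`, keeping the input order.

    Each recursion step is one internal node of a binary tree; each
    returned (name, box) pair is a leaf.
    """
    if len(panels) == 1:
        return [(panels[0].name, box)]

    total = sum(p.weight for p in panels)

    # Pick the split index whose two halves have the most similar total weight.
    best_k, best_diff, running = 1, float("inf"), 0.0
    for k in range(1, len(panels)):
        running += panels[k - 1].weight
        diff = abs(total - 2 * running)
        if diff < best_diff:
            best_k, best_diff = k, diff

    first, second = panels[:best_k], panels[best_k:]
    frac = sum(p.weight for p in first) / total

    x, y, w, h = box
    if split_columns:  # first half on the left, second half on the right
        box_a = (x, y, w * frac, h)
        box_b = (x + w * frac, y, w * (1 - frac), h)
    else:  # first half on top, second half below
        box_a = (x, y, w, h * frac)
        box_b = (x, y + h * frac, w, h * (1 - frac))

    # Alternate the split direction at each level of the tree.
    return layout(first, box_a, not split_columns) + layout(second, box_b, not split_columns)


# Hypothetical usage: three panels on a 4 x 2 canvas.
demo = layout(
    [Panel("Intro", 1.0), Panel("Method", 1.0), Panel("Results", 2.0)],
    (0.0, 0.0, 4.0, 2.0),
)
for name, rect in demo:
    print(name, rect)
```

Because the leaves are emitted in input order and each split assigns area in proportion to weight, the sketch keeps a fixed reading order while giving heavier panels more space; a real planner would also need to account for figure aspect ratios and text overflow, which this example ignores.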

Croissant File: json

Dataset URL: https://huggingface.co/datasets/Paper2Poster/Paper2Poster

Code URL: https://github.com/Paper2Poster/Paper2Poster

Supplementary Material: pdf

Primary Area: Applications of Datasets & Benchmarks in Creative AI

Submission Number: 1012
