Benchmarking Personalized Image Editing Capabilities of Generative Image Editing Models

ACL ARR 2026 January Submission6282 Authors

05 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0

Keywords: image editing, personalization benchmark

Abstract: Current generative image editing models largely adopt a one-size-fits-all paradigm, overlooking the stylistic preferences and editing behaviors of individual users. In this paper, we first investigate the necessity of personalization by analyzing the Reddit PSR dataset~\citep{taesiri2025imageedit}, which comprises real-world image editing requests submitted by users. Our empirical analysis reveals strong within-user consistency and the emergence of distinct behavioral clusters across users, indicating that editing styles are inherently idiosyncratic rather than universal. Motivated by these findings, we introduce a personalized image editing benchmark consisting of two complementary components. The first, User-Specific History, leverages an individual user’s chronological editing logs to condition and guide future image generations. The second, Persona-Based Conditioning, addresses the same personalization objective through pre-defined professional identities (e.g., “Wildlife Photographer”) and their associated editing histories. To this end, we construct a synthetically generated dataset in which edits are systematically produced to align with the attributes and stylistic tendencies of specific personas. We benchmark state-of-the-art image editing models on both tasks using single-shot prompting and iterative prompt refinement strategies that explicitly incorporate editing history. Across a diverse set of experiments, we demonstrate that current models remain brittle when editing history is provided alongside the target instruction, frequently failing to faithfully express the stylistic attributes required for effective personalization.

Paper Type: Long

Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond

Research Area Keywords: benchmark, image editing dataset, evaluation

Contribution Types: Data resources, Data analysis

Languages Studied: N/A

Submission Number: 6282
