MMPB: It’s Time for Multi-Modal Personalization

Jaeik Kim; Woojin Kim; Woohyeon Park; Jaeyoung Do

MMPB: It’s Time for Multi-Modal Personalization

Jaeik Kim, Woojin Kim, Woohyeon Park, Jaeyoung Do

Published: 18 Sept 2025, Last Modified: 17 Jan 2026NeurIPS 2025 Datasets and Benchmarks Track posterEveryoneRevisionsBibTeXCC BY-NC-SA 4.0

Keywords: Multi-Modal Models, Vision Language Models, Personalization, Visual Question Answering

TL;DR: We propose MMPB, the first benchmark for evaluating personalization in large vision–language models.

Abstract: Visual personalization is essential in user-facing AI systems such as smart homes and healthcare, where aligning model behavior with user-centric concepts is critical. However, recent large Vision-Language Models (VLMs), despite their broad applicability, remain underexplored in their ability to adapt to individual users. In this paper, we introduce MMPB, the first extensive benchmark for evaluating VLMs on personalization. MMPB comprises 10k image-query pairs and includes 111 personalizable concepts across four categories: humans, animals, objects, and characters, with the human category enriched with preference-grounded queries. We structure personalization into three main task types, each highlighting a different key property of VLMs. Using 23 widely used VLMs including both open- and closed-source models, we evaluate personalization performance via a three-stage protocol: concept injection, multi-turn dialogue, and personalized querying. Our findings indicate that most VLMs (including some closed-source models) struggle with personalization, particularly in maintaining consistency over dialogue, handling user preferences, and adapting to visual cues. Our analysis reveals that the challenges in VLM personalization (such as refusal behaviors and long-context forgetting) highlight substantial room for improvement. By identifying these limitations and offering a scalable benchmark, MMPB offers valuable insights and a solid foundation for future research toward truly personalized multi-modal AI.

Croissant File: json

Dataset URL: https://huggingface.co/datasets/stackadd/MMPB

Code URL: https://github.com/MMPB-Benchmark/MMPB

Primary Area: Social and economic aspects of datasets and benchmarks in machine learning (e.g., fairness, interpretability, human-AI interaction, privacy, safety, strategic behavior)

Submission Number: 1249

Loading