Abstract: Personalization of Large Vision-Language Models (LVLMs) involves customizing models to recognize specific users or object instances and to generate contextually tailored responses. Existing approaches rely on time-consuming training for each new concept, making them impractical for real-world deployment; this limitation is reflected in current personalization benchmarks, which are restricted to object-centric, single-concept evaluations.
In this paper, we present \ours, a novel training-free approach to LVLM personalization, along with a comprehensive, real-world benchmark designed to rigorously evaluate the many facets of the personalization task. \ours leverages pre-trained vision foundation models to extract distinctive features, applies retrieval-augmented generation (RAG) techniques to identify instances within visual inputs, and employs visual prompting strategies to guide model outputs. Our model-agnostic vision toolkit enables efficient and flexible multi-concept personalization across both images and videos, without any additional training. \ours achieves state-of-the-art results, surpassing existing training-based methods.
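To make the pipeline concrete, below is a minimal, self-contained sketch of the training-free loop the abstract outlines: embed a reference image per concept with a frozen encoder, retrieve the nearest concept for a detected crop, and inject the match into the prompt given to the LVLM. This is an illustration under assumptions, not the paper's actual toolkit: `embed`, `ConceptIndex`, and the concept name `<my-dog-Bo>` are hypothetical stand-ins, and a real system would use a pre-trained foundation model (e.g., DINOv2 or CLIP) rather than the placeholder encoder shown here.

```python
import numpy as np

def embed(image: np.ndarray) -> np.ndarray:
    """Stand-in for a frozen vision foundation model encoder.
    Hypothetical: returns a deterministic pseudo-random unit vector
    so the sketch runs end to end without model weights."""
    rng = np.random.default_rng(int(image.sum()) % (2**32))
    v = rng.standard_normal(512)
    return v / np.linalg.norm(v)

class ConceptIndex:
    """Minimal retrieval index: one reference embedding per personalized
    concept, matched by cosine similarity (the RAG-style lookup)."""
    def __init__(self) -> None:
        self.names: list[str] = []
        self.vecs: list[np.ndarray] = []

    def add(self, name: str, reference_image: np.ndarray) -> None:
        # Register a concept once from a reference image -- no training.
        self.names.append(name)
        self.vecs.append(embed(reference_image))

    def retrieve(self, query_crop: np.ndarray) -> tuple[str, float]:
        # Nearest concept by cosine similarity; a real system would also
        # apply a threshold to reject crops matching no known concept.
        q = embed(query_crop)
        sims = np.stack(self.vecs) @ q
        best = int(np.argmax(sims))
        return self.names[best], float(sims[best])

# Usage: register a concept, match a detected crop, personalize the prompt.
index = ConceptIndex()
index.add("<my-dog-Bo>", np.zeros((224, 224, 3)))      # reference image stub
name, score = index.retrieve(np.ones((224, 224, 3)))   # detected object crop
prompt = f"The highlighted object is {name}. Describe what it is doing."
print(prompt, f"(cosine similarity = {score:.2f})")
```

Because the index stores only embeddings, adding or removing a concept is an O(1) update to a lookup table, which is what makes the approach training-free and multi-concept by construction.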
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=I3jOEvYoFZ
Changes Since Last Submission: Our manuscript has been accepted without revisions, so we are submitting the camera-ready version as is. We are currently completing the formal steps to release our code and benchmark, and we expect our GitHub page to be publicly available within two weeks.
Assigned Action Editor: ~Soma_Biswas1
Submission Number: 6153