Keywords: Dataset, Synthetic Data, Visual Knowledge, VQA, Multimodal LLM, RAG, Knowledge Update
TL;DR: We introduce a visual knowledge dataset for evaluating how well current models seek and update the latest visual knowledge.
Abstract: The visual world around us constantly evolves, from real-time news and social media trends to global infrastructure changes visible through satellite imagery and augmented reality enhancements. However, Multimodal Large Language Models (MLLMs), which automate many tasks, struggle to stay current, limited by the cutoff dates in their fixed training datasets.
To quantify this stagnation, we introduce LiveVQA, a first-of-its-kind dataset of 107,143 samples across 12 categories, specifically designed to support research on both seeking and updating live visual knowledge.
Drawing from recent news articles, video platforms, and academic publications published between April 2024 and May 2025, LiveVQA enables evaluation of how models handle the latest visual information beyond their knowledge boundaries and how well current methods update them.
Our comprehensive benchmarking of 17 state-of-the-art MLLMs reveals significant performance gaps on content beyond their knowledge cutoff, while tool-use and agentic visual-seeking frameworks yield an average improvement of 327%.
Furthermore, we explore parameter-efficient fine-tuning methods for updating MLLMs with new visual knowledge.
We examine in depth the critical balance between adapter capacity and model capability when updating MLLMs in this way.
All experimental datasets and source code are publicly available at https://livevqa.github.io.
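As a rough illustration of the parameter-efficient updating setting described above, the following is a minimal sketch (not the authors' code) of attaching LoRA adapters to a multimodal LLM with the Hugging Face PEFT library. The base model name, target modules, and LoRA hyperparameters are illustrative assumptions; the adapter rank is the "capacity" knob whose trade-off against base-model capability the abstract refers to.

```python
# Minimal sketch: LoRA-based parameter-efficient fine-tuning of an MLLM.
# All model names and hyperparameters below are assumptions for illustration.
from transformers import AutoModelForVision2Seq
from peft import LoraConfig, get_peft_model

# Hypothetical choice of base multimodal model (not specified by the paper).
base = AutoModelForVision2Seq.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

lora_cfg = LoraConfig(
    r=16,                                  # adapter rank: the "adapter capacity" knob
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # assumed attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()          # only the adapter weights are trainable
```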
Croissant File: json
Dataset URL: https://huggingface.co/datasets/ONE-Lab/LiveVQA-new
Code URL: https://github.com/fumingyang2004/LIVEVQA
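The dataset above can presumably be loaded with the Hugging Face `datasets` library; the sketch below assumes a standard dataset layout, and the split name and the printed fields are assumptions rather than the documented schema (check the dataset card).

```python
# Minimal sketch, assuming LiveVQA is hosted as a standard Hugging Face dataset.
from datasets import load_dataset

# "train" split is an assumption; see the dataset card for actual splits/fields.
ds = load_dataset("ONE-Lab/LiveVQA-new", split="train")
print(ds[0])  # inspect one VQA sample
```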
Supplementary Material: zip
Primary Area: Datasets & Benchmarks for applications in language modeling and vision language modeling
Submission Number: 1549