FiCo-BENCH: Evaluating Vision-Language Models under Visual Fidelity and Compression at Scale

Published: 01 Mar 2026, Last Modified: 24 Apr 2026 · ICLR 2026 AIWILD · CC BY 4.0
Keywords: Vision Language Model; Optical Context Compression; Efficiency; Large Language Model; Long Context
TL;DR: We introduce FiCo-BENCH and show that compression ratio significantly affects performance in the visual text compression setting, and that current vision-language models vary widely in robustness and proficiency.
Abstract: Visual text compression is an emerging paradigm that renders text as images for processing by vision-language models (VLMs), enabling higher information density per context token. However, the robustness of VLMs under dense, text-based visual inputs remains unevaluated. We introduce FiCo-BENCH, a benchmark designed to assess VLM robustness across seven controlled variants of visual fidelity and information density. FiCo-BENCH spans documents of 8k to 64k tokens and includes three tasks of increasing semantic granularity: optical character recognition (OCR), needle-in-a-haystack (NIAH) retrieval, and visual question answering (VQA). Evaluating 11 general-purpose VLMs and 3 OCR-specialized models reveals three consistent trends: performance drops sharply under increased density or reduced resolution; cross-task transfer between OCR, NIAH, and VQA is limited; and VQA is comparatively robust because low-level details are lost before high-level semantics. By exposing failure modes that remain invisible under conventional VLM evaluations, FiCo-BENCH establishes a rigorous test-bed for visual text compression.
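
To make the setup concrete, below is a minimal sketch (not the paper's pipeline) of how a document might be rendered as an image for a VLM. It assumes Pillow; the function render_text_to_image and its width_px/line_chars parameters are hypothetical stand-ins for the kind of fidelity and density knobs FiCo-BENCH varies.

```python
# Minimal sketch, not from the paper: render text as an image so a VLM
# can consume it as visual tokens. Assumes Pillow; render_text_to_image
# and its knobs are hypothetical stand-ins for fidelity/density variants.
import textwrap
from PIL import Image, ImageDraw, ImageFont

def render_text_to_image(text, width_px=1024, line_chars=100, line_height=18):
    """Wrap text and draw it on a white canvas. Raising line_chars (or
    shrinking width_px) packs more characters per pixel, i.e. higher
    information density and lower visual fidelity."""
    lines = textwrap.wrap(text, width=line_chars) or [""]
    img = Image.new("RGB", (width_px, line_height * len(lines) + 8), "white")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()  # swap in ImageFont.truetype(...) for a real font
    for i, line in enumerate(lines):
        draw.text((4, 4 + i * line_height), line, fill="black", font=font)
    return img

# Same document at two density settings: the denser rendering uses fewer
# image tokens but degrades OCR-level legibility first.
doc = "some long document text " * 400
sparse = render_text_to_image(doc, line_chars=80)
dense = render_text_to_image(doc, line_chars=160)
```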
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 166