Keywords: agentic memory, token compression, visual memory representation, efficient retrieval
Abstract: Agentic systems built upon large language models (LLMs) increasingly depend on long-context modeling to support document understanding, long-term memory recall, and multi-step reasoning. However, extending context windows incurs substantial computational and memory overhead, significantly limiting the scalability and practicality of long-context LLM-based agents. Recent studies suggest that visual representations can serve as an effective medium for compressing and organizing long textual content. Motivated by this insight, we propose VizoMem, a novel visual memory framework for agentic systems. In this framework, textual memories are pre-rendered into structured images and stored as visual notes, enabling compact and persistent memory representations. Moving beyond standard vision-language models like Glyph, we pioneer a specialized retrieval system designed for large-scale visual memory. Our innovation lies in the construction of a dedicated dataset and the development of a highly efficient retrieval model that repurposes foundational vision-language encoders to navigate complex, text-heavy visual environments. Experiments on public datasets demonstrate that our approach significantly reduces token consumption while preserving effective long-term memory recall, highlighting its potential as a scalable alternative to conventional long-context modeling.
Paper Type: Long
Research Area: Retrieval-Augmented Language Models
Research Area Keywords: AI / LLM Agents, Efficient/Low-Resource Methods for NLP
Contribution Types: NLP engineering experiment, Approaches for low compute settings - efficiency
Languages Studied: English
Submission Number: 7672