Think Once, Reuse Smartly: Bio-Inspired Memory for Efficient Vision-Language Reasoning in Autonomous Driving
Keywords: Vision-Language Models, Autonomous Driving, Memory-Driven Inference
Abstract: Vision-Language Models (VLMs) are increasingly vital for robust decision-making in autonomous driving, yet their deep reasoning creates a critical latency bottleneck, making them impractical for real-world deployment. Current approaches accelerate inference by pruning input data, but they overlook the primary source of inefficiency: the constant, wasteful re-computation of reasoning that remains valid across consecutive frames.
We introduce MEMO-VLM, a memory-driven framework inspired by human cognition that eliminates this redundant reasoning.
Instead of regenerating its entire chain of reasoning, MEMO-VLM treats previous conclusions as a hypothesis to be validated against new visual evidence, intelligently reusing what remains true and surgically updating only what has changed. This is achieved with a plug-and-play, two-stage approach that requires no VLM retraining, making it a broadly applicable solution. Experiments demonstrate that MEMO-VLM accelerates inference by up to 4.3$\times$.
By bridging bio-inspired memory with computational efficiency, our work offers a practical path to deploying the advanced reasoning of VLMs in safety-critical autonomous systems.
Paper Type: Long
Research Area: LLM Efficiency
Research Area Keywords: LLM Efficiency, NLP in resource-constrained settings
Contribution Types: Approaches for low-compute settings / efficiency
Languages Studied: English
Submission Number: 959