MGA-VQA: Secure and Interpretable Graph-Augmented Visual Question Answering with Memory-Guided Protection Against Unauthorized Knowledge Use

Published: 27 Oct 2025 · Last Modified: 27 Oct 2025 · NeurIPS Lock-LLM Workshop 2025 Poster · License: CC BY 4.0
Keywords: Document Visual Question Answering, Multi-modal Learning, Interpretable AI, Graph-based Reasoning, Question-guided Compression
Abstract: Vision-language models in document processing face growing risks of unauthorized knowledge extraction, distillation, and malicious repurposing. Existing DocVQA systems rely on opaque reasoning, leaving them vulnerable to exploitation. We propose MGA-VQA, a security-aware multi-modal framework that integrates token-level encoding, spatial graph reasoning, memory-augmented inference, and question-guided compression into an auditable architecture. Unlike prior black-box models, MGA-VQA introduces interpretable graph-based decision pathways and controlled memory access, making knowledge extraction traceable and resistant to unauthorized distillation or compression. Evaluation across six benchmarks (FUNSD, CORD, SROIE, DocVQA, STE-VQA, and RICO) demonstrates not only superior accuracy and efficiency but also enhanced protection properties that align with the goal of preventing model misuse. MGA-VQA bridges document understanding with LLM security, showing how architectural interpretability can safeguard against unauthorized knowledge use. The implementation is available at: https://github.com/ahmad-shirazi/MGAVQA
Submission Number: 62
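The following is a minimal, illustrative sketch of how the four stages named in the abstract (token-level encoding, spatial graph reasoning over the document layout, memory-augmented inference, and question-guided compression) could compose in a single forward pass. All module names, dimensions, the single-round message passing, and the attention-based memory read are assumptions for illustration only; they are not the authors' architecture, which is described in the paper and the repository linked above.

# Hypothetical sketch of the four-stage MGA-VQA pipeline named in the abstract.
# Every design choice below (shapes, shared embedding, one message-passing round,
# attention-based memory read) is an assumption, not the authors' implementation.
import torch
import torch.nn as nn


class MGAVQASketch(nn.Module):
    def __init__(self, vocab_size=30522, dim=256, mem_slots=32, num_answers=100):
        super().__init__()
        self.token_enc = nn.Embedding(vocab_size, dim)            # token-level encoding
        self.graph_proj = nn.Linear(dim, dim)                     # spatial graph reasoning
        self.memory = nn.Parameter(torch.randn(mem_slots, dim))   # fixed, auditable memory bank
        self.mem_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.q_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(dim, num_answers)

    def forward(self, doc_tokens, question_tokens, adjacency):
        # doc_tokens: (B, N) token ids; adjacency: (B, N, N) spatial-layout graph
        x = self.token_enc(doc_tokens)                            # (B, N, dim)
        # one round of adjacency-weighted message passing over layout neighbors
        x = x + torch.bmm(adjacency, self.graph_proj(x))
        # memory-augmented inference: attention weights expose which slots were read
        mem = self.memory.unsqueeze(0).expand(x.size(0), -1, -1)
        x, mem_weights = self.mem_attn(x, mem, mem)
        # question-guided compression: pool document tokens with question attention
        q = self.token_enc(question_tokens).mean(dim=1, keepdim=True)   # (B, 1, dim)
        pooled, token_weights = self.q_attn(q, x, x)              # (B, 1, dim)
        return self.classifier(pooled.squeeze(1)), mem_weights, token_weights


# Smoke test with random inputs
model = MGAVQASketch()
B, N, Q = 2, 16, 8
logits, mem_w, tok_w = model(
    torch.randint(0, 30522, (B, N)),
    torch.randint(0, 30522, (B, Q)),
    torch.softmax(torch.randn(B, N, N), dim=-1),
)
print(logits.shape, mem_w.shape, tok_w.shape)

Returning the memory and token attention weights alongside the answer logits mirrors, in spirit, the traceable decision pathways the abstract describes: a reader of this sketch can inspect which memory slots and which document tokens contributed to each prediction.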