MGA-VQA: Secure and Interpretable Graph-Augmented Visual Question Answering with Memory-Guided Protection Against Unauthorized Knowledge Use

Published: 27 Oct 2025 · Last Modified: 27 Oct 2025 · NeurIPS Lock-LLM Workshop 2025 Poster · License: CC BY 4.0
Keywords: Document Visual Question Answering, Multi-modal Learning, Interpretable AI, Graph-based Reasoning, Question-guided Compression
Abstract: Vision-language models in document processing face growing risks of unauthorized knowledge extraction, distillation, and malicious repurposing. Existing DocVQA systems rely on opaque reasoning, leaving them vulnerable to exploitation. We propose MGA-VQA, a security-aware multi-modal framework that integrates token-level encoding, spatial graph reasoning, memory-augmented inference, and question-guided compression into an auditable architecture. Unlike prior black-box models, MGA-VQA introduces interpretable graph-based decision pathways and controlled memory access, making knowledge extraction traceable and resistant to unauthorized distillation or compression. Evaluation across six benchmarks (FUNSD, CORD, SROIE, DocVQA, STE-VQA, and RICO) demonstrates not only superior accuracy and efficiency but also enhanced protection properties that align with the goal of preventing model misuse. MGA-VQA bridges document understanding with LLM security, showing how architectural interpretability can safeguard against unauthorized knowledge use. The implementation is available at: https://github.com/ahmad-shirazi/MGAVQA
Submission Number: 62
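The following is a minimal, illustrative sketch of how the four stages named in the abstract (token-level encoding, spatial graph reasoning over the document layout, memory-augmented inference, and question-guided compression) could compose in a single forward pass. All module names, dimensions, the single-round message passing, and the attention-based memory read are assumptions for illustration only; they are not the authors' architecture, which is described in the paper and the repository linked above.

# Hypothetical sketch of the four-stage MGA-VQA pipeline named in the abstract.
# Every design choice below (shapes, shared embedding, one message-passing round,
# attention-based memory read) is an assumption, not the authors' implementation.
import torch
import torch.nn as nn


class MGAVQASketch(nn.Module):
    def __init__(self, vocab_size=30522, dim=256, mem_slots=32, num_answers=100):
        super().__init__()
        self.token_enc = nn.Embedding(vocab_size, dim)            # token-level encoding
        self.graph_proj = nn.Linear(dim, dim)                     # spatial graph reasoning
        self.memory = nn.Parameter(torch.randn(mem_slots, dim))   # fixed, auditable memory bank
        self.mem_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.q_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(dim, num_answers)

    def forward(self, doc_tokens, question_tokens, adjacency):
        # doc_tokens: (B, N) token ids; adjacency: (B, N, N) spatial-layout graph
        x = self.token_enc(doc_tokens)                            # (B, N, dim)
        # one round of adjacency-weighted message passing over layout neighbors
        x = x + torch.bmm(adjacency, self.graph_proj(x))
        # memory-augmented inference: attention weights expose which slots were read
        mem = self.memory.unsqueeze(0).expand(x.size(0), -1, -1)
        x, mem_weights = self.mem_attn(x, mem, mem)
        # question-guided compression: pool document tokens with question attention
        q = self.token_enc(question_tokens).mean(dim=1, keepdim=True)   # (B, 1, dim)
        pooled, token_weights = self.q_attn(q, x, x)              # (B, 1, dim)
        return self.classifier(pooled.squeeze(1)), mem_weights, token_weights


# Smoke test with random inputs
model = MGAVQASketch()
B, N, Q = 2, 16, 8
logits, mem_w, tok_w = model(
    torch.randint(0, 30522, (B, N)),
    torch.randint(0, 30522, (B, Q)),
    torch.softmax(torch.randn(B, N, N), dim=-1),
)
print(logits.shape, mem_w.shape, tok_w.shape)

Returning the memory and token attention weights alongside the answer logits mirrors, in spirit, the traceable decision pathways the abstract describes: a reader of this sketch can inspect which memory slots and which document tokens contributed to each prediction.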