The Third Edition of the Large Vision-Language Model Learning and Applications Grand Challenge (LAVA Challenge)
Keywords: vision and language, large vision-language models, question-answering task, document understanding
TL;DR: How can LVLMs understand multi-page documents and slides?
Abstract: Recent advances in Large Vision-Language Models (LVLMs) hold immense promise across various domains, including healthcare, education, entertainment, transportation, and finance, by enabling more sophisticated and context-aware multimedia interactions.
Indeed, the outcomes of our previous challenges, held in conjunction with the Asian Conference on Computer Vision (ACCV) 2024 in Hanoi, Vietnam, and the ACM International Conference on Multimedia (ACMMM) 2025 in Dublin, Ireland (https://lava-workshop.github.io), underscored the limitations of LVLMs in processing multi-page documents and presentations.
To enhance the capability of LVLMs to accurately interpret and generate descriptive text from complex visual inputs within multi-page business-related documents and slides, we continue to organize the Large Vision-Language Model Learning and Applications (LAVA) Challenge, building on the success of previous editions.
The LAVA Challenge focuses on question-answering tasks, comprising both multiple-choice and open-ended questions, over multi-page documents and slides that contain diverse data representations such as graphs, charts, tables, diagrams, data flow diagrams (DFDs), class diagrams, Gantt charts, and building design drawings.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 14