Abstract: Although Multimodal Large Language Models (MLLMs) have achieved remarkable performance on various complex tasks, they still face challenges in understanding Knowledge Graphs (KGs), which are typical graphs with structured semantics. In this paper, we conduct a comprehensive evaluation of MLLMs' capability in this respect and investigate the key factors influencing their performance in understanding and reasoning over KGs across different dimensions, with a particular focus on the recognition of textual triples in KGs. Our study yields several key findings and insights that contribute to advancing this research domain.
We find that MLLMs indeed have limitations in understanding complicated KGs, which is primarily attributed to their poor recognition of textual triples in KGs, particularly for graphs with special layouts or high density. On this basis, we propose a fine-tuning method to enhance the understanding capabilities of MLLMs on KGs, achieving an accuracy increase of 7.3% over the baseline model.
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Evaluation of Knowledge Graph Understanding, Multimodal Large Language Model
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Submission Number: 2036