Thinking is Seeing: Multi-modal Large Language Models are Exceptional in Understanding Knowledge Graphs

18 Sept 2025 (modified: 14 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: knowledge graph, large language model, multi-modal large language model, knowledge graph completion
Abstract: The representation learning of knowledge graphs (KGs) is a longstanding research problem. While graph neural networks (GNNs) have driven recent progress, they still struggle to encode the textual features and subtle relationships of KGs, particularly when conveying key information to large language models (LLMs). The emergence of multi-modal LLMs (MLLMs), which combine linguistic and visual understanding, presents an intriguing opportunity: could their vision capabilities inspire mental visualization, facilitating conceptual thinking and abstract reasoning akin to human cognition? To investigate this premise, we propose SeeKG, an innovative framework that transforms KGs into visually rendered representations used as image inputs for MLLMs. We evaluate SeeKG under both training-free and supervised fine-tuning settings; the experimental results show that SeeKG excels at understanding KG sub-graphs and achieves competitive performance even without training or demonstrations. Further fine-tuning on small batches of data shows that it outperforms state-of-the-art LLM-based KG completion methods by substantial margins across multiple benchmark datasets.
Supplementary Material: zip
Primary Area: learning on graphs and other geometries & topologies
Submission Number: 11013