TL;DR: We release a testbed for LLM-based node classification algorithms, and provide 8 novel takeaways based on extensive experiments on this testbed.
Abstract: Node classification is a fundamental task in graph analysis, with broad applications across various fields. Recent breakthroughs in Large Language Models (LLMs) have enabled LLM-based approaches for this task. Although many studies demonstrate the impressive performance of LLM-based methods, the lack of clear design guidelines may hinder their practical application. In this work, we aim to establish such guidelines through a fair and systematic comparison of these algorithms. As a first step, we developed LLMNodeBed, a comprehensive codebase and testbed for node classification using LLMs. It includes 10 homophilic datasets, 4 heterophilic datasets, 8 LLM-based algorithms, 8 classic baselines, and 3 learning paradigms. Subsequently, we conducted extensive experiments, training and evaluating over 2,700 models, to determine the key settings (e.g., learning paradigms and homophily) and components (e.g., model size and prompt) that affect performance. Our findings uncover 8 insights, e.g., (1) LLM-based methods can significantly outperform traditional methods in a semi-supervised setting, while the advantage is marginal in a supervised setting; (2) Graph Foundation Models can beat open-source LLMs but still fall short of strong LLMs like GPT-4o in a zero-shot setting. We hope that the release of LLMNodeBed, along with our insights, will facilitate reproducible research and inspire future studies in this field. Code and datasets are released at https://llmnodebed.github.io/.
Lay Summary: With the rapid advancements in large language models (LLMs), we wanted to explore their potential and benefits for the task of node classification, a key problem in machine learning on graphs. To do this, we developed LLMNodeBed, a comprehensive codebase and testbed designed to evaluate LLMs for node classification.
LLMNodeBed includes 14 datasets, 8 LLM-based algorithms, 8 classic baselines, and 3 learning paradigms. Using this testbed, we trained and evaluated over 2,700 models to understand the impact of factors such as learning paradigms, graph homophily, language model type and size, and prompt design on performance.
Our findings provide 8 novel insights, along with intuitive explanations and practical guidelines for applying LLM-based algorithms in real-world settings. This work offers a valuable resource for researchers and practitioners aiming to leverage LLMs for graph-related tasks in diverse applications.
Link To Code: https://github.com/WxxShirley/LLMNodeBed
Primary Area: Deep Learning->Graph Neural Networks
Keywords: Large Language Models, Graph Neural Networks, Node Classification
Submission Number: 2437