Abstract: Graph Neural Networks (GNNs) have achieved great success across various multimedia domains. However, the distribution shift between training and test data challenges the effectiveness of GNNs. To mitigate this challenge, Test-Time Training (TTT) has been proposed as a promising approach. Traditional TTT methods rely on a demanding unsupervised training strategy to capture information from the test data that benefits the main task. Inspired by the strong annotation capability of Large Language Models (LLMs) on Text-Attributed Graphs (TAGs), we propose to enhance test-time training on graphs with LLMs as annotators. In this paper, we design a novel Test-Time Training pipeline, LLMTTT, which performs test-time adaptation guided by LLM annotations on a carefully selected node set. Specifically, LLMTTT introduces a hybrid active node selection strategy that considers not only node diversity and representativeness but also prediction signals from the pre-trained model. Given annotations from LLMs, a two-stage training strategy is designed to adapt the model at test time under limited and noisy labels.
A theoretical analysis establishes the validity of our method, and extensive experiments demonstrate that the proposed LLMTTT achieves significant performance improvements over existing Out-of-Distribution (OOD) generalization methods.
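The hybrid active node selection described above can be illustrated with a minimal sketch. The exact scoring function of LLMTTT is not specified in this abstract, so the code below assumes a simple greedy combination of three signals: prediction entropy from the pre-trained model, representativeness (mean similarity to other nodes), and diversity (distance to the nearest already-selected node). All weights and helper names here are hypothetical.

```python
import numpy as np

def hybrid_select(embeddings, probs, budget, alpha=1.0, beta=1.0, gamma=1.0):
    """Greedily pick `budget` nodes to send to the LLM annotator.

    Hypothetical combination of three signals (not the paper's exact formula):
      - uncertainty: entropy of the pre-trained model's class probabilities,
      - representativeness: mean cosine similarity to all other nodes,
      - diversity: distance to the closest already-selected node.
    """
    n = embeddings.shape[0]
    # Prediction entropy from the pre-trained GNN's class probabilities.
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    # Representativeness: average cosine similarity to all nodes.
    norm = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + 1e-12)
    repr_score = (norm @ norm.T).mean(axis=1)
    selected = []
    for _ in range(budget):
        if selected:
            # Diversity: distance to the nearest node already selected.
            diffs = embeddings[:, None, :] - embeddings[selected][None, :, :]
            diversity = np.min(np.linalg.norm(diffs, axis=2), axis=1)
        else:
            diversity = np.ones(n)
        score = alpha * entropy + beta * repr_score + gamma * diversity
        score[selected] = -np.inf  # never re-pick a node
        selected.append(int(np.argmax(score)))
    return selected
```

In practice the selected nodes would be passed to the LLM for annotation, and the resulting (possibly noisy) labels would drive the two-stage test-time adaptation.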
Primary Subject Area: [Content] Multimodal Fusion
Secondary Subject Area: [Generation] Multimedia Foundation Models
Relevance To Conference:
(1) Graph neural networks (GNNs) are increasingly being used to learn representations of multimedia data, where nodes represent different modalities (such as images, text, and audio) and edges capture relationships between them. By improving the effectiveness of test-time training for GNNs, LLMTTT can lead to better representations of multimodal data within graph structures, enhancing the understanding of relationships between different modalities.
(2) Large Language Models (LLMs) are powerful tools for understanding textual data and generating annotations. By incorporating annotations from LLMs into the test-time training process, LLMTTT can facilitate the integration of textual information with other modalities in multimedia data. This can be particularly beneficial in scenarios where textual descriptions accompany multimedia content, such as paper citation networks or social networks.
(3) Distribution shift is a common challenge in multimodal processing, where the distribution of training data may differ from that of test data. By addressing distribution shift through effective test-time training with LLM annotations, LLMTTT can improve the robustness of multimodal processing models to variations in data distribution, leading to more reliable performance in real-world applications.
Submission Number: 669