Abstract: As Transformers gain popularity in graph machine learning, a significant issue has been observed: their global attention mechanisms tend to overemphasise distant vertices, a phenomenon known as "over-globalising". This often dilutes essential local information, particularly in graphs where local neighbourhoods carry significant predictive power. Existing methods often suffer from rigidity in their local processing, where tightly coupled operations limit flexibility and adaptability across diverse graph structures. They can also overlook critical structural nuances, yielding an incomplete integration of local and global contexts. This paper addresses these issues by proposing LocalFormer, a novel framework that localises a Transformer model by pairing a distinct local module with a complementary module for global information. The local module captures and preserves fine-grained, neighbourhood-specific patterns, keeping the model sensitive to critical local structures. The complementary module dynamically integrates broader context without overshadowing the localised information, offering a balanced approach to feature aggregation across different scales of the graph. Through collaborative and warm-up training strategies, the two modules work synergistically to mitigate the adverse effects of over-globalising, leading to improved empirical performance. Our experimental results demonstrate the effectiveness of LocalFormer against state-of-the-art baselines on vertex-classification tasks.
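To make the two-module design concrete, below is a minimal, hypothetical PyTorch sketch of the architecture the abstract describes. The class names, the GCN-style local aggregator, and the sigmoid-gated fusion are illustrative assumptions rather than the paper's actual implementation; only the high-level split into a local module and a complementary global module is taken from the abstract.

```python
# Hypothetical sketch of LocalFormer's two-module design, assuming a
# GCN-style local aggregator and a gated fusion (not the authors' code).
import torch
import torch.nn as nn

class LocalModule(nn.Module):
    """Local module: sparse message passing restricted to graph neighbours."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # adj: (N, N) normalised adjacency; aggregation stays neighbourhood-local
        return torch.relu(adj @ self.lin(x))

class GlobalModule(nn.Module):
    """Complementary module: full self-attention over all vertices."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x.unsqueeze(0), x.unsqueeze(0), x.unsqueeze(0))
        return out.squeeze(0)

class LocalFormerSketch(nn.Module):
    """Fuses local and global views with a learned gate so that global
    context does not overshadow neighbourhood-specific features."""
    def __init__(self, dim, num_classes):
        super().__init__()
        self.local = LocalModule(dim)
        self.glob = GlobalModule(dim)
        self.gate = nn.Linear(2 * dim, 1)
        self.cls = nn.Linear(dim, num_classes)

    def forward(self, x, adj):
        h_loc, h_glb = self.local(x, adj), self.glob(x)
        # Per-vertex gate in (0, 1) balancing local vs. global features
        g = torch.sigmoid(self.gate(torch.cat([h_loc, h_glb], dim=-1)))
        return self.cls(g * h_loc + (1 - g) * h_glb)

# Usage on toy data: 50 vertices, 32-dim features, 7 classes
x = torch.randn(50, 32)
adj = torch.eye(50)  # placeholder normalised adjacency
logits = LocalFormerSketch(32, 7)(x, adj)  # shape (50, 7)
```

The per-vertex gate is one plausible way to realise the abstract's claim that global context is integrated "without overshadowing the localised information": when the gate saturates near 1, the vertex relies almost entirely on its local neighbourhood representation.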
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: All changes are marked in blue font, including a rigorous clarification of mutual supervision in the collaborative loss functions, expanded experiments with additional datasets, and in-depth discussions of counter-intuitive homophily observations.
Notably, the updated draft positions the contributions more clearly in relation to prior work, justifies the selection of NodeFormer for visualising attention trends, and validates module choices in the proposed framework through linked experimental evidence.
A motivation section has also been added at the beginning of Section 3.2, right before introducing the formal definition of the homophily metric.
For more details, including specific page numbers, please refer to the global response in the thread titled 'Summary of all Changes Made to the Paper.'
Assigned Action Editor: ~Shuiwang_Ji1
Submission Number: 3743