MaskGT: Learning Task-Adaptive Connectivity in Graph Transformers

04 Mar 2026 (modified: 08 Mar 2026) · Under review for TMLR · CC BY 4.0
Abstract: Graph Transformers (GTs) enable all-to-all interactions, but the optimal connectivity is task-dependent: some problems favor sparse, topology-aligned message passing, while others need global attention. We propose MaskGT, a GT-agnostic module that learns a discrete sparse gate over attention edges. By learning which node pairs may communicate within self-attention, MaskGT injects a task-adaptive relational inductive bias without fully committing to the input adjacency. Across synthetic and real-world benchmarks, MaskGT improves performance and robustness by suppressing spurious interactions under structural noise, and it enables parameter-efficient multi-task learning and transfer by localizing task-specific structure in the mask while reusing a shared backbone. These results position MaskGT as a step toward more general-purpose graph models.
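To make the abstract's "discrete sparse gate over attention edges" concrete, below is a minimal sketch of one plausible realization, assuming a dense per-pair logit table trained with a Gumbel-sigmoid relaxation and a straight-through estimator. The abstract does not specify the parameterization, so the module name EdgeGate, the temperature tau, and the renormalized gated attention are illustrative assumptions, not the authors' implementation.

    # Hypothetical sketch of a MaskGT-style learned attention gate; the exact
    # mechanism in the paper may differ.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class EdgeGate(nn.Module):
        """Learnable binary gate over node pairs (illustrative, not the paper's code)."""

        def __init__(self, num_nodes: int, tau: float = 1.0):
            super().__init__()
            # One learnable logit per directed node pair (dense parameterization).
            self.logits = nn.Parameter(torch.zeros(num_nodes, num_nodes))
            self.tau = tau

        def forward(self) -> torch.Tensor:
            if self.training:
                # Gumbel-sigmoid: perturb logits with logistic noise, relax with
                # a sigmoid, then binarize via a straight-through estimator so
                # the forward pass uses a hard 0/1 mask while gradients flow
                # through the soft relaxation.
                u = torch.rand_like(self.logits).clamp(1e-6, 1 - 1e-6)
                noise = torch.log(u) - torch.log1p(-u)
                soft = torch.sigmoid((self.logits + noise) / self.tau)
                return (soft > 0.5).float() + soft - soft.detach()
            return (self.logits > 0).float()  # deterministic sparse mask at eval


    def gated_attention(q, k, v, gate):
        """Scaled dot-product attention restricted to gated (allowed) node pairs."""
        scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
        attn = F.softmax(scores, dim=-1) * gate  # zero out blocked pairs
        attn = attn / attn.sum(-1, keepdim=True).clamp_min(1e-9)  # renormalize rows
        return attn @ v

Under this reading, task-specific structure lives entirely in the gate's logits while the attention backbone is shared, which is consistent with the parameter-efficient multi-task and transfer setting the abstract describes.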
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Renjie_Liao1
Submission Number: 7757