Keywords: Graph Transfer Learning, Quantized Tokenizer
Abstract: Graph tokenization aims to convert graph-structured data into discrete representations that can be consumed by foundation models. Recent methods use vector quantization to map nodes or subgraphs to discrete token IDs. However, it remains unclear whether these quantized tokenizers truly capture high-level, transferable graph patterns across diverse domains. In this work, we conduct a comprehensive empirical study of the representational consistency of quantized graph tokens across datasets. We introduce the Token Information Discrepancy Score (TIDS) to quantify, for each token, the alignment of structural and feature information between source and target graphs. Our results reveal that current quantized graph tokenizers often assign the same token to structurally inconsistent patterns across graphs, resulting in high TIDS and degraded transfer performance. We further show that TIDS is positively correlated with the generalization gap on downstream tasks. Finally, we propose a simple yet effective structural hard encoding (SHE) strategy that enhances the structural awareness of the tokenizer. SHE lowers TIDS and improves transferability, highlighting the importance of explicitly encoding transferable graph structure in token design.
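The abstract does not spell out how TIDS or SHE are computed. As a rough illustration only, the sketch below assumes a minimal instantiation: TIDS as the average per-token total-variation distance between node-degree histograms on the source and target graphs, and SHE as a hard one-hot degree encoding concatenated to node features before quantization. All function names and design choices here (`tids`, `she_augment`, degree as the structural statistic, the bin count) are hypothetical stand-ins, not the authors' definitions.

```python
# Hypothetical sketch of a token-level discrepancy score in the spirit of TIDS.
# Assumption: TIDS compares, per token ID, the distribution of a structural
# statistic (node degree) between the source and target graphs, then averages
# the per-token discrepancies over the tokens shared by both graphs.
import numpy as np

def token_degree_histogram(token_ids, degrees, token, bins):
    """Normalized degree histogram of the nodes assigned to `token`."""
    mask = token_ids == token
    if not mask.any():
        return None
    hist, _ = np.histogram(degrees[mask], bins=bins)
    return hist / hist.sum()

def tids(src_tokens, src_degrees, tgt_tokens, tgt_degrees, num_bins=16):
    """Average total-variation distance between per-token degree histograms
    on source and target graphs (an assumed instantiation of TIDS)."""
    bins = np.histogram_bin_edges(
        np.concatenate([src_degrees, tgt_degrees]), bins=num_bins)
    shared = np.intersect1d(np.unique(src_tokens), np.unique(tgt_tokens))
    scores = []
    for tok in shared:
        p = token_degree_histogram(src_tokens, src_degrees, tok, bins)
        q = token_degree_histogram(tgt_tokens, tgt_degrees, tok, bins)
        scores.append(0.5 * np.abs(p - q).sum())  # total variation in [0, 1]
    return float(np.mean(scores)) if scores else 0.0

def she_augment(features, degrees, max_degree=16):
    """Assumed SHE-style hard structural encoding: concatenate a clipped
    one-hot degree encoding to node features before quantization."""
    idx = np.minimum(degrees, max_degree).astype(int)
    one_hot = np.eye(max_degree + 1)[idx]
    return np.concatenate([features, one_hot], axis=1)
```

Under this reading, a high `tids` value means the same token ID covers structurally different nodes across graphs, matching the abstract's claim that such misalignment tracks the generalization gap; `she_augment` injects explicit structure so the quantizer can separate those cases.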
Primary Area: learning on graphs and other geometries & topologies
Submission Number: 18812