Keywords: LLP, GNN
Abstract: Learning from Label Proportions (LLP) is a weakly supervised learning paradigm in which only aggregated label proportions over collections of instances (i.e., bags) are provided, rather than individual labels. This enables classification while preserving privacy or reducing annotation costs. Existing LLP methods, however, have been largely restricted to i.i.d. tabular or image data. No solution currently addresses graphs, where instances are inherently interdependent through network structure. In this paper, we generalize LLP to the graph domain and study the problem of node classification with label proportions, where only distributional supervision is available for bags of nodes and the goal is to infer labels for all nodes in the graph. We argue that the lack of node-level supervision is the main challenge for LLP on graphs, and that existing methods based on i.i.d. assumptions fail to exploit topological correlations. To overcome this, we propose GLLP (Graph Learning from Label Proportions), a framework that leverages Optimal Transport (OT) with a homophily-aware cost to generate soft pseudo-labels for individual nodes. These pseudo-labels provide stronger supervision signals for training Graph Neural Networks (GNNs). We further establish theoretical guarantees showing that our cost function is aligned with the node classification objective. Extensive experiments on six homophilic graph benchmarks demonstrate that GLLP consistently outperforms existing LLP baselines and variants. Code and benchmark datasets will be released for public access.
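To make the idea of OT-based pseudo-labeling concrete, the following is a minimal sketch and not the GLLP implementation. It assumes a single bag, uniform mass per node, a standard entropic (Sinkhorn) solver, and a hypothetical homophily-aware cost defined as the negative log of class probabilities averaged with those of graph neighbors. The function names, the mixing weight `alpha`, and the toy graph are all illustrative assumptions; the abstract does not specify the actual cost.

```python
import numpy as np

def homophily_cost(probs, adj, alpha=0.5):
    """Hypothetical cost: -log of predictions smoothed over neighbors.

    probs: (n, k) class probabilities from some model (assumed given).
    adj:   (n, n) symmetric 0/1 adjacency matrix without self-loops.
    """
    deg = adj.sum(axis=1, keepdims=True)
    neighbor_mean = adj @ probs / np.maximum(deg, 1.0)
    smoothed = (1.0 - alpha) * probs + alpha * neighbor_mean
    return -np.log(smoothed + 1e-9)

def sinkhorn_pseudo_labels(cost, proportions, eps=0.5, n_iters=200):
    """Entropic OT between the nodes of one bag and the classes.

    Row marginals are uniform (1/n per node); column marginals are the
    bag's known label proportions. Returns row-normalized soft labels.
    """
    n, k = cost.shape
    r = np.full(n, 1.0 / n)
    c = np.asarray(proportions, dtype=float)
    kernel = np.exp(-cost / eps)
    u, v = np.ones(n), np.ones(k)
    for _ in range(n_iters):
        u = r / (kernel @ v)      # match row marginals
        v = c / (kernel.T @ u)    # match column marginals
    plan = u[:, None] * kernel * v[None, :]
    return plan / plan.sum(axis=1, keepdims=True)

# Toy bag: a 4-node path graph 0-1-2-3 with two classes.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
probs = np.array([[0.9, 0.1],
                  [0.8, 0.2],
                  [0.6, 0.4],
                  [0.3, 0.7]])
pseudo = sinkhorn_pseudo_labels(homophily_cost(probs, adj), [0.5, 0.5])
print(pseudo)
```

In this sketch each row of `pseudo` is a soft label for one node, and the column averages agree with the bag's proportions up to solver tolerance, so the pseudo-labels could serve as per-node targets for a GNN loss. The entropic regularization strength `eps` trades off sharpness of the labels against convergence speed.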
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 21510