LOMAC: GNN-based Deep Reinforcement Learning with One-Way Markov Chain for Graph Coloring

Xiaolong Li; ZhaoXingZou; KeHuang; Yang Liu; Changyan Yi; Jun Cai; Zhen Zhang

18 Sept 2025 (modified: 10 Dec 2025), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0

Keywords: Deep Reinforcement Learning, Graph Coloring Problem, Graph Neural Network, One-way Markov Chain

TL;DR: A novel GNN-based DRL framework that integrates a one-way, two-dimensional Markov chain and a linear-complexity dynamic message-passing GNN model for efficient graph coloring.

Abstract: The graph coloring problem (GCP) is an NP-hard combinatorial optimization task: assign the minimum number of colors to the vertices of a graph such that no two adjacent vertices share a color. Deep reinforcement learning (DRL) and graph neural networks (GNNs) are promising approaches to the GCP, but their scalability is usually limited by the large number of Markov states and by computational complexity that grows with graph size. In this paper, we introduce LOMAC, a novel GNN-based DRL framework that integrates a one-way, two-dimensional Markov chain with a linear-complexity GNN model using pseudonode-enhanced message passing. This integration significantly reduces both space and computational complexity. We transform the GCP into a one-way Markov chain model and introduce two key concepts: Markov state potential and graph state potential. Theoretical analysis of these two potentials guides the search for an optimal vertex-coloring solution. We show that LOMAC reduces the number of Markov states from $O(K^N)$ to $O(NK)$ and simplifies decision-making through unidirectional state transitions. In addition, an invalid-action penalty mechanism further optimizes the coloring process. Experiments on Erdős–Rényi and Barabási–Albert graphs of various sizes and on 16 real-world benchmarks demonstrate that LOMAC achieves state-of-the-art performance in the number of colors required.
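
To make the state-count claim concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that $N$ is the number of vertices and $K$ the number of available colors, and that a "one-way, two-dimensional" state can be read as a pair (vertices colored so far, colors used so far); the abstract does not spell out either point. A plain first-fit rule stands in for the learned GNN policy, and the names `is_valid_action`, `color_one_way`, and `penalty` are hypothetical.

```python
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]  # adjacency lists keyed by vertex id


def is_valid_action(graph: Graph, coloring: Dict[int, int], v: int, c: int) -> bool:
    """A color is valid for v if no already-colored neighbor uses it."""
    return all(coloring.get(u) != c for u in graph[v])


def color_one_way(
    graph: Graph, max_colors: int, penalty: float = -1.0
) -> Tuple[Dict[int, int], List[Tuple[int, int]], float]:
    """Color vertices once each, in a fixed order, never revisiting a vertex.

    Returns the coloring, the visited (vertices_colored, colors_used) states,
    and the accumulated penalty for invalid color choices.
    First-fit is only a placeholder for a learned policy (assumption).
    """
    coloring: Dict[int, int] = {}
    trajectory: List[Tuple[int, int]] = [(0, 0)]
    total_reward = 0.0
    for v in sorted(graph):
        for c in range(max_colors):
            if is_valid_action(graph, coloring, v, c):
                coloring[v] = c
                break
            total_reward += penalty  # invalid action: conflicts with a neighbor
        else:
            raise ValueError("max_colors is too small for this vertex order")
        trajectory.append((len(coloring), len(set(coloring.values()))))
    return coloring, trajectory, total_reward


# Usage: a 5-cycle, which needs 3 colors.
cycle5: Graph = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
coloring, trajectory, reward = color_one_way(cycle5, max_colors=3)
print(coloring)    # {0: 0, 1: 1, 2: 0, 3: 1, 4: 2}
print(trajectory)  # [(0, 0), (1, 1), (2, 2), (3, 2), (4, 2), (5, 3)]
print(reward)      # -4.0
```

Under this reading, the first state coordinate only ever increases, so the chain is unidirectional, and there are at most $(N+1)(K+1)$ distinct pairs. For the 5-cycle with $K = 3$ that is 24 possible states, against $3^5 = 243$ full color assignments.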

Supplementary Material: zip

Primary Area: optimization

Submission Number: 10827
