Keywords: Multi-Agent Reinforcement Learning, Spectral-Topology, Koopman, Vehicle-to-Vehicle (V2V) communication, 5G
Abstract: Vehicle-to-Vehicle (V2V) communication is a cornerstone for cooperative safety, real-time traffic management, and autonomous driving. A key enabler in the 5G NR standard is sidelink Mode 2, in which vehicles autonomously select transmission resources without centralized scheduling. While this approach ensures scalability, its baseline mechanism, Semi-Persistent Scheduling (SPS), underperforms in dense or highly mobile environments: it suffers from slow resource reselection and hidden-terminal collisions, and it cannot adapt to diverse QoS requirements such as latency, reliability, and throughput. These shortcomings compromise safety-critical applications, where packet reception ratio (PRR), low delay, and high reliability are crucial.
To address these limitations, we propose a \textbf{Koopman-augmented Graph Multi-Agent Reinforcement Learning (KG-MARL)} framework for decentralized V2V sidelink resource allocation. Unlike SPS, KG-MARL empowers each vehicular link to dynamically select both its subchannel and transmit power using a richer representation of the environment. The framework combines: (i) \emph{spectrogram-based spectral maps} via short-time Fourier transform (STFT), capturing temporal and frequency-domain interference dynamics; (ii) \emph{Graph Attention Network (GAT) embeddings}, modeling the interference topology among neighboring links; and (iii) \emph{Koopman operator-based prediction}, which linearizes nonlinear state dynamics to enable stable and sample-efficient prediction of short-horizon interference evolution.
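To make components (i) and (iii) concrete, the following is a minimal sketch of how a spectral map could feed a Koopman-style linear predictor. The signal, sampling rate, and window length are hypothetical placeholders (the abstract does not specify them); the Koopman operator is estimated here by plain least squares over the spectral features, i.e., EDMD with identity observables, which is one standard way to realize such a predictor.

```python
# Hypothetical sketch: STFT spectral map + least-squares Koopman predictor.
# All shapes and parameters below are illustrative assumptions, not values
# from the paper.
import numpy as np
from scipy.signal import stft

rng = np.random.default_rng(0)

# Stand-in for a received-power time series on one subchannel; a real system
# would use measured RSSI samples.
x = rng.standard_normal(4096)

# Spectrogram: |STFT| gives a time-frequency interference map.
f, t, Z = stft(x, fs=1000.0, nperseg=128)   # Z: (freq_bins, time_frames)
spec = np.abs(Z)                            # magnitude spectral map

# Treat each spectrogram column as the feature state z_t.
Z0 = spec[:, :-1]                           # z_0 ... z_{T-1}
Z1 = spec[:, 1:]                            # z_1 ... z_T

# Least-squares Koopman/DMD estimate: K = Z1 Z0^+ minimizes ||Z1 - K Z0||_F,
# so z_{t+1} ~ K z_t is a linear model of the nonlinear dynamics.
K = Z1 @ np.linalg.pinv(Z0)

# Short-horizon prediction of the interference map from the latest frame.
z = spec[:, -1]
horizon = 5
preds = []
for _ in range(horizon):
    z = K @ z
    preds.append(z)
preds = np.stack(preds, axis=1)             # (freq_bins, horizon)
print(preds.shape)
```

Because the learned transition is linear, multi-step rollouts reduce to repeated matrix-vector products, which is what makes the short-horizon prediction cheap and stable.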
Each agent optimizes a reward shaped as a potential game, aligning local and global objectives. The per-link reward (utility) is
$$R_i = U_i = \alpha\,\mathrm{PRR}_i + \beta \log\!\left(1+\mathrm{SINR}_i\right) - \gamma\,\mathrm{Int}_i - \lambda P_i,
\qquad
\mathrm{SINR}_i = \frac{P_i\, g_{ii}}{N_0 + \sum_{j\neq i,\; r_j=r_i} P_j\, g_{ji}},$$
where $\mathrm{PRR}_i$ is the packet reception ratio, $\mathrm{Int}_i$ the measured interference, $P_i$ the transmit power, $g_{ii}$ and $g_{ji}$ the desired and interfering channel gains, $N_0$ the noise power, and $r_i$ the subchannel selected by link $i$, so the interference sum runs only over links sharing the same resource; $\alpha,\beta,\gamma,\lambda$ are weighting factors for reliability, spectral efficiency, interference mitigation, and power cost.
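A direct implementation of this utility is straightforward; the sketch below computes $R_i$ for one link. The weights and all link measurements are illustrative placeholders, not values from the paper.

```python
# Illustrative computation of the per-link utility R_i = U_i defined above.
# alpha, beta, gamma, lam and the numeric inputs are hypothetical.
import numpy as np

def per_link_utility(prr, p_tx, g_ii, g_ji, p_others, same_resource, n0,
                     alpha=1.0, beta=0.5, gamma=0.1, lam=0.01):
    """R_i = alpha*PRR_i + beta*log(1+SINR_i) - gamma*Int_i - lam*P_i."""
    # Interference: sum over links j != i that chose the same resource r_j = r_i.
    interference = np.sum(p_others * g_ji * same_resource)
    sinr = (p_tx * g_ii) / (n0 + interference)
    return alpha * prr + beta * np.log1p(sinr) - gamma * interference - lam * p_tx

# Example: three neighbors, two of them on the same subchannel as link i.
r = per_link_utility(prr=0.95, p_tx=0.2, g_ii=1e-6,
                     g_ji=np.array([2e-7, 5e-8, 1e-7]),
                     p_others=np.array([0.2, 0.1, 0.2]),
                     same_resource=np.array([1, 0, 1]),
                     n0=1e-9)
print(f"R_i = {r:.3f}")
```

Note that each term depends only on quantities the link can measure locally (its own PRR, SINR, and sensed interference), which is what permits decentralized execution of the learned policy.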
The framework follows a Soft Actor-Critic (SAC) architecture with centralized training and decentralized execution. Koopman operators accelerate value updates by approximating state transitions linearly, while GAT embeddings enhance coordination through graph-structured observations. Once trained, vehicles execute decisions autonomously with minimal overhead, making the scheme practical for real deployments.
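As a sketch of the graph-structured observation step, the snippet below shows a minimal single-head GAT-style aggregation in which an ego link attends over its interference neighbors. Feature dimensions, weight initialization, and the neighbor set are assumptions; the abstract does not specify the exact GAT architecture.

```python
# Minimal single-head GAT-style aggregation (NumPy), sketching how a link's
# embedding could attend over neighboring links in the interference graph.
# All dimensions and weights are hypothetical.
import numpy as np

rng = np.random.default_rng(1)

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

d_in, d_out = 8, 16
W = rng.standard_normal((d_in, d_out)) * 0.1   # shared linear projection
a = rng.standard_normal(2 * d_out) * 0.1       # attention vector

h_i = rng.standard_normal(d_in)                # ego link observation
H_nbr = rng.standard_normal((4, d_in))         # four neighboring links

hi = h_i @ W
Hn = H_nbr @ W

# Attention logits e_ij = LeakyReLU(a^T [W h_i || W h_j]) over neighbors,
# normalized with a softmax as in the original GAT formulation.
logits = np.array([leaky_relu(a @ np.concatenate([hi, hj])) for hj in Hn])
att = softmax(logits)

# Attention-weighted aggregation, followed by an ELU nonlinearity.
z_i = att @ Hn
z_i = np.where(z_i > 0, z_i, np.exp(z_i) - 1)
print(z_i.shape)                               # (16,)
```

During execution, each vehicle would only need its neighbors' broadcast observations to form this embedding, consistent with the centralized-training, decentralized-execution split described above.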
Submission Number: 271