A Novel Reward-driven Metropolis-Hastings Sampling Method for Dynamic Agent Control via Spiking Neural Networks

Published: 18 Sept 2025, Last Modified: 18 Oct 2025, EdgeAI4R Poster, CC BY 4.0
Keywords: Spiking Neural Networks, Metropolis-Hastings sampling, Dynamical Agent, Control, Reinforcement Learning.
Abstract: Spiking Neural Networks (SNNs) promise low-power, real-time control, but their training can be challenging for reinforcement learning (RL) tasks due to the non-differentiability of spikes and the hardware non-idealities of emerging analog and mixed-signal neuromorphic processors. Within this context, we present a gradient-free framework for training SNN policies with Metropolis-Hastings (MH) sampling, using episode returns as reward-driven pseudo-likelihoods to propose, accept, or reject network parameter updates. In evaluations on standard RL benchmarks (Acrobot and CartPole), our single-layer SNN policies outperform deep Q-learning (DQL) baselines while using fewer neurons and training episodes: Acrobot reaches −90 vs. −140 (for DQL), and CartPole achieves the maximum score of 500 vs. 280, plateauing within ≈ 50 episodes. These results highlight the effectiveness of MH-trained SNNs for control and their suitability for neuromorphic deployment.
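The abstract's propose/accept/reject loop can be sketched as follows. This is a minimal illustration, not the paper's implementation: the pseudo-likelihood form exp(β·R), the Gaussian proposal, the temperature β, and the toy stand-in for an RL episode are all assumptions made here for concreteness; in the actual framework the return would come from rolling out the SNN policy on Acrobot or CartPole.

```python
import numpy as np

def mh_train(episode_return, theta0, n_iters=200, step=0.1, beta=5.0, seed=0):
    """Gradient-free Metropolis-Hastings search over policy parameters.

    episode_return(theta) -> scalar episode return R; exp(beta * R) plays
    the role of the reward-driven pseudo-likelihood, so the MH acceptance
    ratio reduces to exp(beta * (R_new - R_current)).
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    r = episode_return(theta)
    theta_best, r_best = theta.copy(), r
    for _ in range(n_iters):
        # Symmetric Gaussian random-walk proposal on the parameters.
        proposal = theta + step * rng.standard_normal(theta.shape)
        r_new = episode_return(proposal)
        # Accept with prob min(1, exp(beta * (r_new - r))): better-return
        # proposals are always accepted, worse ones only occasionally.
        if np.log(rng.uniform()) < beta * (r_new - r):
            theta, r = proposal, r_new
            if r > r_best:
                theta_best, r_best = theta.copy(), r
    return theta_best, r_best

# Hypothetical stand-in for an RL episode: return peaks at theta = (1, -2).
target = np.array([1.0, -2.0])
toy_return = lambda th: -np.sum((th - target) ** 2)

theta_opt, best_r = mh_train(toy_return, np.zeros(2))
```

Because acceptance depends only on the return difference, the loop never needs gradients through the spiking dynamics, which is what makes the approach compatible with non-differentiable neurons and analog hardware in the loop.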
Submission Type: Novel research
Student Paper: No
Demo Or Video: No
Public Extended Abstract: Yes
Submission Number: 2