A Novel Reward-driven Metropolis-Hastings Sampling Method for Dynamic Agent Control via Spiking Neural Networks
Keywords: Spiking Neural Networks, Metropolis-Hastings sampling, Dynamical Agent, Control, Reinforcement Learning.
Abstract: Spiking Neural Networks (SNNs) promise low-
power, real-time control, but their training is challenging for
reinforcement learning (RL) tasks due to the non-differentiability
of spikes and the hardware non-idealities of emerging analog
and mixed-signal neuromorphic processors. Within this context,
we present a gradient-free framework for training SNN policies
with Metropolis-Hastings (MH) sampling, using episode returns
as reward-driven pseudo-likelihoods to propose, accept, or
reject network parameter updates. In evaluations on standard
RL benchmarks (Acrobot and CartPole), our single-layer SNN
policies outperform deep Q-learning (DQL) baselines while using
fewer neurons and training episodes: Acrobot reaches −90 vs.
−140 (for DQL), and CartPole achieves the maximum score
of 500 vs. 280, plateauing within ≈ 50 episodes. These results
highlight the effectiveness of MH-trained SNNs for control and
their suitability for neuromorphic deployment.
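The accept/reject scheme described in the abstract can be sketched as follows. This is a minimal illustration of MH sampling with episode returns as a pseudo-likelihood, not the authors' implementation: the `evaluate`, `perturb`, and `temperature` names, the exponential reward-to-likelihood mapping, and the Gaussian proposal are all assumptions made for the sketch.

```python
import math
import random

def mh_accept(ret_new, ret_old, temperature=1.0):
    """MH acceptance test, treating exp(return / temperature) as an
    (unnormalized) reward-driven pseudo-likelihood."""
    if ret_new >= ret_old:
        return True  # better return: always accept
    # Worse return: accept with probability equal to the likelihood ratio.
    return random.random() < math.exp((ret_new - ret_old) / temperature)

def train_mh(evaluate, theta0, perturb, episodes=50, temperature=1.0):
    """Gradient-free policy training: propose perturbed parameters,
    evaluate their episode return, and accept or reject via MH."""
    theta, ret = theta0, evaluate(theta0)
    for _ in range(episodes):
        cand = perturb(theta)       # proposal, e.g. Gaussian noise on weights
        cand_ret = evaluate(cand)   # episode return of the candidate policy
        if mh_accept(cand_ret, ret, temperature):
            theta, ret = cand, cand_ret
    return theta, ret
```

In a real setup, `evaluate` would roll out the SNN policy for one episode in the environment and return the total reward; no gradients through the spiking dynamics are needed, which is what makes the scheme attractive for analog and mixed-signal neuromorphic hardware.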
Submission Type: Novel research
Student Paper: No
Demo Or Video: No
Public Extended Abstract: Yes
Submission Number: 2