Decoder-as-Policy: Head-Only PPO Fine-Tuning of a Spike-Transformer for Low-Error Kinematic Decoding

Published: 23 Sept 2025, Last Modified: 18 Oct 2025 · NeurIPS 2025 Workshop BrainBodyFM · CC BY 4.0
Keywords: Brain–Computer Interface (BCI); Neural decoding; POYO (PerceiverIO transformer); Proximal Policy Optimization (PPO); Behavior cloning; Kinematic/velocity reconstruction; Uncertainty calibration; Closed-loop control; Neural Latents Benchmark
Abstract: Spike-token transformers such as POYO achieve strong across-session decoding, yet purely supervised training can overweight variance alignment (explained variance) relative to the pointwise accuracy needed for closed-loop BCI control. We treat the decoder's velocity head as a Gaussian policy and fine-tune it head-only: a behavior-cloning (BC) warm start followed by on-policy PPO on a control-aligned reward (negative MSE plus a small entropy bonus and an optional variance-calibration term), while keeping the POYO encoder frozen. On \textit{NLB'21 mc\_maze\_medium}, extended BC (1--2k steps) followed by PPO reveals a broad Pareto window with very low error and high explained variance: best $R^2=0.9975$ and MSE $=0.0023$ at the same validation checkpoint (step~1900), with predictive scale $\sigma\approx0.993$. On a separate Perich\_Miller dataset trained for 400 steps, POYO+ achieves $R^2\approx0.87$ (MSE $\approx0.34$) after PPO fine-tuning. We provide leakage safeguards, ablations, and reproducible configs.
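
The abstract describes a Gaussian velocity head trained on top of a frozen encoder with a reward built from negative MSE, an entropy bonus, and an optional variance-calibration term. The sketch below is a minimal, hypothetical PyTorch illustration of that setup, not the authors' released code: the encoder, latent dimension, coefficient values, and helper names (`GaussianVelocityHead`, `control_reward`) are assumptions, and the PPO update itself is omitted.

```python
# Minimal sketch (assumed implementation) of a head-only Gaussian policy over
# velocities and the control-aligned reward described in the abstract.
import torch
import torch.nn as nn


class GaussianVelocityHead(nn.Module):
    """Maps frozen-encoder latents to a Gaussian over 2-D hand velocity."""

    def __init__(self, latent_dim: int, vel_dim: int = 2):
        super().__init__()
        self.mu = nn.Linear(latent_dim, vel_dim)              # mean velocity
        self.log_sigma = nn.Parameter(torch.zeros(vel_dim))   # learned predictive scale

    def forward(self, z: torch.Tensor) -> torch.distributions.Normal:
        return torch.distributions.Normal(self.mu(z), self.log_sigma.exp())


def control_reward(pred_vel, true_vel, dist,
                   entropy_coef: float = 1e-3, calib_coef: float = 0.0):
    """Negative MSE plus a small entropy bonus and an optional
    variance-calibration term (coefficient values are placeholders)."""
    mse = ((pred_vel - true_vel) ** 2).mean(dim=-1)
    entropy = dist.entropy().mean(dim=-1)
    # Optional calibration: penalize mismatch between predictive variance
    # and the observed squared error.
    calib = ((dist.scale ** 2).mean(dim=-1) - mse.detach()).abs()
    return -mse + entropy_coef * entropy - calib_coef * calib


# Usage with a stand-in for frozen POYO latents; only `head` parameters train.
latent_dim = 64
head = GaussianVelocityHead(latent_dim)
z = torch.randn(32, latent_dim)          # frozen-encoder output (assumed shape)
true_vel = torch.randn(32, 2)            # ground-truth velocities
dist = head(z)
action = dist.rsample()                  # sampled velocity "action"
reward = control_reward(action, true_vel, dist)
# `reward` and dist.log_prob(action) would feed a standard PPO update over the head only.
```

The same head can be warm-started with behavior cloning by maximizing `dist.log_prob(true_vel)` before switching to the PPO objective.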
Submission Number: 84