Decoder-as-Policy: Head-Only PPO Fine-Tuning of a Spike-Transformer for Low-Error Kinematic Decoding
Keywords: Brain–Computer Interface (BCI); Neural decoding; POYO (PerceiverIO transformer); Proximal Policy Optimization (PPO); Behavior cloning; Kinematic/velocity reconstruction; Uncertainty calibration; Closed-loop control; Neural Latents Benchmark
Abstract: Spike-token transformers such as POYO achieve strong across-session decoding, yet purely supervised
training can overweight variance alignment (explained variance) relative to the pointwise accuracy
needed for closed-loop BCI control. We treat the decoder’s velocity head as a Gaussian policy and
fine-tune it head-only: a behavior-cloning (BC) warm start followed by on-policy PPO on a
control-aligned reward (negative MSE plus a small entropy bonus and an optional
variance-calibration term), while keeping the POYO encoder frozen.
On \textit{NLB'21 mc\_maze\_medium}, extended BC (1--2k steps) followed by PPO reveals a broad Pareto
window of very low error and high explained variance, with best $R^2=0.9975$ and MSE$=0.0023$ at the same validation checkpoint (step~1900) and a
predictive scale of $\sigma\approx0.993$.
On a separate Perich\_Miller dataset trained for only 400 steps, the POYO+ decoder achieves
$R^2\approx0.87$ (MSE$\approx0.34$) after PPO fine-tuning.
We provide leakage safeguards, ablations, and reproducible configs.
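The sketch below is a minimal, illustrative rendering of the setup the abstract describes (not the authors' released code): a frozen encoder stand-in, a Gaussian velocity head acting as the policy, a control-aligned reward of negative MSE with an entropy bonus and an optional variance-calibration term, a short BC warm start, and a head-only PPO clipped update. All module names, shapes, and hyperparameters are assumptions for illustration.

```python
# Minimal illustrative sketch (assumed names/hyperparameters, not the authors' code):
# frozen encoder stand-in, Gaussian velocity head, BC warm start, head-only PPO.
import torch
import torch.nn as nn

class GaussianVelocityHead(nn.Module):
    """Velocity head treated as a diagonal Gaussian policy."""
    def __init__(self, latent_dim: int, vel_dim: int = 2):
        super().__init__()
        self.mean = nn.Linear(latent_dim, vel_dim)
        self.log_std = nn.Parameter(torch.zeros(vel_dim))  # learnable predictive scale

    def dist(self, z: torch.Tensor) -> torch.distributions.Normal:
        return torch.distributions.Normal(self.mean(z), self.log_std.exp())

def control_reward(action, target, sigma, lam_cal=0.0):
    """Negative MSE plus an optional variance-calibration penalty that pulls the
    predictive scale toward the residual magnitude (assumed functional form)."""
    reward = -((action - target) ** 2).mean(dim=-1)
    if lam_cal > 0:
        resid = (action - target).abs().mean(dim=-1)
        reward = reward - lam_cal * (sigma.mean(dim=-1) - resid) ** 2
    return reward

latent_dim, vel_dim = 64, 2
encoder = nn.Linear(128, latent_dim).requires_grad_(False)  # stand-in for the frozen POYO encoder
head = GaussianVelocityHead(latent_dim, vel_dim)
opt = torch.optim.Adam(head.parameters(), lr=3e-4)          # head-only optimizer

def batch():
    """Placeholder batch; in practice, spike-token features and target velocities."""
    return torch.randn(256, 128), torch.randn(256, vel_dim)

# Behavior-cloning warm start: supervised MSE on the Gaussian mean.
for _ in range(100):
    spikes, target_vel = batch()
    with torch.no_grad():
        z = encoder(spikes)
    bc_loss = ((head.mean(z) - target_vel) ** 2).mean()
    opt.zero_grad(); bc_loss.backward(); opt.step()

# Head-only PPO: clipped surrogate on the control-aligned reward plus entropy bonus.
clip_eps, ent_coef = 0.2, 1e-3
for _ in range(200):
    spikes, target_vel = batch()
    with torch.no_grad():
        z = encoder(spikes)
        old_dist = head.dist(z)
        action = old_dist.sample()
        old_logp = old_dist.log_prob(action).sum(-1)
        reward = control_reward(action, target_vel, old_dist.scale)
        adv = reward - reward.mean()                 # simple baseline-subtracted advantage
    for _ in range(4):                               # a few epochs over the on-policy batch
        new_dist = head.dist(z)
        ratio = (new_dist.log_prob(action).sum(-1) - old_logp).exp()
        clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
        policy_loss = -torch.min(ratio * adv, clipped * adv).mean()
        loss = policy_loss - ent_coef * new_dist.entropy().sum(-1).mean()
        opt.zero_grad(); loss.backward(); opt.step()
```

Because only the head's parameters are in the optimizer and the encoder is frozen (and evaluated under `no_grad`), the PPO update touches nothing upstream of the velocity head, matching the head-only fine-tuning described above.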
Submission Number: 84