BrainDistill: Implantable Motor Decoding with Task-Specific Knowledge Distillation

ICLR 2026 Conference Submission 20939 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Motor Decoding; Knowledge Distillation; Quantization-Aware Training; Brain–Computer Interface
TL;DR: A novel motor decoding pipeline that integrates task-specific knowledge distillation with an efficient implantable neural decoder
Abstract: Transformer-based neural decoders pre-trained on large-scale datasets have recently outperformed classical machine learning models and small neural networks on brain–computer interface (BCI) tasks. However, their large parameter counts and high computational demands hinder deployment in power-constrained implantable systems. To address this challenge, we introduce $\textbf{BrainDistill}$, a novel implantable motor decoding pipeline that integrates a neural decoder with a distillation framework. First, we propose $\textbf{TSKD}$, a task-specific knowledge distillation method that projects task-relevant teacher embeddings into compact student models. Unlike standard feature distillation methods that attempt to preserve teacher representations in full, TSKD explicitly prioritizes features critical for decoding through supervised projection. To evaluate the framework, we define the task-specific ratio ($\textbf{TSR}$), a new metric that quantifies the proportion of task-relevant information retained after projection. Building on this framework, we propose the Implantable Neural Decoder ($\textbf{IND}$), a lightweight transformer architecture that combines linear attention with continuous wavelet tokenization, optimized for on-chip deployment. Across multiple neural datasets, IND consistently outperforms prior neural decoders on motor decoding tasks, while its TSKD-distilled variant further surpasses alternative distillation methods in few-shot calibration settings. Finally, we present a quantization-aware training scheme that enables integer-only inference with activation clipping ranges learned during training. The quantized IND enables deployment under the strict power constraints of implantable BCIs with minimal performance loss.
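The abstract does not spell out the exact TSKD formulation, so the sketch below illustrates one plausible reading of "supervised projection" in PyTorch: a projection head maps teacher embeddings into the student's feature space while an auxiliary task head keeps the projected features task-relevant, and the student is trained to match those projected features alongside its own task loss. All names (`TaskProjector`, `tskd_loss`), dimensions, and loss weights are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a task-specific feature-distillation loss (assumed form).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskProjector(nn.Module):
    """Projects teacher embeddings into the student's feature space.

    The auxiliary classification head supervises the projection with the
    task labels, so only task-relevant directions of the teacher
    representation need to be preserved (the "task-specific" part).
    """
    def __init__(self, teacher_dim: int, student_dim: int, num_classes: int):
        super().__init__()
        self.proj = nn.Linear(teacher_dim, student_dim)
        self.aux_head = nn.Linear(student_dim, num_classes)

    def forward(self, teacher_emb: torch.Tensor):
        z = self.proj(teacher_emb)          # projected teacher features
        return z, self.aux_head(z)          # features + auxiliary task logits

def tskd_loss(student_emb, student_logits, teacher_emb, labels,
              projector: TaskProjector, alpha: float = 1.0, beta: float = 1.0):
    """Student task loss + feature matching against the supervised projection."""
    z_t, aux_logits = projector(teacher_emb)
    task = F.cross_entropy(student_logits, labels)
    aux = F.cross_entropy(aux_logits, labels)       # keeps the projection task-relevant
    feat = F.mse_loss(student_emb, z_t.detach())    # student mimics projected teacher
    return task + alpha * feat + beta * aux
```

Under this reading, a TSR-style diagnostic would compare how well the projected features `z_t` predict the labels relative to the full teacher embedding; the precise definition used in the paper is not stated in the abstract.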
Supplementary Material: zip
Primary Area: applications to neuroscience & cognitive science
Submission Number: 20939