DeepSpike: Foundation Model-based Pipeline for Large-Scale Spike Sorting of Neural Activity

TMLR Paper8168 Authors

29 Mar 2026 (modified: 13 Apr 2026). Under review for TMLR. License: CC BY 4.0
Abstract: Spike sorting of high-resolution neural recordings is essential for understanding brain activity, but it remains challenging when multiple units are recorded simultaneously, because their spikes can overlap in time, signal-to-noise ratios are low, and their clusters overlap in feature space. Here, we introduce DeepSpike, a self-supervised deep learning model that automates spike sorting and overcomes key limitations of conventional spike sorting methods. DeepSpike uses a self-supervised autoencoder to learn robust low-dimensional spike embeddings that facilitate accurate clustering and effective noise filtering. Because this encoder is pretrained on large-scale unlabeled spiking events, it is reusable and generalizes to new recordings without retraining. The model is trained on a new large-scale dataset, SpikeVault-255M, which consists of 255M spiking events derived from real in vivo recordings totaling about 4560 minutes (76 hours). The dataset includes 15M ground-truth spikes manually verified by an expert. In our experiments on SpikeVault-255M and two public benchmark datasets, DeepSpike outperformed state-of-the-art spike sorting algorithms in both accuracy and robustness. Our results demonstrate that large-scale self-supervised pretraining yields a powerful and generalizable encoder for automated spike sorting. The SpikeVault-255M dataset and the pretrained DeepSpike model are made publicly available to facilitate further research and development.
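The abstract pairs low-dimensional autoencoder embeddings with clustering and noise filtering. The sketch below illustrates how those pieces can fit together, and only under loudly stated assumptions: a single-layer linear autoencoder trained with plain SGD stands in for DeepSpike's deep encoder; flagging events whose reconstruction error exceeds 5x the median is a hypothetical noise filter; k-means (k = 2) is a swapped-in clustering step; and the synthetic templates, noise levels, and every function name (`make_dataset`, `train_autoencoder`, `sort_spikes`, etc.) are invented for illustration. It is not the authors' architecture or code and uses no SpikeVault-255M data.

```python
import math
import random

D_IN, D_LAT = 16, 2  # samples per waveform snippet, embedding size (hypothetical)


def template(center, width, amp):
    """Hypothetical spike template: a negative Gaussian trough."""
    return [-amp * math.exp(-(((t - center) / width) ** 2)) for t in range(D_IN)]


def make_dataset(rng, n_per_unit=60, n_noise=10):
    """Two synthetic units with amplitude jitter, plus pure-noise events (label -1)."""
    units = [template(4.0, 1.0, 1.0), template(9.0, 1.5, 0.8)]
    events, labels = [], []
    for label, tpl in enumerate(units):
        for _ in range(n_per_unit):
            scale = rng.uniform(0.8, 1.2)
            events.append([scale * v + rng.gauss(0.0, 0.05) for v in tpl])
            labels.append(label)
    for _ in range(n_noise):
        events.append([rng.gauss(0.0, 0.4) for _ in range(D_IN)])
        labels.append(-1)
    return events, labels


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def train_autoencoder(events, rng, epochs=40, lr=0.05):
    """Linear autoencoder z = enc @ x, x_hat = dec @ z; SGD on 0.5 * ||x_hat - x||^2."""
    enc = [[rng.gauss(0.0, 0.1) for _ in range(D_IN)] for _ in range(D_LAT)]
    dec = [[rng.gauss(0.0, 0.1) for _ in range(D_LAT)] for _ in range(D_IN)]
    order = list(range(len(events)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x = events[i]
            z = matvec(enc, x)
            r = [xh - xi for xh, xi in zip(matvec(dec, z), x)]  # reconstruction residual
            # Gradient w.r.t. z, taken before the decoder is updated.
            gz = [sum(dec[j][k] * r[j] for j in range(D_IN)) for k in range(D_LAT)]
            for j in range(D_IN):
                for k in range(D_LAT):
                    dec[j][k] -= lr * r[j] * z[k]
                    enc[k][j] -= lr * gz[k] * x[j]
    return enc, dec


def recon_error(enc, dec, x):
    x_hat = matvec(dec, matvec(enc, x))
    return sum((a - b) ** 2 for a, b in zip(x_hat, x))


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k=2, iters=20):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    assign = []
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def sort_spikes(events, rng, noise_factor=5.0):
    """Embed every event, drop poorly reconstructed ones as noise, cluster the rest."""
    enc, dec = train_autoencoder(events, rng)
    errors = [recon_error(enc, dec, x) for x in events]
    threshold = noise_factor * sorted(errors)[len(errors) // 2]  # multiple of median error
    kept = [i for i, e in enumerate(errors) if e <= threshold]
    assign = kmeans([matvec(enc, events[i]) for i in kept])
    return kept, assign


rng = random.Random(0)
events, labels = make_dataset(rng)
kept, assign = sort_spikes(events, rng)
print(f"kept {len(kept)} of {len(events)} events")
```

Because a linear autoencoder's optimum spans the principal subspace of the (uncentered) data, this toy version does little beyond PCA. It only shows where embedding, filtering, and clustering sit relative to one another, not why a deep pretrained encoder would help.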
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Blake_Aaron_Richards1
Submission Number: 8168