A Unified SPD Token Transformer Framework for EEG Classification: Systematic Comparison of Geometric Embeddings

Chi-Sheng Chen, En-Jui Kuo, Guan-Ying Chen, Xinyu Zhang, Fan Zhang

Published: 29 Jan 2026, Last Modified: 01 May 2026 | OpenReview Archive Direct Upload | Everyone | arXiv.org perpetual, non-exclusive license

Abstract: Spatial covariance matrices of EEG signals are Symmetric Positive Definite (SPD) and lie on a Riemannian manifold, yet the theoretical connection between embedding geometry and optimization dynamics remains unexplored. We provide a formal analysis linking embedding choice to gradient conditioning and numerical stability for SPD manifolds, establishing three theoretical results: (1) BWSPD's √κ gradient conditioning (vs. κ for Log-Euclidean) via Daleckii-Kreĭn matrices provides better gradient conditioning on high-dimensional inputs (d ≥ 22), with this advantage diminishing on low-dimensional inputs (d ≤ 8) where eigendecomposition overhead dominates; (2) Embedding-Space Batch Normalization (BN-Embed) approximates Riemannian normalization up to O(ε²) error, yielding +26% accuracy on 56-channel ERP data but a negligible effect on 8-channel SSVEP data, matching the channel-count-dependent prediction; (3) bi-Lipschitz bounds prove that BWSPD tokens preserve manifold distances with distortion governed solely by the condition ratio κ. We validate these predictions via a unified Transformer framework comparing BWSPD, Log-Euclidean, and Euclidean embeddings within an identical architecture across 1,500+ runs on three EEG paradigms (motor imagery, ERP, SSVEP; 36 subjects). Our Log-Euclidean Transformer achieves state-of-the-art performance on all datasets, substantially outperforming classical Riemannian classifiers and recent SPD baselines, while BWSPD offers competitive accuracy with similar training time.
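As a rough illustration of the embeddings the abstract compares, the sketch below turns one EEG trial into an SPD covariance matrix and then into a flat token vector. It is not the authors' implementation. The function names (`spd_covariance`, `vech`, `log_euclidean_token`, `sqrt_token`), the ridge value, and the random stand-in trial are all hypothetical. In particular, the matrix-square-root token is only an assumed stand-in for the BWSPD embedding, whose exact construction the abstract does not specify. The last lines check one elementary fact related to the √κ versus κ comparison: taking the matrix square root of an SPD matrix reduces its condition number from κ to √κ.

```python
import numpy as np


def spd_covariance(x, eps=1e-6):
    """Spatial covariance of one trial x with shape (channels, samples)."""
    x = x - x.mean(axis=1, keepdims=True)
    c = x @ x.T / (x.shape[1] - 1)
    # Assumption: a small ridge keeps the estimate strictly positive definite.
    return c + eps * np.eye(c.shape[0])


def apply_to_eigvals(c, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, v = np.linalg.eigh(c)
    return (v * fn(w)) @ v.T


def vech(m):
    """Upper triangle as a vector; off-diagonals are scaled by sqrt(2)
    so the vector's Euclidean norm equals the matrix's Frobenius norm."""
    i, j = np.triu_indices(m.shape[0])
    scale = np.where(i == j, 1.0, np.sqrt(2.0))
    return m[i, j] * scale


def log_euclidean_token(c):
    """Log-Euclidean embedding: vectorized matrix logarithm."""
    return vech(apply_to_eigvals(c, np.log))


def sqrt_token(c):
    """Assumed stand-in for a BWSPD-style token: vectorized matrix square root."""
    return vech(apply_to_eigvals(c, np.sqrt))


rng = np.random.default_rng(0)
trial = rng.standard_normal((22, 1000))  # random stand-in for a 22-channel trial
c = spd_covariance(trial)

log_tok = log_euclidean_token(c)
sqrt_tok = sqrt_token(c)

kappa = np.linalg.cond(c)
kappa_sqrt = np.linalg.cond(apply_to_eigvals(c, np.sqrt))
print(log_tok.shape, sqrt_tok.shape)
print(kappa, kappa_sqrt)
```

With d channels each token has d(d+1)/2 entries, so token length grows quadratically with channel count. The √2 scaling in `vech` is one common convention and is chosen here so that distances between tokens match Frobenius distances between the underlying matrices.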