Keywords: Mechanistic interpretability, Deep learning-based Side-channel analysis, Explainability, Neural networks
TL;DR: We use mechanistic interpretability to reverse engineer how neural networks break protected cryptographic implementations via side-channel analysis.
Abstract: Side-channel analysis (SCA) poses a real-world threat by exploiting unintentional physical signals to extract secret information from secure devices. Evaluation labs also use the same techniques to certify device security. In recent years, deep learning has emerged as a prominent method for SCA, achieving state-of-the-art attack performance at the cost of interpretability. Understanding how neural networks extract secrets is crucial for security evaluators aiming to defend against such attacks, as effective countermeasures can only be proposed once the attack itself is understood.
In this work, we apply mechanistic interpretability to neural networks trained for SCA, revealing $\textit{how}$ models exploit $\textit{what}$ leakage in side-channel traces. We focus on sudden jumps in performance to reverse engineer learned representations, ultimately recovering secret masks and moving the evaluation process from black-box to white-box. Our results show that mechanistic interpretability can scale to realistic SCA settings, even when relevant inputs are sparse, model accuracies are low, and side-channel protections prevent standard input interventions.
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 20194