\begin{abstract}
Transformer models exhibit significant performance degradation when exposed to noisy inputs, yet the mechanisms underlying this vulnerability remain poorly understood. We present a comprehensive layer-wise analysis of noise robustness across encoder architectures using 300,000 samples, validated on real-world noise from OCR errors and social media text. Our analysis identifies critical transitions at layers 3 and 8 that delimit three linguistic processing phases: surface features (85\% recovery), syntactic structure (22\% recovery), and semantic encoding (67\% recovery). RoBERTa maintains 98.8\% of its performance, whereas ELECTRA retains only 52.7\%, and real-world noise proves 15--20\% more challenging than synthetic perturbations. Runtime measurements confirm that strategic layer dropout achieves a 2.47$\times$ measured speedup (versus 3.1$\times$ theoretical) while preserving 95\% accuracy. Cross-model analysis reveals a 61.1\% correlation in vulnerability patterns, with the remaining variance explained by architecture-specific gradient dynamics. We empirically validate information-theoretic predictions, showing that the phase transitions align with inflection points in mutual information and with 2.3$\times$ peaks in gradient norm. Although our analysis focuses on encoders, preliminary GPT-2 experiments suggest that decoders exhibit shifted transitions due to causal attention constraints. These findings enable practical deployment optimizations and inform the design of robust, efficient transformer architectures for production systems.
\end{abstract}