\section{Introduction}
\label{sec:introduction}

Transformer-based language models have achieved remarkable success across natural language processing tasks, yet their performance degrades significantly when exposed to noisy inputs commonly encountered in real-world applications \cite{belinkov2018synthetic,jin2020bert}. Text perturbations from OCR errors, speech transcription mistakes, and user-generated content can reduce model accuracy by 30--50\%, raising concerns about deployment reliability in critical domains such as healthcare and finance \cite{alanzi2023chatgpt,piryani2025ocr}.

The variability in noise robustness across transformer architectures raises an important research question. In our experiments, RoBERTa retains 98.8\% of its baseline performance under noise conditions in which ELECTRA retains only 52.7\%, despite the two models' similar architectural foundations. This disparity suggests that robustness is determined not solely by model capacity but also by specific architectural and training choices that remain poorly understood.
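To make the comparison concrete, one natural reading of these figures, adopted here only for illustration, treats the robustness score of a model $m$ as the ratio of its accuracy on noisy inputs to its accuracy on clean inputs, averaged over a set of noise types $\mathcal{N}$ (the symbols $R$, $\mathrm{Acc}$, and $\mathcal{N}$ are introduced here as illustrative notation):
\[
R(m) = \frac{1}{|\mathcal{N}|} \sum_{n \in \mathcal{N}} \frac{\mathrm{Acc}_{n}(m)}{\mathrm{Acc}_{\mathrm{clean}}(m)}.
\]
Under this reading, $R = 0.988$ means that RoBERTa keeps 98.8\% of its clean accuracy on average, whereas $R = 0.527$ means that ELECTRA loses nearly half.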

Through an analysis of 300,000 perturbed samples, we identify critical transitions at layers 3 and 8 that separate three distinct processing phases: surface features, syntactic structure, and semantic encoding \cite{tenney2019bert}. Strategic layer dropout at these transitions achieves a measured $2.47\times$ speedup on A100 GPUs while maintaining 95\% accuracy. We additionally evaluate robustness on real-world noise from OCR and social media, finding 15--20\% greater vulnerability than under synthetic perturbations.
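A measured speedup is conventionally the ratio of wall-clock inference times, $S = T_{\mathrm{full}} / T_{\mathrm{drop}}$. As an illustrative point of comparison only, assume that all $L$ layers of an encoder have equal cost and that embedding and output-head costs are negligible; removing a set $D$ of layers then gives the idealized value
\[
S_{\mathrm{ideal}} = \frac{L}{L - |D|},
\]
so that, for example, skipping 4 of 12 layers yields $S_{\mathrm{ideal}} = 12/8 = 1.5$. This idealization and its notation are assumptions made for exposition; measured speedups can differ from it in practice, since embedding, output-head, and data-movement costs are unaffected by layer removal.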

\subsection{Contributions}

This paper makes four primary contributions:

\begin{enumerate}
\item \textbf{Layer-wise Vulnerability Analysis}: We present a systematic analysis of noise robustness across transformer layers, identifying universal transitions at layers 3 and 8 ($p < 0.001$, Cohen's $d > 3.0$) that correspond to linguistic processing boundaries.

\item \textbf{Comparative Robustness Evaluation}: We quantify robustness differences across five encoder architectures and five noise types, showing that RoBERTa achieves an average robustness of 0.988, compared with 0.527 for ELECTRA, and we analyze the architectural factors that contribute to these differences.

\item \textbf{Cross-Architecture Transfer}: We demonstrate a 61.1\% correlation in vulnerability patterns across models, suggesting fundamental computational properties that are independent of specific architectural choices.

\item \textbf{Optimization Framework}: We develop layer dropout strategies based on the identified vulnerabilities, achieving a measured $2.47\times$ speedup on A100 GPUs while maintaining 95\% accuracy.
\end{enumerate}

The remainder of the paper is organized as follows. Section~2 reviews related work; Section~3 describes our methodology; Section~4 presents experiments, including runtime validation and evaluation on real-world noise; Section~5 provides a theoretical analysis of the empirical findings; Section~6 discusses decoder architectures and scalability; and Section~7 concludes.