\section{Introduction}

Medical image classification is a cornerstone of modern computer-aided diagnosis. Although discriminative models such as CNNs \cite{he2016deep} and ViTs \cite{dosovitskiy2020image} achieve strong benchmark performance, they are often fragile in clinical practice. Medical images frequently contain noise and artifacts, and such models tend to exploit spurious correlations or ``shortcuts'' \cite{geirhos2020shortcut}, yielding over-confident predictions even under significant distribution shift \cite{esteva2019guide, li2025catch, hu2026codes}. Consequently, building a system that is not only accurate but also trustworthy, in the sense that it provides calibrated uncertainty estimates and remains robust, is still a critical challenge for patient safety~\cite{sha2025fastcadfairnessawareframeworknoncontact}.

To mitigate these limitations, the field has shifted towards {Generative Classifiers} built on Denoising Diffusion Probabilistic Models (DDPMs) \cite{ho2020denoising}. Current work primarily follows two paths. The {first path} reformulates classification as conditional image generation and assigns labels by comparing per-class reconstruction errors (MSE) \cite{favero2025conditional, muller2022diffusion}. However, this strategy suffers from a critical {inductive bias}: pixel-level reconstruction quality does not equate to diagnostic correctness \cite{zhang2018unreasonable}. In addition, the iterative sampling required for every class is computationally prohibitive for clinical workflows \cite{chen2023robust}.

The {second path} models classification directly as a {``label generation''} process. Although efficient, existing frameworks \cite{yang2023diffmic, yang2025diffmic, han2022card} face a fundamental reliability hazard stemming from a {geometric conflict}: they add unbounded Gaussian noise to discrete one-hot label vectors, which are constrained to the bounded probability simplex. This mismatch forces noisy states off the valid manifold, which hinders modeling precision and causes the model to learn a biased posterior that underestimates uncertainty \cite{hoogeboom2021argmax, austin2021structured}. Furthermore, existing architectures \cite{shen2021interpretable, yang2025diffmic, li2025hyfacialhybridfeatureextraction} often rely on static fusion for {conditional guidance}, overlooking the dynamic semantic dependency between global anatomical context and local lesions.
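
As a minimal illustration of this conflict, assume the standard DDPM forward process \cite{ho2020denoising} is applied directly to a one-hot label $y_0 \in \{0,1\}^K$ over $K$ classes. The symbols $K$, $y_t$, $\bar{\alpha}_t$, and $\epsilon$ are illustrative notation introduced here, and the cited frameworks may use variants of this process:
\[
y_t = \sqrt{\bar{\alpha}_t}\, y_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \mathbf{I}_K).
\]
Summing the components gives
\[
\mathbf{1}^{\top} y_t = \sqrt{\bar{\alpha}_t} + \sqrt{1-\bar{\alpha}_t}\, \mathbf{1}^{\top}\epsilon \;\sim\; \mathcal{N}\bigl(\sqrt{\bar{\alpha}_t},\; K(1-\bar{\alpha}_t)\bigr),
\]
so for any $\bar{\alpha}_t < 1$ the noisy state satisfies the simplex constraint $\mathbf{1}^{\top} y_t = 1$ with probability zero, and each component of $y_t$ is negative with nonzero probability.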

To address these challenges, we propose a novel Simplex-Aligned Diffusion framework. To our knowledge, we are the first to move label generation for medical classification from the discrete one-hot simplex to the continuous logit manifold. This mapping acts as an intrinsic geometric safety constraint, ensuring mathematical consistency with Gaussian diffusion. The main contributions are:

\begin{itemize}
    \item We propose a generative classification strategy that operates in the logit space. This approach effectively resolves the theoretical conflict between simplex constraints and Gaussian noise assumptions, providing a mathematically consistent solution for robust label generation.
    \item Through systematic evaluation under ImageNet-C-style input corruptions (e.g., sensor noise, blur)\footnote{We distinguish \textit{input corruptions} (e.g., sensor noise, blur) used for robustness testing from the \textit{latent generative noise} used within the diffusion process.}, we show that, while maintaining competitive accuracy on clean data, our logit-based diffusion is significantly more resilient to artifacts and better calibrated in its uncertainty estimates than standard one-hot diffusion baselines.
    \item We design a Transformer-based interaction module to refine the feature coupling within the frozen encoder. This mechanism explicitly models the dependency between global and local views, ensuring the diffusion model receives stable and precise visual guidance without requiring heavy architectural changes.
\end{itemize}
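
The intuition behind the first contribution can be sketched as follows, assuming a softmax read-out from a $K$-dimensional real logit vector $z$. The notation and the choice of softmax are illustrative assumptions made here, not a specification of the method. For every such $z$,
\[
\mathrm{softmax}(z)_k = \frac{\exp(z_k)}{\sum_{j=1}^{K} \exp(z_j)} > 0, \qquad \sum_{k=1}^{K} \mathrm{softmax}(z)_k = 1,
\]
so a Gaussian-perturbed logit vector $z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon$, with $\epsilon \sim \mathcal{N}(0, \mathbf{I}_K)$, still maps to a valid probability vector at every noise level.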