\textbf{Limitations.} Although our training pipeline is relatively lightweight, inference remains computationally expensive and slow: predictions must be generated for each of the eight categories, and the CD component requires two forward passes per token. Additionally, because the structured reports were derived by reformulating MIMIC-CXR free-text reports with a language model, that model may have introduced subtle inconsistencies or biases. Finally, our pipeline relies on automated anatomical classification by a large language model; while prior work shows strong performance \cite{deepseek-justification-1, deepseek-justification2}, misclassification errors may propagate downstream and degrade report generation quality.\\

\section{Conclusion}
Foundational radiology MLLMs generate a radiology report in a single set of forward passes. We show that this leads to reduced attention on image tokens and over-reliance on previously generated text tokens, which limits the clinical accuracy of automated reports. To address these issues, we introduce Category-Wise Contrastive Decoding (CWCD), a framework that generates category-wise structured reports through category-specific parameterization and masked contrastive decoding. Experiments on MIMIC-CXR and the out-of-distribution IU-Xray dataset demonstrate that CWCD strengthens visual grounding, enhances clinical fidelity, and improves the linguistic quality of generated reports, advancing the capabilities of foundational radiology MLLMs. \\

\noindent
\textbf{Acknowledgment.} This work was supported by the US NSF CAREER award IIS-2239537.