\section{Related Work}

\textbf{Structured Findings Generation.} The findings section of a radiology report comprises the visual observations made from a given chest X-ray. Reports are usually written as free text, but a growing body of work establishes the utility of structured reports. \citet{structured_1} showed that clinicians rated structured reports as significantly more complete and more effective. \citet{structured_recall} showed that structured reports allowed better recall of diagnoses and critical findings, and overall both referring physicians and radiologists preferred structured reports over free-text reports \cite{structured_preference}. Recently, \citet{srrg} introduced desiderata for structured reporting in which the entire radiology report is divided into predefined sections and the findings section is further divided under the eight anatomical headers mentioned previously. They converted the free-text reports of MIMIC-CXR and CheXpert Plus into structured reports and introduced two new datasets, SRRG-Findings and SRRG-Impression. \citet{csrrg} further added clinical context, such as multiple views, the clinical indication, the imaging techniques used, and prior studies, yielding a new dataset called contextualized SRRG (C-SRRG).

Beyond their clinical utility, structured reports help automated report generation systems mitigate the distributional shift between textual reports originating from different datasets, where the same clinical finding may be described in markedly different styles due to linguistic, institutional, or regional differences among radiologists. By standardizing both the reporting categories and the linguistic style, structured reports reduce this variability and provide more consistent supervision for model training. Additionally, the natural division of the findings section into well-defined anatomical categories enables category-wise parametrization and modular report generation. We believe this structure promotes stronger visual grounding by preventing over-reliance on language priors and by reducing the number of tokens generated within each continuous forward pass.

\noindent
\textbf{Contrastive Decoding.} Contrastive decoding (CD) is a training-free, inference-time strategy for reducing hallucinations in text generation models \cite{contrastive-open-ended, vcd, contrastive-reasoning}. The main idea of CD is to overcome statistical biases (such as object co-occurrences) inherent in the training data and, in the case of MLLMs, to prevent over-reliance on textual priors learned during the pre-training of the LLM. The model's output distribution is contrasted with the distribution produced after masking the key information required to generate the correct output. This penalizes the tokens that are generated even when the key information is missing, effectively exposing the prior bias of the model. Various approaches to CD in MLLMs have been explored: \citet{vcd} contrast output distributions derived from original and distorted visual inputs, \citet{itav} contrast inter-layer representations, and \citet{crg} contrast model outputs produced with and without visual prompts. While CD has worked well for mitigating hallucinations in natural image captioning tasks, its use for medical tasks has been very limited. \citet{contrastive-medical} developed Alternative CD for a medical information extraction task, where they alternately contrasted output distributions from sub-task modules. \citet{ccd} introduced a dual-stage CD mechanism for RRG. Both \citet{contrastive-medical} and \citet{ccd} rely on text-based contrasts, whereas, to the best of our knowledge, we are the first to introduce an image-based CD approach for RRG, i.e., the contrasted distribution is generated by masking the X-ray instead of the text.
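
\noindent
To make the contrast concrete, one common instantiation, written here in the style of the visual CD of \citet{vcd}, is sketched below. The notation is an assumption introduced only for illustration: $v$ is the original image, $v'$ its distorted or masked counterpart, $x$ the text prompt, $y_{<t}$ the tokens generated so far, $\theta$ the model parameters, and $\alpha \geq 0$ a hyperparameter controlling the contrast strength. Other CD variants, including the methods discussed above, may differ in how the contrasted distribution is built and combined.
\begin{equation}
p_{\mathrm{CD}}(y_t \mid v, v', x, y_{<t}) = \mathrm{softmax}\bigl[(1+\alpha)\,\mathrm{logit}_{\theta}(y_t \mid v, x, y_{<t}) - \alpha\,\mathrm{logit}_{\theta}(y_t \mid v', x, y_{<t})\bigr]
\end{equation}
Setting $\alpha = 0$ recovers standard decoding, while larger values of $\alpha$ penalize more strongly those tokens that remain likely when the visual evidence is removed.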