\section{Data}
\subsection{CT-RATE / RadGenome-ChestCT Label Distribution}
\label{app:pathology_distribution}
Table~\ref{tab:abnormality_counts} reports the prevalence of the 18 pathology labels in the RadGenome-ChestCT training and validation splits. The distribution is strongly imbalanced: training prevalence ranges from 6.9\% (pericardial effusion) to 45.5\% (lung nodule), and for every label negative cases outnumber positive ones.

\begin{table}[h]
    \centering
    \caption{Pathology counts in the 23{,}880 training and 1{,}552 validation reports. Ratios are counts divided by the number of reports in the respective split.}
    \label{tab:abnormality_counts}
    \begin{tabular}{lcccc}
        \hline
        \textbf{Pathology} & \textbf{Train} & \textbf{Val} & \textbf{Train Ratio} & \textbf{Val Ratio} \\
        \hline
        Medical material                & 2{,}811 & 151 & 0.118 & 0.097 \\
        Arterial wall calcification     & 6{,}570 & 420 & 0.275 & 0.271 \\
        Cardiomegaly                    & 2{,}480 & 156 & 0.104 & 0.101 \\
        Pericardial effusion            & 1{,}654 & 104 & 0.069 & 0.067 \\
        Coronary artery wall calcification & 5{,}856 & 378 & 0.245 & 0.244 \\
        Hiatal hernia                   & 3{,}386 & 215 & 0.142 & 0.139 \\
        Lymphadenopathy                 & 6{,}023 & 389 & 0.252 & 0.251 \\
        Emphysema                       & 4{,}633 & 300 & 0.194 & 0.193 \\
        Atelectasis                     & 6{,}076 & 356 & 0.254 & 0.229 \\
        Lung nodule                     & 10{,}874 & 680 & 0.455 & 0.438 \\
        Lung opacity                    & 8{,}788 & 598 & 0.368 & 0.385 \\
        Pulmonary fibrotic sequela      & 6{,}368 & 410 & 0.267 & 0.264 \\
        Pleural effusion                & 2{,}818 & 179 & 0.118 & 0.115 \\
        Mosaic attenuation pattern      & 1{,}748 & 124 & 0.073 & 0.080 \\
        Peribronchial thickening        & 2{,}454 & 164 & 0.103 & 0.106 \\
        Consolidation                   & 4{,}203 & 286 & 0.176 & 0.184 \\
        Bronchiectasis                  & 2{,}402 & 161 & 0.101 & 0.104 \\
        Interlobular septal thickening  & 1{,}868 & 121 & 0.078 & 0.078 \\
        \hline
    \end{tabular}
\end{table}
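
The ratio columns can be reproduced from the counts and the split sizes given in the caption. The following minimal sketch recomputes them for three rows of Table~\ref{tab:abnormality_counts}; the helper name \texttt{prevalence} is ours and is used for illustration only.

```python
# Split sizes from the table caption; counts copied from three table rows.
N_TRAIN, N_VAL = 23_880, 1_552

counts = {
    "Lung nodule": (10_874, 680),
    "Pericardial effusion": (1_654, 104),
    "Atelectasis": (6_076, 356),
}


def prevalence(count, total):
    """Fraction of reports in a split that carry the label (3 decimals)."""
    return round(count / total, 3)


ratios = {
    name: (prevalence(train, N_TRAIN), prevalence(val, N_VAL))
    for name, (train, val) in counts.items()
}

# Ratio between the most and the least frequent label in the training split.
imbalance = counts["Lung nodule"][0] / counts["Pericardial effusion"][0]
```

In the training split, the most frequent label (lung nodule) occurs roughly 6.6 times as often as the least frequent one (pericardial effusion).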

\newpage
\subsection{RAD-ChestCT Label Mapping}
\label{app:class_mapping}
To ensure consistent evaluation across datasets, we align the CT-RATE pathology labels with the more fine-grained annotation schema of RAD-ChestCT. Table~\ref{tab:class_mapping} shows the mapping used in our experiments. Several CT-RATE labels correspond to multiple RAD-ChestCT labels (e.g., Medical material, Lung nodule), which we merge into a single binary label to maintain compatibility with the CT-RATE taxonomy. This harmonization lets us apply the same classifier and evaluation metrics to both datasets.

\begin{table}[!h]
    \centering
    \caption{Label mapping between CT-RATE and RAD-ChestCT datasets.}
    \label{tab:class_mapping}
    \begin{tabular}{l p{0.55\linewidth}}
        \hline
        \textbf{CT-RATE Label} & \textbf{RAD-ChestCT Label} \\
        \hline
        Medical material & pacemaker\_or\_defib, catheter\_or\_port, hardware, stent, suture, staple, chest\_tube, tracheal\_tube, gi\_tube, breast\_implant, heart\_valve\_replacement, clip \\
        Arterial wall calcification & calcification, scattered\_calc \\
        Cardiomegaly & cardiomegaly \\
        Pericardial effusion & pericardial\_effusion \\
        Coronary artery wall calcification & calcification, scattered\_calc \\
        Hiatal hernia & hernia \\
        Lymphadenopathy & lymphadenopathy \\
        Emphysema & emphysema \\
        Atelectasis & atelectasis \\
        Lung nodule & nodule, nodulegr1cm, scattered\_nod \\
        Lung opacity & opacity \\
        Pulmonary fibrotic sequela & fibrosis \\
        Pleural effusion & pleural\_effusion \\
        Mosaic attenuation pattern & all\_zeros \\
        Peribronchial thickening & bronchial\_wall\_thickening \\
        Consolidation & consolidation \\
        Bronchiectasis & bronchiectasis \\
        Interlobular septal thickening & septal\_thickening \\
        \hline
    \end{tabular}
\end{table}
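
The merge of several RAD-ChestCT labels into one CT-RATE label can be sketched as follows. We assume here that the merge is a logical OR, i.e., a CT-RATE label is positive if any of its mapped RAD-ChestCT labels is positive. The dictionary contains only a subset of Table~\ref{tab:class_mapping}, and the names \texttt{CLASS\_MAPPING} and \texttt{merge\_labels} are illustrative.

```python
# Subset of the mapping table: CT-RATE label -> RAD-ChestCT labels.
CLASS_MAPPING = {
    "Lung nodule": ["nodule", "nodulegr1cm", "scattered_nod"],
    "Cardiomegaly": ["cardiomegaly"],
    "Hiatal hernia": ["hernia"],
}


def merge_labels(rad_labels, mapping=CLASS_MAPPING):
    """Map binary RAD-ChestCT labels to binary CT-RATE labels.

    A CT-RATE label is set to 1 if any mapped RAD-ChestCT label is 1
    (assumed OR rule). Missing source labels count as negative (assumption).
    """
    return {
        target: int(any(rad_labels.get(source, 0) for source in sources))
        for target, sources in mapping.items()
    }


example = merge_labels({"nodule": 0, "scattered_nod": 1, "cardiomegaly": 0})
```

In this example, \texttt{scattered\_nod} alone is enough to mark the merged Lung nodule label as positive.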


\subsection{AMOS-MM Processing}
\label{app:amos_processing}
\textbf{Volume Processing.}
AMOS-MM contains thoracoabdominal CT volumes with reports covering chest, abdomen, and pelvis. To align the dataset with the chest-focused CT-RATE setup, we extract only the thoracic region using the following steps:
\begin{enumerate}
    \item Retain studies that include a chest-related report section.
    \item Run TotalSegmentator \citep{wasserthal2023totalsegmentator} to isolate the thoracic region, defined by the following structures:\\
    \texttt{lung\_upper\_lobe\_left}, \texttt{lung\_lower\_lobe\_left}, \texttt{lung\_upper\_lobe\_right}, \\
    \texttt{lung\_middle\_lobe\_right},  \texttt{lung\_lower\_lobe\_right}, \texttt{esophagus}, \texttt{trachea}.
    \item Crop the volume to the thoracic bounding box, dropping non-thoracic slices.
\end{enumerate}
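
The cropping step can be sketched as below. This is a minimal illustration, not the exact implementation: it assumes that the masks of the listed structures have already been combined into a single binary array with the same shape as the volume, and it omits the TotalSegmentator call itself. The function name \texttt{crop\_to\_mask\_bbox} and the \texttt{margin} parameter are hypothetical.

```python
import numpy as np


def crop_to_mask_bbox(volume, mask, margin=0):
    """Crop `volume` to the bounding box of the non-zero voxels in `mask`.

    Returns the cropped volume and the slices used, so that the same crop
    can be applied to other arrays on the same voxel grid.
    """
    if volume.shape != mask.shape:
        raise ValueError("volume and mask must have the same shape")
    coords = np.argwhere(mask)  # one row of indices per foreground voxel
    if coords.size == 0:
        raise ValueError("mask is empty; no thoracic structure was found")
    # Optionally pad the box by `margin` voxels, clipped to the volume bounds.
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + 1 + margin, mask.shape)
    slices = tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))
    return volume[slices], slices
```

Because the bounding box spans all axes, slices that lie entirely outside the segmented thoracic structures are dropped along with the in-plane background.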

\textbf{Report Processing.}
For anatomy-level evaluation, we convert the free-text AMOS-MM chest reports into the structured anatomy-level format of RadGenome-ChestCT. We use GPT-4.1 (version 2025-04-14) via Azure OpenAI Services. The model splits each report into sentences and assigns each sentence to anatomies from a predefined list (Figure~\ref{fig:fig_anatomy_present_prompt}). The example findings below and Figure~\ref{fig:fig_gpt_response} show a sample input and the corresponding response.

\begin{figure}[h]
\centering
\begin{AIbox}{AMOS-MM Structuring Prompt}
\footnotesize
\begin{verbatim}
You are a radiologist tasked with extracting anatomical regions from the findings
section of radiology reports. For each sentence provided, identify the corresponding
anatomical regions. Ensure each identified region is an entry from a predefined list:
[", ".join(ANATOMY_LIST)]

If a sentence mentions 'left' or 'right', these qualifiers should precede the 
anatomical region (e.g., left kidney). Given input in the format:
<Input><findings><\Input>.

Please reply in the following JSON format:
{<sentence>: [region1,region2,...], <sentence>: [region1]}.

Findings: {findings}
\end{verbatim}
\end{AIbox}
\caption{GPT-4.1 report structuring prompt.}
\label{fig:fig_anatomy_present_prompt}
\end{figure}

\noindent\textbf{Findings:} ``A few speckled slightly high-density lesions can be seen in the right upper lobe of the lung and the left lower lobe, with unclear boundaries. Local transparency is increased in the right lung. The trachea and bronchi are unobstructed. The size and shape of the heart and great blood vessels are normal. Local pleural thickening on both sides.''

\begin{figure}[h]
\centering
\begin{AIbox}{Example GPT-4.1 Response}
\footnotesize
\begin{verbatim}
"result": {
    "A few speckled slightly high-density lesions can be seen in the right upper lobe
    of the lung and the left lower lobe, with unclear boundaries.": [
        "right lung",
        "left lung"
    ],
    "Local transparency is increased in the right lung.": [
        "right lung"
    ],
    "The trachea and bronchi are unobstructed.": [
        "trachea and bronchi"
    ],
    "The size and shape of the heart and great blood vessels are normal.": [
        "heart"
    ],
    "Local pleural thickening on both sides.": [
        "pleura"
    ]
}
\end{verbatim}
\end{AIbox}
\caption{GPT-4.1 report structuring example response.}
\label{fig:fig_gpt_response}
\end{figure}
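
A response in this format can be parsed and checked against the predefined list as sketched below. The anatomy list shown is a hypothetical subset (the full \texttt{ANATOMY\_LIST} is not reproduced in this section), and both the function name \texttt{parse\_structured\_report} and the handling of an optional top-level \texttt{"result"} key are assumptions based on the example response.

```python
import json

# Hypothetical subset of the predefined anatomy list.
ANATOMY_LIST = ["right lung", "left lung", "trachea and bronchi", "heart", "pleura"]


def parse_structured_report(raw, allowed=ANATOMY_LIST):
    """Parse a JSON reply of the form {sentence: [region, ...]}.

    Returns two dicts: sentences mapped to their valid regions, and
    sentences mapped to regions that are not in the predefined list.
    """
    data = json.loads(raw)
    # Unwrap an optional top-level "result" key (assumed from the example).
    if isinstance(data, dict) and isinstance(data.get("result"), dict):
        data = data["result"]
    allowed_set = set(allowed)
    structured, rejected = {}, {}
    for sentence, regions in data.items():
        structured[sentence] = [r for r in regions if r in allowed_set]
        invalid = [r for r in regions if r not in allowed_set]
        if invalid:
            rejected[sentence] = invalid
    return structured, rejected
```

Collecting out-of-list regions separately makes it easy to spot responses that do not follow the prompt's constraint.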


\subsection{Report Structuring Evaluation}
To assess how well the report structuring stage preserves content, we evaluate it on 1{,}000 samples from the CT-RATE training split. We compare the structured reports derived from the ground-truth free-text reports against the original reports, using the same clinical, NLG, and classification metrics as in the main evaluation. Table~\ref{tab:structuring_quality} shows the results.

\begin{table}[h]
    \centering
    \caption{Report structuring evaluation on 1{,}000 CT-RATE training samples. High scores indicate that report content is preserved after structuring.}
    \label{tab:structuring_quality}
    \resizebox{\textwidth}{!}{%
    \begin{tabular}{
        >{\centering\arraybackslash}p{1.2cm}
        >{\centering\arraybackslash}p{1.0cm}
        >{\centering\arraybackslash}p{1.6cm}
        >{\centering\arraybackslash}p{1.9cm}
        >{\centering\arraybackslash}p{0.9cm}
        p{0.05cm}
        >{\centering\arraybackslash}p{1.1cm}
        >{\centering\arraybackslash}p{1.1cm}
        p{0.05cm}
        >{\centering\arraybackslash}p{0.8cm}
        >{\centering\arraybackslash}p{0.8cm}
        >{\centering\arraybackslash}p{0.8cm}
    }
        \hline
        \multicolumn{5}{c}{\textbf{Clinical} $\uparrow$}
        & & \multicolumn{2}{c}{\textbf{NLG} $\uparrow$}
        & & \multicolumn{3}{c}{\textbf{CL (macro)} $\uparrow$} \\
        \cline{1-5} \cline{7-8} \cline{10-12}
        GREEN & RaTE & RadGraph & 1/RadCLIQ & CRG
        & & BLEU & BERT
        & & P & R & F1 \\
        \hline

        0.959 & 0.997 & 0.706 & - & 0.907
        & & 0.877 & 0.885
        & & 0.978 & 0.951 & 0.964 \\
        \hline
    \end{tabular}
    }
\end{table}
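
For reference, the macro-averaged classification scores (CL) can be computed as in the following sketch. It assumes that binary pathology labels have already been extracted from the original and the structured reports (the label extraction is not shown), and that macro F1 is the mean of the per-label F1 scores; the function name \texttt{macro\_prf} is illustrative.

```python
def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall and F1 over labels.

    y_true, y_pred: lists of equal-length binary label vectors,
    one vector per report.
    """
    n_labels = len(y_true[0])
    precisions, recalls, f1s = [], [], []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        # Scores default to 0 when a denominator is zero (assumed convention).
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    return (
        sum(precisions) / n_labels,
        sum(recalls) / n_labels,
        sum(f1s) / n_labels,
    )
```

Macro averaging weights every label equally, so rare pathologies contribute as much to the score as frequent ones.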