\section{Processing Methods Comparison}
\label{sec:processing}
\input{tables/processing}
\noindent 
\tableref{tab:processing_comp} shows how different processing methods affect the phrase grounding performance of our model.
The first two columns are concerned with which tokens should be considered for the creation of the activation maps.
Neither start nor end tokens are considered in either approach.
\citet{Dombrowski_2024} use only the tokens corresponding to the disease if at least one such token is present in the report; otherwise, all tokens are used.
In contrast, our approach filters out all words without lexical meaning and uses the remaining tokens.
This choice is motivated by the findings shown in~\figureref{fig:tokens}.
This change yields a small improvement in phrase grounding performance compared to the original method.

The remaining three columns of~\tableref{tab:processing_comp} are concerned with different interpolation techniques that can be used for BBM.
Linear Bézier refers to standard linear interpolation, given by
\begin{equation}
    \label{eq:linear_int}
    sP_\text{mult} + (1-s)P_\text{comb}.
\end{equation}
While this method improves CNR considerably, it is not well-suited for mask generation, since the larger activation areas lead to masks that are too large.
To fix this issue, Quadratic Bézier incorporates a quadratic Bézier curve for interpolation, namely
\begin{equation}
    \label{eq:quadratic_int}
    2(1-s)s(P_\text{mult}\odot P_\text{comb}) + (1-s)^2P_\text{comb} + s^2 P_\text{mult}. 
\end{equation}
The control point of the interpolation is the Hadamard product of $P_\text{mult}$ and $P_\text{comb}$.
This choice serves two main purposes.
First, it captures more complex multiplicative interactions between the two matrices.
Second, the multiplication with $P_\text{mult}$ acts as a gating mechanism: using the multiplicative interaction as the control point prevents the interpolation from producing overly large areas of activation.
Instead, the interpolation focuses more strongly on the relevant areas of activation.
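As a brief sanity check of this gating effect, which is a worked illustration rather than part of the method itself, subtracting the linear interpolation~(\ref{eq:linear_int}) from the quadratic interpolation~(\ref{eq:quadratic_int}) and collecting terms gives
\begin{equation}
    s(1-s)\left(2\,P_\text{mult}\odot P_\text{comb} - P_\text{mult} - P_\text{comb}\right).
\end{equation}
If we assume, for illustration only, that $s \in [0,1]$ and that all entries of $P_\text{mult}$ and $P_\text{comb}$ lie in $[0,1]$, then $2ab - a - b = -a(1-b) - b(1-a) \le 0$ holds for every pair of corresponding entries $a$ and $b$.
Under this assumption, the quadratic interpolation never exceeds the linear one element-wise, and the two coincide at $s=0$ and $s=1$.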
This approach improves mask generation, but at the cost of a smaller improvement in CNR.
Consequently, we blend the linear and quadratic approaches to gain the benefits of both, as can be seen in the Mixture Bézier column.
The result is an essentially linear interpolation whose control point is combined with $P_\text{mult} \odot P_\text{comb}$.
Since a linear Bézier curve can be expressed as the quadratic Bézier curve
\begin{equation}
    \label{eq:lin_as_quad_int}
    2(1-s)s\left(\frac{P_\text{mult} + P_{\text{comb}}}{2}\right)+ (1-s)^2 P_{\text{comb}} + s^2P_\text{mult},
\end{equation}
incorporating the Hadamard product results in
\begin{equation}
    \label{eq:mixture_int}
    2(1-s)s\left(\frac{P_\text{mult} + P_{\text{comb}} + P_\text{mult} \odot P_{\text{comb}}}{2}\right) 
+ (1-s)^2 P_{\text{comb}} + s^2P_{\text{mult}}.
\end{equation}
This interpolation achieves a balance between the accuracy of the activation maps and their corresponding masks.
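The structure of this interpolation can be made explicit with a short expansion, given here as a worked check.
Collecting the coefficients of $P_\text{mult}$ and $P_\text{comb}$ in~(\ref{eq:mixture_int}) gives $(1-s)s + s^2 = s$ and $(1-s)s + (1-s)^2 = 1-s$, so that~(\ref{eq:mixture_int}) can be rewritten as
\begin{equation}
    sP_\text{mult} + (1-s)P_\text{comb} + s(1-s)\left(P_\text{mult}\odot P_\text{comb}\right).
\end{equation}
In other words, the mixture is the linear interpolation~(\ref{eq:linear_int}) plus a Hadamard term whose weight $s(1-s)$ vanishes at $s=0$ and $s=1$ and reaches its maximum of $1/4$ at $s=1/2$.
The same coefficient computation, applied without the Hadamard term, confirms that~(\ref{eq:lin_as_quad_int}) reduces to~(\ref{eq:linear_int}).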
