\section{Complementary results}\label{appendix:additional-results}
\subsection{Unimodal performance}
The performance of each unimodal encoder is reported in Table~\ref{tab:unimodal-loc} for \texttt{LOC} and in Table~\ref{tab:unimodal-stress} for \texttt{StressID}.
\input{tables/additional-results}
\subsection{ADAPT's robustness to missing modalities}
\paragraph{Evaluation of ADAPT on $X_{\text{test}}^{\text{*}}$.}
Table~\ref{tab:scenarios_inter} evaluates ADAPT under three modality scenarios. Starting from $X_{\text{test}}^{\text{*}}$ (i.e., the test samples for which all modalities are available), we remove one or two modalities and compare the resulting performance to the baseline, namely $X_{\text{test}}^{\text{*}}$ with no modality removed. For each scenario, we report the difference ($\Delta$) with respect to this baseline. Across all scenarios, $|\Delta| < 3.2$, which further highlights ADAPT's robustness to missing modalities on samples that are otherwise complete. Notably, although the video-only encoder performs best individually by a large margin, ADAPT maintains robust performance when the video modality is removed (row~3).
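For clarity, the difference reported for a scenario $s$ can be written as below. This is a sketch under stated assumptions: we assume $\Delta$ is taken as the score of the degraded scenario minus the baseline score (so a negative value indicates a drop), and the symbols $\mathcal{M}$ (the evaluation metric reported in Table~\ref{tab:scenarios_inter}) and $X_{\text{test}}^{\text{*},-s}$ (the set $X_{\text{test}}^{\text{*}}$ with the modalities of scenario $s$ removed) are notation introduced here for illustration only:
\[
\Delta_s \;=\; \mathcal{M}\big(\text{ADAPT};\, X_{\text{test}}^{\text{*},-s}\big) \;-\; \mathcal{M}\big(\text{ADAPT};\, X_{\text{test}}^{\text{*}}\big).
\]
Under this notation, the robustness statement above reads $|\Delta_s| < 3.2$ for each of the three scenarios $s$.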