\section{Learned Coefficient Values}
\label{app:learned_coeffs}

Table~\ref{tab:learned_coeffs} reports, for each merging method, the learned coefficient of every mergeability metric, averaged over the 20 folds of the leave-one-task-out cross-validation procedure. The coefficients apply to min-max normalized metrics (scaled to $[-1, 1]$); normalization statistics are computed only on the training data of each fold to prevent data leakage.
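
As a minimal sketch of this normalization, assuming the standard min-max map onto $[-1, 1]$ (the symbols $m_i$, $m_i^{\min}$, and $m_i^{\max}$ are notation introduced here for illustration only), the normalized value of metric $i$ can be written as
\[
  \tilde{m}_i = 2\,\frac{m_i - m_i^{\min}}{m_i^{\max} - m_i^{\min}} - 1,
\]
where $m_i^{\min}$ and $m_i^{\max}$ denote the minimum and maximum of metric $i$ over the training data of the current fold. Under this map, values from the held-out task are not guaranteed to lie in $[-1, 1]$, since they may fall outside the training range.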

\paragraph{Interpretation.}
A positive coefficient indicates that higher values of the corresponding metric predict better post-merge performance, while a negative coefficient indicates the opposite relationship. The magnitude of a coefficient reflects how strongly that metric influences the prediction. Because min-max normalization equalizes the range of the metrics but not their variance, magnitudes should be compared across metrics with this difference in mind.
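
To make the reading of magnitudes concrete, assume for illustration only that the predictor is linear in the normalized metrics. The symbols $\hat{y}$ (predicted score), $b$ (intercept), $w_i$ (coefficient of metric $i$), and $\tilde{m}_i$ (normalized value of metric $i$) are hypothetical notation, not taken from the main text:
\[
  \hat{y} = b + \sum_i w_i\,\tilde{m}_i .
\]
Under this assumption, moving metric $i$ across its full normalized training range, from $-1$ to $+1$, while holding all other metrics fixed changes the prediction by $2 w_i$. For example, a coefficient of $-20$ would correspond to a change of $-40$ in $\hat{y}$, in whatever units the prediction target is expressed.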

\paragraph{Stable Metrics.}
Seven metrics exhibit a consistent sign across all four merging methods, suggesting they capture fundamental aspects of mergeability:

\begin{itemize}
    \item \textbf{Negative coefficients (higher values predict worse merging):}
    \begin{itemize}
        \item Encoder gradient L2 distance (avg: $-23.3$, range: $[-33.3, -19.5]$)
        \item Input gradient L2 distance (avg: $-21.4$, range: $[-36.3, -8.7]$)
        \item Input gradient dot product (avg: $-13.9$, range: $[-24.0, -0.2]$)
    \end{itemize}

    \item \textbf{Positive coefficients (higher values predict better merging):}
    \begin{itemize}
        \item Right subspace overlap (avg: $+16.5$, range: $[+3.3, +32.5]$)
        \item Left subspace overlap top-$k$ (avg: $+14.0$, range: $[+11.8, +15.4]$)
        \item Interaction matrix overlap top-$k$ (avg: $+8.8$, range: $[+5.8, +11.5]$)
        \item Encoder gradient cosine similarity (avg: $+8.1$, range: $[+2.7, +12.1]$)
    \end{itemize}
\end{itemize}
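
As a worked check, the summary values for encoder gradient L2 distance follow directly from the four per-method entries of the Enc Grad L2 row in Table~\ref{tab:learned_coeffs} (Task Arithmetic, Weight Averaging, Isotropic, TSV):
\[
  \frac{(-20.84) + (-19.64) + (-33.31) + (-19.53)}{4} = \frac{-93.32}{4} \approx -23.3,
\]
with minimum $-33.31$ (Isotropic) and maximum $-19.53$ (TSV), which gives the reported range $[-33.3, -19.5]$.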

\paragraph{Key Observations.}
Several patterns emerge from the coefficient values:

\begin{itemize}
    \item \textbf{Gradient-based metrics} consistently receive large-magnitude coefficients. The L2 distance metrics for both encoder and input gradients show negative coefficients across all methods, indicating that large gradient differences between task vectors are detrimental to merging success.

    \item \textbf{Subspace overlap metrics} tend to receive positive coefficients, suggesting that task vectors sharing similar principal directions merge more effectively.

    \item \textbf{Method-specific patterns}: Weight Averaging shows distinctively large coefficients for right subspace overlap ($+20.1$), while Isotropic Merging shows large coefficients for gradient-based metrics. Task Arithmetic and TSV show more balanced distributions across metric categories.

    \item \textbf{High variance}: Many coefficients exhibit a high standard deviation across folds (often exceeding the mean in magnitude), reflecting the difficulty of learning stable predictors from limited data and the sensitivity of the fit to which task is held out.
\end{itemize}
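
The fold-level statistics behind the last observation can be sketched as follows, using notation introduced here for illustration: $w_i^{(f)}$ denotes the coefficient learned for metric $i$ in fold $f$, and $F = 20$ is the number of folds. The choice between the $1/F$ and $1/(F-1)$ normalization of the standard deviation is an assumption; the population form is shown.
\[
  \bar{w}_i = \frac{1}{F}\sum_{f=1}^{F} w_i^{(f)},
  \qquad
  s_i = \sqrt{\frac{1}{F}\sum_{f=1}^{F}\bigl(w_i^{(f)} - \bar{w}_i\bigr)^2}.
\]
The table reports $\bar{w}_i$ for each metric and method. When $s_i > |\bar{w}_i|$, the fold-level coefficients typically spread across zero, so the sign of the average should be interpreted with caution.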

\begin{table}[htbp]
\centering
\caption{Average learned coefficients for each mergeability metric and merging method, obtained via leave-one-task-out cross-validation and averaged over the 20 folds. Positive coefficients indicate that higher metric values predict better post-merge performance, while negative coefficients indicate the opposite. Coefficients apply to min-max normalized metrics (scaled to $[-1, 1]$).}
\label{tab:learned_coeffs}
\resizebox{\textwidth}{!}{%
\begin{tabular}{clrrrr}
\toprule
 & \textbf{Metric} & \textbf{Task Arithmetic} & \textbf{Weight Averaging} & \textbf{Isotropic} & \textbf{TSV} \\
\midrule
 & TV Cosine Sim & 7.29 & 5.30 & -3.94 & 4.67 \\
 & TV L2 Dist & -15.87 & 1.64 & -21.65 & 14.24 \\
\multirow{5}{*}{\rotatebox{90}{\textbf{Task Vector Geometry}}} & TV Dot Prod & 11.64 & 2.02 & -4.70 & 9.04 \\
 & Weight Angle & 5.38 & 1.97 & 6.96 & -11.26 \\
 & TV Mag Ratio & -2.43 & -2.42 & -13.42 & 14.74 \\
\midrule
 & Eff Rank & -4.89 & 4.42 & -5.16 & 0.50 \\
 & Eff Rank Score & 7.52 & 1.99 & 2.60 & -1.11 \\
 & Stable Rank & -6.66 & -5.54 & 5.08 & -10.85 \\
\multirow{7}{*}{\rotatebox{90}{\textbf{Effective Rank}}} & Spectral Gap & 1.56 & -1.93 & 9.19 & -23.82 \\
 & SV Ratio & -5.24 & -1.42 & -1.76 & 1.61 \\
 & Layer Eff Rank & 5.63 & -1.69 & -3.04 & -0.04 \\
 & Layer Eff Rank Score & -7.68 & -7.52 & 7.96 & 15.13 \\
\midrule
 & SV Overlap & -24.72 & 12.51 & -27.10 & -17.28 \\
 & Left Sub Top-$k$ & 15.36 & 11.78 & 15.01 & 13.97 \\
 & Right Sub Top-$k$ & 13.30 & 16.22 & 17.94 & -12.44 \\
\multirow{6}{*}{\rotatebox{90}{\textbf{Subspace Overlap}}} & Right Sub Bot-$k$ & 0.38 & -12.27 & 35.02 & -11.88 \\
 & Interact Top-$k$ & 11.48 & 10.10 & 7.69 & 5.84 \\
 & Interact Bot-$k$ & 10.99 & -14.78 & 30.28 & -8.32 \\
\midrule
 & Act L2 Dist & -12.40 & -5.33 & -9.56 & 9.23 \\
 & Act Cosine Sim & 0.70 & 3.58 & -0.46 & -6.44 \\
\multirow{4}{*}{\rotatebox{90}{\textbf{Activation}}} & Act Mag Ratio & -10.79 & -13.26 & -10.84 & 2.99 \\
 & Act Dot Prod & 17.93 & 7.83 & -5.49 & 22.96 \\
\midrule
 & Enc Grad Cos & 11.41 & 12.08 & 5.99 & 2.70 \\
 & Enc Grad L2 & -20.84 & -19.64 & -33.31 & -19.53 \\
 & Enc Grad Dot & 12.00 & -2.11 & 11.17 & -4.63 \\
\multirow{6}{*}{\rotatebox{90}{\textbf{Gradient-Based}}} & Input Grad Cos & 13.86 & -3.87 & 13.96 & 25.42 \\
 & Input Grad L2 & -23.00 & -8.72 & -36.30 & -17.59 \\
 & Input Grad Dot & -20.90 & -10.32 & -24.03 & -0.25 \\
\bottomrule
\end{tabular}%
}
\end{table}