\section{Individual Metric Correlations}
\label{app:individual_metrics}

To motivate the need for a learned linear combination of mergeability metrics, we examine the predictive power of each metric in isolation. Table~\ref{tab:individual_metric_correlations} reports the Pearson correlation between each individual metric and the normalized post-merge accuracy across all 190 task pairs, computed separately for each merging method.
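
For reference, the correlation reported in each table cell can be written out explicitly. The notation below is introduced here only for illustration and is an assumption, not notation taken from the main text: for a fixed merging method and a fixed metric, let $x_i$ denote the metric value on task pair $i$, $y_i$ the normalized post-merge accuracy on that pair, and $n = 190$ the number of task pairs. The standard Pearson coefficient is then
\[
r \;=\; \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i .
\]
Because $r$ captures only linear association, a value near zero indicates the absence of a linear relationship between a metric and post-merge accuracy, not necessarily the absence of any relationship.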

No single metric correlates strongly with post-merge accuracy under every merging method. The highest individual correlations are observed for activation-based metrics, particularly activation dot product ($r = 0.572$ for TSV, $r = 0.450$ for Weight Averaging) and activation cosine similarity ($r = 0.521$ for TSV, $r = 0.366$ for Weight Averaging). However, the same two metrics show weak or non-significant correlations for Task Arithmetic ($r = 0.094$ and $r = 0.146$, respectively) and Isotropic merging ($r = -0.095$ and $r = 0.062$, respectively), indicating that their predictive utility is method-dependent.

Task vector geometry metrics, which are computationally inexpensive and have been proposed as indicators of mergeability in prior work, show weak correlations (typically $|r| < 0.1$) for every merging method. Similarly, effective rank metrics reach only $|r| < 0.2$, despite theoretical motivation linking spectral properties to merge success.

Subspace overlap metrics show the opposite method dependence to the activation-based metrics: those focusing on bottom singular vectors (Right Sub Bot-$k$, Interact Bot-$k$) show moderate correlations for Task Arithmetic ($r \approx 0.21$) and Isotropic merging ($r \approx 0.31$), but near-zero correlations for Weight Averaging and TSV. This suggests that different aspects of task vector geometry are relevant for different merging strategies.

The inconsistency of individual metric correlations across methods underscores a key finding: mergeability prediction requires combining multiple complementary signals. A metric that is predictive for one method may be uninformative or even misleading for another. This observation motivates our approach of learning method-specific weighted combinations of metrics, which achieves substantially higher correlations (see Section~\ref{sec:results}) by leveraging the complementary information captured by different metric categories.

\input{results/individual_metric_correlations.tex}