\paragraph{Metric Ablation.}
To validate the coefficient analysis, we run ablation experiments that remove entire metric categories from the linear optimization (Table~\ref{tab:metric_ablation}). Consistent with the learned coefficients, subspace and gradient-based metrics prove most critical: removing subspace metrics causes the largest drop (average $\Delta r = -0.12$), with TSV declining from $r=0.57$ to $r=0.37$, while removing gradient metrics yields $\Delta r = -0.06$. This aligns with the stable-metrics analysis, where \texttt{right\_subspace\_overlap} and the gradient L2 distances had the largest consistent coefficients. Effective rank metrics contribute minimally ($\Delta r = -0.01$). Notably, removing activation or task vector metrics causes no degradation (performance even improves slightly), indicating that these categories are redundant given the subspace and gradient information. A minimal set of 13 subspace and gradient metrics would therefore likely match the performance of the full 28-metric set.
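
For concreteness, the ablation quantity can be written out as follows. This is a sketch under stated assumptions rather than a definition taken from the experiments: the symbols $\mathcal{M}$ (the full metric set), $\mathcal{M}_c$ (the metrics in category $c$), $\mathcal{T}$ (the set of evaluated settings, e.g.\ TSV), and $r_m(\cdot)$ (the correlation obtained for setting $m$ when the linear optimization is restricted to a given metric subset) are notation introduced here for illustration, and we assume the reported average is taken uniformly over $\mathcal{T}$:
\[
  \Delta r_c \;=\; \frac{1}{|\mathcal{T}|} \sum_{m \in \mathcal{T}} \Bigl[\, r_m\bigl(\mathcal{M} \setminus \mathcal{M}_c\bigr) - r_m\bigl(\mathcal{M}\bigr) \Bigr].
\]
Under this reading, the per-setting change for TSV when subspace metrics are removed is $0.37 - 0.57 = -0.20$, which is larger in magnitude than the category average of $-0.12$.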
