Abstract: We explore element-wise convex combinations of two permutation-aligned neural network parameter vectors $\Theta_A$ and $\Theta_B$ of size $d$. We conduct extensive experiments by examining various distributions of such model combinations parametrized by elements of the hypercube $[0,1]^d$ and its vicinity. Our findings reveal that broad regions of the hypercube form surfaces of low loss values, indicating that the notion of linear mode connectivity extends to a more general phenomenon which we call mode combinability. We also make several novel observations regarding linear mode connectivity and model re-basin. We demonstrate a transitivity property: two models re-based to a common third model are also linear mode connected, and a robustness property: even with significant perturbations of the neuron matchings, the resulting combinations continue to form a working model. Moreover, we analyze the functional and weight similarity of model combinations and show that such combinations are non-vacuous in the sense that there are significant functional differences between the resulting models.
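To make the central operation concrete, the following is a minimal sketch (not code from the paper) of an element-wise convex combination of two permutation-aligned parameter vectors, where the mixing vector `lam` is a point of the hypercube $[0,1]^d$; the function and variable names are illustrative assumptions.

```python
import numpy as np

def combine(theta_a: np.ndarray, theta_b: np.ndarray, lam: np.ndarray) -> np.ndarray:
    """Element-wise convex combination of two aligned parameter vectors.

    theta_a, theta_b: flattened, permutation-aligned parameter vectors of size d.
    lam: a point of the hypercube [0, 1]^d giving a per-parameter mixing weight.
    """
    assert theta_a.shape == theta_b.shape == lam.shape
    return lam * theta_a + (1.0 - lam) * theta_b

# Example: sample a random point of the hypercube and form the combined model.
d = 10
theta_a, theta_b = np.random.randn(d), np.random.randn(d)
lam = np.random.rand(d)  # uniform over [0, 1]^d
theta_mix = combine(theta_a, theta_b, lam)
```

Setting every entry of `lam` to the same scalar recovers ordinary linear interpolation between the two models; varying `lam` over distributions on $[0,1]^d$ corresponds to the more general combinations studied in the paper.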