An Information-Geometric Distance on the Space of Tasks

Published: 07 Nov 2020, Last Modified: 05 May 2023 · NeurIPSW 2020: DL-IG Oral
Keywords: information theory, transfer learning, Fisher-Rao metric, optimal transportation
TL;DR: We compute a distance on tasks by solving a transport problem on the input space and simultaneously modifying the weights to find the shortest geodesic on the statistical manifold.
Abstract: We compute a distance between tasks modeled as joint distributions on data and labels. We develop a stochastic process that transports the marginal on the data of the source task to that of the target task, and simultaneously updates the weights of a classifier initialized on the source task to track this evolving data distribution. The distance between two tasks is defined as the length of the shortest path, traced out as the weights evolve, on the Riemannian manifold of the conditional distributions of labels given data. We derive connections between this distance and Rademacher complexity-based generalization bounds; the path in weight space computed by our method can be interpreted as the trajectory that keeps the generalization gap constant as the task distribution changes from the source to the target. Experiments on image classification datasets verify that this task distance helps predict the performance of transfer learning and is consistent with fine-tuning results.
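
For intuition, the sketch below (not the authors' implementation) shows how such a distance could be approximated in PyTorch: the data distribution is interpolated from the source to the target task by a simple batch-mixing schedule that stands in for the transport step described in the abstract, the classifier is fine-tuned along the way, and the Fisher-Rao path length is accumulated using the standard local approximation sqrt(2·KL) between successive predictive distributions on a fixed probe batch. All names (`source_batches`, `target_batches`, `probe_x`) and the mixing scheme are illustrative assumptions.

```python
# Minimal sketch under the assumptions stated above: approximate an
# information-geometric task distance by interpolating the data distribution
# from the source to the target task while fine-tuning the classifier, and
# accumulating the Fisher-Rao path length of its predictive distribution.
import copy
import torch
import torch.nn.functional as F


def symmetric_kl(model_a, model_b, probe_x):
    # Symmetrized KL divergence between the two models' label posteriors,
    # evaluated on a fixed probe batch.
    with torch.no_grad():
        log_p = F.log_softmax(model_a(probe_x), dim=-1)
        log_q = F.log_softmax(model_b(probe_x), dim=-1)
        kl_pq = (log_p.exp() * (log_p - log_q)).sum(-1).mean()
        kl_qp = (log_q.exp() * (log_q - log_p)).sum(-1).mean()
    return 0.5 * (kl_pq + kl_qp)


def task_distance(model, source_batches, target_batches, probe_x,
                  num_steps=20, inner_iters=10, lr=1e-3):
    # `source_batches` / `target_batches`: infinite iterators of (x, y) batches
    # of equal size; `probe_x`: fixed batch used to measure how the conditional
    # distribution p(y | x) changes as the weights evolve.
    model = copy.deepcopy(model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    length = 0.0
    for t in range(1, num_steps + 1):
        alpha = t / num_steps                 # fraction of target data in the mix
        prev = copy.deepcopy(model)           # snapshot before this step
        for _ in range(inner_iters):
            xs, ys = next(source_batches)
            xt, yt = next(target_batches)
            n_t = int(alpha * len(xs))        # crude stand-in for the transport step
            x = torch.cat([xt[:n_t], xs[n_t:]])
            y = torch.cat([yt[:n_t], ys[n_t:]])
            loss = F.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Local Fisher-Rao length is approximated by sqrt(2 * KL) between the
        # predictive distributions before and after this interpolation step.
        length += torch.sqrt(2.0 * symmetric_kl(prev, model, probe_x)).item()
    return length
```

In this sketch the batch-mixing schedule replaces the explicit transport/coupling between the source and target marginals, so the returned value should be read only as a coarse proxy for the geodesic length on the statistical manifold that the paper computes.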