Keywords: features, analysis, generalization, transfer, few-shot
Abstract: A large body of work has explored how learned feature representations can be useful for a variety of downstream tasks, even when those tasks differ greatly from the objective used to (pre)train the representation. This observation underlies the success of few-shot learning, transfer learning, and self-supervised learning, among others. However, little is understood about why such transfer is successful and, more importantly, how one should choose the pre-training task. As a first step towards this understanding, we ask: what makes a feature representation good for a target task? We present simple, intuitive measurements of the feature space that are good predictors of downstream task performance. We give theoretical results showing how these measurements can be used to bound the error of downstream classifiers, and show empirically that these bounds correlate well with actual downstream performance. Finally, we show that our bounds are practically useful for choosing the right pre-trained representation for a target task.
One-sentence Summary: We present intuitive properties of the feature space that (theoretically and empirically) govern the performance of downstream classifiers.
Supplementary Material: zip