Analysis of Information Transfer from Heterogeneous Sources via Precise High-dimensional Asymptotics
Abstract: We consider the problem of transfer learning -- gaining knowledge from one source task and applying it to a different but related target task. A fundamental question in transfer learning is whether combining the data of both tasks works better than using only the target task's data (equivalently, whether "positive information transfer" occurs). We study this question formally in a linear regression setting where a two-layer linear neural network estimator combines the data of both tasks. The estimator uses a shared parameter vector for both tasks and can exhibit either positive or negative information transfer, depending on the characteristics of the datasets.
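As a rough illustration of the shared-parameter setup (a minimal sketch, not the paper's two-layer construction; all dimensions, noise levels, and variable names below are hypothetical), pooled least squares over both tasks' data can be compared against target-only least squares:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: d features, n_t target samples, n_s source samples.
d, n_t, n_s = 100, 80, 300
beta_t = rng.normal(size=d) / np.sqrt(d)                 # target parameter
beta_s = beta_t + 0.1 * rng.normal(size=d) / np.sqrt(d)  # related source parameter (model shift)

X_t = rng.normal(size=(n_t, d)); y_t = X_t @ beta_t + 0.5 * rng.normal(size=n_t)
X_s = rng.normal(size=(n_s, d)); y_s = X_s @ beta_s + 0.5 * rng.normal(size=n_s)

# Shared-parameter estimate: least squares on the pooled data (minimum-norm if underdetermined).
beta_pool, *_ = np.linalg.lstsq(np.vstack([X_t, X_s]),
                                np.concatenate([y_t, y_s]), rcond=None)
# Target-only baseline.
beta_only, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)

# With isotropic features, the target prediction risk is ||beta_hat - beta_t||^2.
print("pooled risk:", np.sum((beta_pool - beta_t) ** 2),
      "target-only risk:", np.sum((beta_only - beta_t) ** 2))
# Positive information transfer corresponds to the pooled risk being smaller.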
We characterize the precise asymptotic limit of the prediction risk of this estimator as the sample sizes grow proportionally with the feature dimension at fixed ratios. We also show that the asymptotic limit remains an accurate approximation of the risk in finite dimensions. We then provide the exact condition under which information transfer is positive (or negative) in a random-effect model, leading to several theoretical insights. For example, the risk curve is non-monotone under model shift, motivating a transfer learning procedure that progressively adds data from the source task. We validate the effectiveness of this procedure on text classification tasks with a neural network that uses a shared feature space for both tasks, analogous to the estimator above. The main ingredient of the analysis is deriving the high-dimensional asymptotic limits of various functionals involving the sum of two independent sample covariance matrices with different population covariance matrices, which may be of independent interest.
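A hedged sketch of the progressive data-adding idea (the grid of fractions, function name, and validation-based selection rule are assumptions for illustration; the paper's procedure may differ in its details):

import numpy as np

def progressive_transfer(X_t, y_t, X_s, y_s, X_val, y_val,
                         fracs=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Fit pooled least squares with growing portions of the source data and
    keep the fit with the lowest held-out risk on the target task."""
    best_beta, best_risk = None, np.inf
    for frac in fracs:
        m = int(frac * len(y_s))
        X = np.vstack([X_t, X_s[:m]])        # target data plus a source prefix
        y = np.concatenate([y_t, y_s[:m]])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        risk = np.mean((X_val @ beta - y_val) ** 2)
        if risk < best_risk:
            best_beta, best_risk = beta, risk
    return best_beta, best_risk

Because the risk can be non-monotone in the amount of source data, selecting the source fraction on held-out target data guards against negative transfer rather than always pooling all available source samples.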