Transfer Learning Across Datasets with Different Input Dimensions: An Algorithm and Analysis for the Linear Regression Case

TMLR Paper596 Authors

14 Nov 2022 (modified: 17 Sept 2024). Rejected by TMLR. License: CC BY 4.0

Abstract: With the development of new sensors and monitoring devices, more sources of data become available as inputs for machine learning models. On the one hand, these new inputs can improve a model's accuracy. On the other hand, combining them with historical data remains a challenge that has not yet been studied in enough detail. In this work, we propose a transfer learning algorithm that combines new and historical data with different input dimensions, which is especially beneficial when the new data is scarce. We focus on the linear regression case, which allows us to conduct a rigorous theoretical study of the benefits of the approach. Our approach achieves state-of-the-art performance on several real-life datasets, outperforming other linear transfer learning algorithms and performing comparably to non-linear ones. In addition, we prove that our approach is robust against negative transfer, assuming that the new inputs are normally distributed, and we confirm this robustness empirically on real-world data distributions.

Submission Length: Regular submission (no more than 12 pages of main content)

Assigned Action Editor: ~Novi_Quadrianto1

Submission Number: 596
