Machine Learning within Latent Spaces formed by Foundation Models

Published: 01 Jan 2024, Last Modified: 13 Nov 2024, IS 2024, CC BY-SA 4.0
Abstract: Foundation Models (FM) developed on very large generic data sets have transformed the landscape of machine learning (ML). Vision transformers (ViT) have closed the performance gap between fine-tuned and unsupervised transfer learning. This opens the possibility of abandoning the end-to-end approach that was, until recently, in widespread use. Instead, we consider a two-stage ML pipeline: in the first stage, features are extracted by a pre-trained large, multi-layer model with billions of parameters; in the second stage, an entirely new, simpler, prototype-based model is learned within this feature space at low computational cost. In this paper we study such a two-stage approach to ML. We further analyse several alternative lightweight methods for the second stage, including strategies for semi-supervised learning and a variety of strategies for linear fine-tuning. We demonstrate on nine well-known benchmark data sets that ultra-lightweight alternatives for the second stage (such as clustering, PCA, LDA and combinations of these) offer, at the price of a negligible drop in accuracy, a reduction of several orders of magnitude in computational cost (time, energy and the related CO2 emissions), as well as the ability to learn with no labels (a fully unsupervised approach) or with a limited number of labels (one label per cluster), and improved interpretability.
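The sketch below illustrates the kind of two-stage pipeline the abstract describes, but it is not the authors' implementation: a frozen pre-trained ViT is used only as a feature extractor, and a lightweight second-stage model (k-means prototypes for the label-free case, LDA for the supervised case) is fitted in the resulting latent space. The choice of torchvision's ViT-B/16 weights and scikit-learn estimators is an assumption made here for illustration.

```python
# Minimal sketch of a two-stage pipeline: frozen foundation model + lightweight
# second stage. Library choices (torchvision ViT-B/16, scikit-learn) are
# assumptions for illustration, not the paper's actual setup.
import torch
from torchvision.models import vit_b_16, ViT_B_16_Weights
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Stage 1: pre-trained ViT used purely as a feature extractor (no fine-tuning).
weights = ViT_B_16_Weights.DEFAULT
vit = vit_b_16(weights=weights).eval()
vit.heads = torch.nn.Identity()          # drop the classification head
preprocess = weights.transforms()

@torch.no_grad()
def extract_features(images):
    """Map a list of PIL images to ViT embeddings (the latent space)."""
    batch = torch.stack([preprocess(img) for img in images])
    return vit(batch).numpy()

# Stage 2, unsupervised variant: prototypes obtained by k-means clustering;
# one label per cluster centre is enough to turn this into a classifier.
def fit_prototypes(features, n_clusters=10):
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(features)
    return km.cluster_centers_

# Stage 2, lightweight supervised variant: LDA fitted on the same features.
def fit_lda(features, labels):
    return LinearDiscriminantAnalysis().fit(features, labels)
```

Because the foundation model stays frozen, only the second stage is trained, which is where the several-orders-of-magnitude reduction in compute claimed in the abstract would come from.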