On Stronger Computational Separations Between Multimodal and Unimodal Machine Learning

Published: 02 May 2024 · Last Modified: 25 Jun 2024 · ICML 2024 Spotlight · CC BY 4.0
Abstract: Recently, multimodal machine learning has enjoyed huge empirical success (e.g., GPT-4). Motivated to develop theoretical justification for this empirical success, Lu (NeurIPS '23, ALT '24) introduces a theory of multimodal learning, and considers possible *separations* between theoretical models of multimodal and unimodal learning. In particular, Lu (ALT '24) shows a computational separation, which is relevant to *worst-case* instances of the learning task. In this paper, we give a stronger *average-case* computational separation, where for "typical" instances of the learning task, unimodal learning is computationally hard, but multimodal learning is easy. We then question how "natural" the average-case separation is. Would it be encountered in practice? To this end, we prove that under basic conditions, any given computational separation between average-case unimodal and multimodal learning tasks implies a corresponding cryptographic key agreement protocol. We suggest interpreting this as evidence that very strong *computational* advantages of multimodal learning may arise *infrequently* in practice, since they exist only for the "pathological" case of inherently cryptographic distributions. However, this does not apply to possible (super-polynomial) *statistical* advantages.
Submission Number: 2557