Big Cooperative Learning to Conquer Local Optima

22 Jan 2025 (modified: 18 Jun 2025) · Submitted to ICML 2025 · CC BY 4.0
TL;DR: We propose a general learning concept that has the potential to conquer the local optima of conventional learning paradigms.
Abstract: Cutting-edge foundation models have sparked a groundbreaking AI revolution across a wide range of sophisticated real-world applications. In stark contrast, conventional machine learning paradigms, even with perfect data and sufficient model capacity, still grapple with entrenched challenges that manifest in rudimentary forms; for instance, "simple" clustering with mixture models (based on maximum-likelihood learning) suffers severely from bad local optima with exponentially high probability. The marked discrepancy between the achievements of these two research strands raises a question: what core element is absent from conventional learning paradigms? To answer this question, we assume an ideal setup for both data and model capacity and focus on the learning perspective to present big cooperative learning. Specifically, big cooperative learning makes diverse use of the available (data or energy-landscape) information to design a massive set of cooperative training tasks whose local optima differ but whose global optimum is the same; by randomly switching among such tasks, big cooperative learning destabilizes and thus conquers their local optima while concurrently encouraging exploration toward the shared global optimum. Tailored mixture-model-based simulations on forward and reverse KL minimizations (representing the popular maximum-likelihood and adversarial learning paradigms, respectively) demonstrate its general effectiveness across multiple paradigms in an explicit, controlled setup.
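The abstract describes the mechanism only in words; the toy below sketches one way the task-switching idea could look in code, under assumptions not taken from the paper. It is not the authors' implementation: the 2-D Gaussian-mixture setup, the choice of tasks (maximum likelihood on the joint and on each coordinate marginal), and all names such as `gmm_log_likelihood` are illustrative. Because the components are diagonal, every marginal of the model is a mixture with the same weights, so each task's global optimum coincides with the true joint while the local-optima landscapes differ, mirroring the "different local optima, same global optimum" structure the abstract describes; maximizing likelihood here corresponds to the forward-KL objective mentioned above.

```python
import math
import torch

torch.manual_seed(0)

# Synthetic data: a 2-D, 2-component Gaussian mixture (the "perfect data" setting).
true_mu = torch.tensor([[-2.0, -2.0], [2.0, 2.0]])
N = 2000
comp = torch.randint(0, 2, (N,))
data = true_mu[comp] + 0.5 * torch.randn(N, 2)

# Model: a 2-component diagonal-Gaussian mixture with free weights.
logits = torch.zeros(2, requires_grad=True)
mu = torch.randn(2, 2, requires_grad=True)          # a deliberately poor initialization
log_sigma = torch.zeros(2, 2, requires_grad=True)
params = [logits, mu, log_sigma]

def gmm_log_likelihood(x, dims):
    """Average mixture log-density of x restricted to the coordinates in `dims`.

    With diagonal components, the marginal over any coordinate subset is
    again a Gaussian mixture with the same mixing weights.
    """
    w = torch.log_softmax(logits, dim=0)                      # (K,)
    m, s = mu[:, dims], log_sigma[:, dims].exp()              # (K, D')
    z = (x[:, dims].unsqueeze(1) - m) / s                     # (N, K, D')
    log_comp = (-0.5 * z ** 2 - s.log()
                - 0.5 * math.log(2 * math.pi)).sum(-1)        # (N, K)
    return torch.logsumexp(w + log_comp, dim=1).mean()

# Cooperative tasks: maximum likelihood on the joint and on each marginal.
# All three share the same global optimum (the true mixture), but their
# local-optima landscapes differ, which is what random switching exploits.
tasks = [[0, 1], [0], [1]]

opt = torch.optim.Adam(params, lr=0.05)
for step in range(2000):
    dims = tasks[torch.randint(0, len(tasks), (1,)).item()]  # randomly switch tasks
    loss = -gmm_log_likelihood(data, dims)                   # forward-KL / MLE objective
    opt.zero_grad()
    loss.backward()
    opt.step()

print("recovered means:\n", mu.detach())
```

Training each task alone can stall at a bad local optimum (e.g., two model components collapsing onto one data mode); the random switching is what perturbs such configurations, since a local optimum of one task is generally not a local optimum of the others.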
Primary Area: General Machine Learning
Keywords: Big cooperative learning, local optimum, global optimum, foundation models, clustering, forward KL minimization, reverse KL minimization
Submission Number: 6777