Nonparametric Identification of Latent Concepts

Published: 01 May 2025, Last Modified: 18 Jun 2025
Venue: ICML 2025 poster
License: CC BY 4.0
TL;DR: We establish theoretical guarantees for learning latent concepts in general settings, without assuming specific concept types, functional relations, or parametric generative models.
Abstract: We are born with the ability to learn concepts by comparing diverse observations. This helps us to understand the new world in a compositional manner and facilitates extrapolation, as objects naturally consist of multiple concepts. In this work, we argue that the cognitive mechanism of comparison, fundamental to human learning, is also vital for machines to recover true concepts underlying the data. This offers correctness guarantees for the field of concept learning, which, despite its impressive empirical successes, still lacks general theoretical support. Specifically, we aim to develop a theoretical framework for the identifiability of concepts with multiple classes of observations. We show that with sufficient diversity across classes, hidden concepts can be identified without assuming specific concept types, functional relations, or parametric generative models. Interestingly, even when conditions are not globally satisfied, we can still provide alternative guarantees for as many concepts as possible based on local comparisons, thereby extending the applicability of our theory to more flexible scenarios. Moreover, the hidden structure between classes and concepts can also be identified nonparametrically. We validate our theoretical results in both synthetic and real-world settings.
Lay Summary: Imagine how we learn about the world as children. We see many different things – a fluffy cat, a furry dog, a feathered bird – and by comparing them, we start to understand underlying concepts like "animal," "furry," or "has wings." This ability to compare diverse examples helps us make sense of new things we've never encountered before. This paper argues that a similar process of comparison is crucial for computers to truly learn the basic concepts hidden in data. While current computer learning methods are powerful, they often lack a solid guarantee that they're learning the right concepts. Our work provides a new understanding of how computers can reliably identify these hidden concepts. We show that if a computer system is fed enough varied examples across different categories, it can pinpoint the fundamental concepts without needing to be told beforehand what types of concepts to look for or how they are connected. Even when the data is not perfectly diverse, our approach can still identify as many concepts as possible by making local comparisons. Furthermore, this method can also uncover the natural relationships between different categories of data and the concepts they represent. We've tested these ideas and confirmed they work, both in controlled experiments and with real-world datasets.
Primary Area: General Machine Learning->Representation Learning
Keywords: Concept Learning, Compositional Learning, Nonparametric Identification
Submission Number: 2212