Your Consistency Model is Secretly a More Powerful Supervised Learning Paradigm for Learning Tasks with Complex Labels
Keywords: Supervised Learning, Consistency Models
Abstract: Directly predicting labels from data inputs has long been the standard supervised learning paradigm. Its trade-off between compression and prediction has been studied under information-theoretic frameworks such as the Information Bottleneck, especially in the context of deep learning. These frameworks typically assume that the information content of labels is far smaller than that of the inputs, which leads to model designs that prioritize compressing inputs and extracting features from them. However, modern supervised learning increasingly involves predicting complex labels, which exacerbates the challenge of mapping compressed latent features to high-fidelity label representations: predictive bottlenecks arise not only from compression limitations but also from the inherent complexity of the feature-to-label transformation. This paper proposes injecting scheduled label information into the model during training so that it learns a prediction consistency mapping, a concept borrowed from generative consistency models. Unlike conventional approaches that predict labels directly from inputs, training our conditional consistency model involves predicting labels from the input together with a noise-perturbed label hint, while enforcing predictive consistency across different noise steps. The model thus simultaneously learns the relationship between latent features and a spectrum of label information ranging from none to complete, which enables progressive learning for complex predictions and permits multi-step inference analogous to gradual denoising, thereby improving prediction quality. Experiments on vision, text, and graph tasks demonstrate that our consistency-based supervised training paradigm outperforms conventional supervised training on complex label prediction problems. Source code will be made publicly available upon acceptance.
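The abstract describes training on inputs plus noise-perturbed label hints with a consistency objective across noise steps, and multi-step inference analogous to denoising. The following is a minimal, hypothetical PyTorch sketch of that idea, not the authors' released code; all names (ConsistencyPredictor, sigmas, the EMA target model, the loss weighting) are illustrative assumptions.

```python
# Hypothetical sketch of conditional-consistency supervised training:
# the model predicts the label from (input, noise-perturbed label hint, noise step),
# and predictions at adjacent noise levels are encouraged to agree.
import torch
import torch.nn as nn

class ConsistencyPredictor(nn.Module):
    """f(x, noisy_label, t) -> predicted label. n_steps noise levels plus level 0 (clean)."""
    def __init__(self, x_dim, y_dim, hidden=256, n_steps=10):
        super().__init__()
        self.t_embed = nn.Embedding(n_steps + 1, hidden)
        self.net = nn.Sequential(
            nn.Linear(x_dim + y_dim + hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, y_dim),
        )

    def forward(self, x, y_noisy, t):
        h = torch.cat([x, y_noisy, self.t_embed(t)], dim=-1)
        return self.net(h)

def training_step(model, ema_model, x, y, sigmas, opt):
    """One illustrative step. sigmas: increasing noise scales, sigmas[0] = 0;
    ema_model is a slowly updated copy of model used as the consistency target."""
    n = sigmas.numel() - 1
    t = torch.randint(0, n, (x.size(0),), device=x.device)
    eps = torch.randn_like(y)
    y_hi = y + sigmas[t + 1].unsqueeze(-1) * eps   # more-noised label hint
    y_lo = y + sigmas[t].unsqueeze(-1) * eps       # less-noised label hint
    pred = model(x, y_hi, t + 1)
    with torch.no_grad():
        target = ema_model(x, y_lo, t)             # prediction at the adjacent, lower noise step
    # supervised term + consistency term (equal weighting is an arbitrary choice here)
    loss = ((pred - y) ** 2).mean() + ((pred - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

@torch.no_grad()
def multi_step_predict(model, x, y_dim, sigmas):
    """Multi-step inference: start from a pure-noise label hint and refine it stepwise."""
    y_hat = sigmas[-1] * torch.randn(x.size(0), y_dim, device=x.device)
    for t in range(sigmas.numel() - 1, 0, -1):
        t_idx = torch.full((x.size(0),), t, dtype=torch.long, device=x.device)
        y_hat = model(x, y_hat, t_idx)
        if t > 1:  # re-perturb at the next lower noise level, then refine again
            y_hat = y_hat + sigmas[t - 1] * torch.randn_like(y_hat)
    return y_hat
```

Under this reading, single-step inference (one call with the highest noise step) falls out as a special case, while more refinement steps trade compute for prediction quality, mirroring the gradual-denoising analogy in the abstract.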
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5746