Abstract: The current Neuro-Symbolic (NeSy) learning paradigm suffers from an over-reliance on labeled data. Discarding labels entirely, however, leads to less symbolic information, a larger solution space, and more shortcuts, issues that existing NeSy systems cannot resolve. This paper introduces a novel learning paradigm, Verification Learning (VL), which addresses this challenge by transforming the label-based reasoning process in NeSy into a label-free verification process. VL achieves strong learning results relying solely on unlabeled data and a function that verifies whether the current predictions conform to the rules. We formalize this problem as a Constraint Optimization Problem (COP) and propose a Dynamic Combinatorial Sorting (DCS) algorithm that accelerates solving by reducing the number of verification attempts, effectively lowering computational costs; we further introduce a prior alignment method to address potential shortcuts. Our theoretical analysis identifies which tasks in NeSy systems can be completed without labels and explains why rules can replace infinite labels for some tasks, while for others the rules have no effect. We validate the proposed framework on four fully unsupervised tasks, addition, sort, match, and chess, each showing significant performance and efficiency improvements.
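The verification idea described above can be sketched in a few lines of code. The following is an illustrative example only, assuming an MNIST-addition-style task: `verify` encodes the rule (predicted digits must sum to a known result), and a brute-force search returns the most confident prediction that passes verification, which then serves as a pseudo-label. The function names are hypothetical, and the exhaustive enumeration here stands in for the paper's far more efficient DCS algorithm.

```python
# Illustrative sketch of label-free verification (not the authors' code).
# Task assumption: two digit images whose true digits sum to a known target.
from itertools import product

def verify(symbols, target_sum):
    """Rule check: do the predicted digit symbols satisfy the constraint?"""
    return sum(symbols) == target_sum

def best_consistent_assignment(scores, target_sum):
    """Among all digit assignments that pass verification, return the one
    with the highest total model confidence. A real system would avoid
    this exhaustive search (e.g. via the paper's DCS algorithm)."""
    n_digits = len(scores)
    best, best_score = None, float("-inf")
    for cand in product(range(10), repeat=n_digits):
        if verify(cand, target_sum):
            score = sum(scores[i][d] for i, d in enumerate(cand))
            if score > best_score:
                best, best_score = cand, score
    return best  # pseudo-label used to supervise the neural network
```

For example, if the network assigns high confidence to digit 3 for the first image and digit 4 for the second, and the known sum is 7, the search returns (3, 4) without ever seeing a digit label.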
Lay Summary: Current neuro-symbolic systems typically require a large amount of annotated data to train models. However, in real-world scenarios, acquiring annotations for reasoning tasks is often challenging. This paper proposes Verification Learning (VL), a novel learning paradigm that significantly alleviates the issues of limited information, large symbolic spaces, and prevalent shortcuts that arise in the absence of supervision. The VL problem is formulated as an optimization problem, which can be solved efficiently under specific conditions using a low-complexity ranking algorithm. To further address the shortcut issue, we introduce a distribution alignment strategy. On the theoretical side, we analyze the errors introduced by knowledge and data in unsupervised settings. Experimentally, we demonstrate the superiority of the VL paradigm on four benchmark datasets.
Primary Area: General Machine Learning->Unsupervised and Semi-supervised Learning
Keywords: Neural Symbolic Learning
Submission Number: 12876