An Improved Neuro-Symbolic Architecture to Fine-Tune Generative AI Systems

Published: 01 Jan 2024, Last Modified: 09 Dec 2024 · CPAIOR (2) 2024 · CC BY-SA 4.0
Abstract: Deep generative models excel at replicating the mechanisms that generate a specific set of sequential data. However, learning the underlying constraints that prevent the generation of forbidden sequences remains a challenge. Recently, RL-Tuner, a reinforcement learning framework designed for the ad hoc fine-tuning of a neural model so that it adheres to given constraints, was enhanced to learn from the output of two constraint programming models. The first model computes a score representing the number of constraint violations introduced by the currently generated token, while the second provides the marginal probability of that token being generated if no additional violation is allowed. In this paper, we significantly enhance the latter framework in three ways. First, we propose a simplified architecture that requires only a single constraint programming model. Second, we evaluate constraint violations in a more accurate and consistent manner. Third, we propose a reward signal based on belief propagation on this new model that further improves performance. Our experiments, conducted on the same music generation learning task, demonstrate that our approach surpasses the previous framework both in convergence speed during training and in post-training accuracy. Additionally, our approach generalizes better to sequences longer than those used during training.
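For intuition, here is a minimal sketch of how a reward signal of this shape might be assembled at each generation step. All function names, the toy "no immediate repetition" constraint, and the weighting scheme are illustrative assumptions, not the paper's actual implementation; in the paper, the violation count and the marginal come from a constraint programming model and belief propagation on it, respectively.

```python
import math

# Hypothetical stand-ins for the paper's components; names are illustrative only.

def count_new_violations(prefix, token):
    """Number of constraint violations introduced by appending `token` to the
    already-generated `prefix` (in the paper, computed by a CP model).
    Toy constraint for illustration: no immediate repetition of a token."""
    return 1 if prefix and prefix[-1] == token else 0

def bp_marginal(prefix, token, vocab):
    """Marginal probability that `token` is generated next, given that no
    additional violation is allowed (in the paper, obtained via belief
    propagation on the CP model). Here: uniform over non-violating tokens."""
    feasible = [t for t in vocab if count_new_violations(prefix, t) == 0]
    if token not in feasible:
        return 0.0
    return 1.0 / len(feasible)

def reward(prefix, token, vocab, lam=1.0, mu=1.0, eps=1e-9):
    """Combined RL reward: reward tokens the constraint model deems likely
    while penalizing any violations the token introduces."""
    return (lam * math.log(bp_marginal(prefix, token, vocab) + eps)
            - mu * count_new_violations(prefix, token))

vocab = list("ABCD")
print(reward(list("AB"), "B", vocab))  # repeats the last token: heavily penalized
print(reward(list("AB"), "C", vocab))  # feasible continuation: mild log-marginal cost
```

Under these assumptions, the log of the marginal steers the policy toward continuations that remain completable without violations, while the violation count penalizes locally forbidden tokens; how the paper actually weights and combines these terms is not specified in the abstract.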