GLARE: Scalable Neuro-Symbolic Reward Shaping for LLM Agents via Group-Level Automata

Published: 30 Apr 2026, Last Modified: 24 Jun 2026 | ICML 2026 regular | Everyone | CC BY 4.0
TL;DR: A neuro-symbolic RL training framework that synergizes the semantic understanding of LLMs with the deterministic precision of LTL automata.
Abstract: Reinforcement Learning (RL) with Group Relative Policy Optimization (GRPO) shows great promise for enhancing LLM reasoning, but remains challenged by sparse and unstable rewards in long-horizon tasks. Existing approaches to reward shaping struggle to balance semantic expressiveness, reliability, and computational efficiency: heuristic rules lack flexibility, while LLM-as-a-Judge incurs high computational cost and suffers from inconsistent and misaligned scoring signals in long-context settings. To address these challenges, we introduce GLARE, a neuro-symbolic reward framework that decouples semantic abstraction from credit assignment. Specifically, to leverage semantic understanding while preserving symbolic determinism, we first extract and symbolize trajectory events into a discrete representation. These events are then translated into Linear Temporal Logic (LTL) formulas, which are compiled into deterministic automata that track the agent's progress via state transitions. This mechanism yields dense and consistent reward signals, avoiding unstable direct scoring while significantly reducing computational cost. Empirical results on ALFWorld show that GLARE outperforms GRPO by 12.1% in success rate, while achieving an 8.1% improvement over conventional LLM-based judges using only 15% of their computational cost.
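The automaton-based reward idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the task ("pick, then heat, then place"), the event names, the hand-written transition table, and the per-transition bonus are all assumptions made for illustration. In the described pipeline the automaton would be compiled from an LTL formula rather than written by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical deterministic automaton for the ordered task
# "pick, then heat, then place". State 0 is initial, state 3 is accepting.
# All state and event names are illustrative assumptions.
TRANSITIONS: Dict[Tuple[int, str], int] = {
    (0, "pick"): 1,
    (1, "heat"): 2,
    (2, "place"): 3,
}
INITIAL_STATE = 0
ACCEPTING_STATE = 3


def run_automaton(events: List[str], bonus: float = 1.0) -> Tuple[List[float], int]:
    """Feed symbolized events through the automaton.

    Returns one reward per event (`bonus` when the state changes, else 0.0)
    together with the final automaton state.
    """
    state = INITIAL_STATE
    rewards: List[float] = []
    for event in events:
        # Events with no listed transition leave the state unchanged.
        next_state = TRANSITIONS.get((state, event), state)
        rewards.append(bonus if next_state != state else 0.0)
        state = next_state
    return rewards, state


if __name__ == "__main__":
    rewards, final_state = run_automaton(["look", "pick", "heat", "look", "place"])
    print(rewards, final_state == ACCEPTING_STATE)
```

Because the reward depends only on the automaton state and the current event, the same event sequence always yields the same reward sequence, which is the consistency property the abstract contrasts with direct LLM scoring.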
Lay Summary: Large language models are increasingly used as agents that act in interactive environments, but they are hard to train when feedback only arrives after a long task is finished. Our paper introduces GLARE, a method that converts an agent’s behavior into simple event descriptions and uses rule-like structures to give more informative feedback during the task. This makes the feedback more consistent than asking another language model to judge every action directly, while also reducing computation cost. In experiments on household and web-shopping tasks, GLARE helps smaller language-model agents learn better strategies than standard training methods and direct language-model judging baselines.
Originally Submitted Supplementary Material: zip
Link To Code: https://github.com/USTC-AIR-Lab/GLARE.git
Primary Area: Deep Learning->Large Language Models
Keywords: Large Language Model, Reinforcement Learning, LLM as Judge, Linear Temporal Logic
Originally Submitted PDF: pdf
Submission Number: 34609