Hierarchical Feedback Interface for Human-in-the-Loop Reinforcement Learning in Debugging

ICLR 2026 Conference Submission25378 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Reinforcement Learning in Debugging
Abstract: We propose the Hierarchical Feedback Interface (HFI) for human-in-the-loop reinforcement learning in debugging, which structures human feedback into high-level objectives and low-level refinements to address the subjectivity and inefficiency of ad-hoc corrections. HFI employs a two-tiered policy architecture in which a high-level policy abstracts debugging goals into interpretable meta-objectives, and a low-level policy translates these into actionable feedback, grounding human input in goal-aligned reasoning. The framework integrates a hierarchical actor-critic mechanism: the high-level policy generates goal vectors over reduced state representations, while the low-level policy conditions on both code-specific features and these goals to produce context-aware feedback.
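The two-tiered policy structure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: all dimensions, layer sizes, and the discrete feedback action space are assumptions introduced here for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the submission does not specify these.
STATE_DIM = 32    # reduced state representation of the debugging session
GOAL_DIM = 8      # high-level goal vector ("meta-objective")
CODE_DIM = 16     # code-specific features
N_FEEDBACK = 5    # discrete feedback actions for the low-level policy


def init_layer(fan_in, fan_out):
    """Random weight matrix and zero bias for one linear layer."""
    return rng.normal(0, 1 / np.sqrt(fan_in), (fan_in, fan_out)), np.zeros(fan_out)


def mlp(x, layers):
    """Two-layer MLP with tanh hidden activation."""
    (W1, b1), (W2, b2) = layers
    return np.tanh(x @ W1 + b1) @ W2 + b2


# High-level policy: reduced state -> bounded goal vector.
high_layers = [init_layer(STATE_DIM, 64), init_layer(64, GOAL_DIM)]
# Low-level policy: [code features ; goal] -> feedback-action logits.
low_layers = [init_layer(CODE_DIM + GOAL_DIM, 64), init_layer(64, N_FEEDBACK)]


def high_level_policy(state):
    return np.tanh(mlp(state, high_layers))  # goal vector in [-1, 1]^GOAL_DIM


def low_level_policy(code_features, goal):
    logits = mlp(np.concatenate([code_features, goal]), low_layers)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()  # distribution over feedback actions


state = rng.normal(size=STATE_DIM)
code = rng.normal(size=CODE_DIM)
goal = high_level_policy(state)          # meta-objective from reduced state
probs = low_level_policy(code, goal)     # feedback conditioned on code + goal
feedback = int(np.argmax(probs))
```

In an actor-critic training loop, each policy would additionally have a critic estimating value under its own reward signal; the sketch only shows the forward pass that grounds low-level feedback in the high-level goal.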
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 25378