TROLL: Trust Regions improve Reinforcement Learning for Large Language Models

Philipp Becker, Niklas Freymuth, Serge Thilges, Fabian Otto, Gerhard Neumann

Published: 2025, Last Modified: 05 May 2026, CoRR 2025, CC BY-SA 4.0