Conventional reinforcement learning (RL) methods often fix a single discount factor for future rewards, limiting their ability to handle diverse temporal requirements. We propose a framework that trains an agent across a spectrum of discount factors, interpreting each value function as a sample of a Laplace transform, and then applies an inverse transform to recover a log-compressed representation of expected future reward. This representation enables post hoc adjustments to the discount function (e.g., exponential, hyperbolic, or finite horizon) without retraining. Furthermore, by precomputing a library of policies, the agent can dynamically select, at runtime, whichever policy maximizes a newly specified discount objective, effectively constructing a hybrid policy in environments with shifting deadlines or reward structures. The log-compressed timeline aligns with human temporal perception as described by the Weber-Fechner law, maintaining uniform relative precision across timescales and thus improving efficiency in scale-free environments. We demonstrate this framework in a grid-world navigation task in which the agent adapts to varying time horizons.
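As a rough illustration of the transform-and-invert idea, the minimal Python sketch below treats each value estimate V_gamma as a Laplace-transform sample of the expected future-reward timeline, inverts the samples onto a log-spaced time grid, and then re-scores the recovered timeline under new discount functions without retraining. The discount grid, the simulated reward timeline, and the regularized least-squares inversion are all illustrative assumptions, not the paper's actual training or inversion procedure.

```python
import numpy as np

# Spectrum of discount factors; each V_gamma is one "Laplace sample" of the
# expected future-reward timeline (values here are illustrative).
gammas = np.geomspace(0.80, 0.99, num=32)

# Simulated ground-truth reward timeline so the sketch is self-contained;
# in the framework these V values would come from trained value functions.
true_rewards = np.zeros(200)
true_rewards[10] = 1.0      # a small reward 10 steps ahead
true_rewards[80] = 2.0      # a larger reward 80 steps ahead
t = np.arange(len(true_rewards))
V = np.array([(g ** t * true_rewards).sum() for g in gammas])  # V_gamma = sum_t gamma^t r_t

# Inverse transform onto a log-spaced time grid: log compression gives uniform
# *relative* precision across timescales (Weber-Fechner-style spacing).
t_star = np.geomspace(1.0, len(true_rewards) - 1, num=40)
A = gammas[:, None] ** t_star[None, :]          # A[i, j] = gamma_i ** t*_j
lam = 1e-3                                      # ridge regularization strength
r_hat = np.linalg.solve(A.T @ A + lam * np.eye(len(t_star)), A.T @ V)

def rediscount(discount_fn):
    """Score the recovered timeline under a new discount function, post hoc."""
    return float(np.sum(discount_fn(t_star) * r_hat))

print("exponential, gamma=0.95 :", rediscount(lambda tt: 0.95 ** tt))
print("hyperbolic,  k=0.1      :", rediscount(lambda tt: 1.0 / (1.0 + 0.1 * tt)))
print("finite horizon, T=50    :", rediscount(lambda tt: (tt <= 50).astype(float)))
```

The regularized least-squares step is only a stand-in for whatever inverse-transform operator the framework actually employs; the point of the sketch is that once a log-compressed timeline such as r_hat is available, switching to exponential, hyperbolic, or finite-horizon discounting is a cheap re-weighting rather than a new training run.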