Abstract: In reinforcement learning, temporal abstraction in the action space is a common strategy for simplifying policy learning through temporally extended actions.
Recently, algorithms that repeat a primitive action for a fixed number of steps, a simple way to realize temporal abstraction in practice, have demonstrated better performance than conventional single-step algorithms.
However, a significant drawback of earlier work on action repetition is that repeating a sub-optimal action over many steps can considerably degrade performance.
To tackle this problem, we introduce a new algorithm that employs ensemble methods to estimate the uncertainty of extending an action.
Our framework offers flexibility, allowing policies to either prioritize exploration or adopt an uncertainty-averse stance based on their specific needs.
We provide empirical results across a range of environments, highlighting the superior performance of our proposed method compared to other action-repeating algorithms.
These results indicate that our uncertainty-aware strategy effectively counters the downsides of action repetition, enhancing policy learning efficiency.
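To make the idea concrete, the sketch below illustrates one plausible reading of the mechanism described above: an ensemble of value estimators scores the previous action, the disagreement across members serves as an uncertainty proxy, and the policy either repeats the action only when the ensemble agrees (uncertainty-averse) or repeats it to probe uncertain regions (exploration-oriented). All names here (`member_q`, `should_repeat`, the threshold) are hypothetical and not taken from the paper; this is a minimal illustration, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def member_q(state, action, k):
    # Hypothetical per-member value estimate; in practice this would be
    # a trained Q-network belonging to ensemble member k.
    return float(np.dot(state, action)) + 0.1 * rng.standard_normal()

def ensemble_q_values(state, action, n_members=5):
    # Collect one value estimate per ensemble member for (state, action).
    return np.array([member_q(state, action, k) for k in range(n_members)])

def should_repeat(state, prev_action, threshold=0.05, uncertainty_averse=True):
    """Decide whether to repeat prev_action based on ensemble disagreement."""
    q_estimates = ensemble_q_values(state, prev_action)
    uncertainty = q_estimates.std()      # disagreement as an epistemic-uncertainty proxy
    if uncertainty_averse:
        return uncertainty < threshold   # repeat only when the ensemble agrees
    return uncertainty >= threshold      # or repeat to explore uncertain regions

state = rng.standard_normal(4)
prev_action = rng.standard_normal(4)
print(should_repeat(state, prev_action))
```

The single `uncertainty_averse` flag mirrors the flexibility claimed in the abstract: the same uncertainty signal can gate repetition conservatively or drive exploration, depending on the policy's needs.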
Submission Number: 43