Continuous Tactical Optimism and Pessimism
Abstract: The Tactical Optimism and Pessimism (TOP) [1] framework from reinforcement learning can be deployed in robotics by incorporating it into the decision-making process of an autonomous system. TOP lets a robot dynamically adjust its level of optimism or pessimism based on the uncertainty in its environment and its risk tolerance. When tactically optimistic, the robot takes exploratory actions to gather more information about its environment, which is useful in unfamiliar or uncertain situations. When tactically pessimistic, it takes cautious actions to mitigate risk and avoid potentially dangerous situations. The framework thus lets the robot balance information gathering against safety, improving its decision-making in complex and uncertain real-world scenarios. The authors of TOP hypothesize that the efficacy of an optimistic strategy depends on the environment, the learning stage, and the overall context in which a learner is embedded. They therefore propose to view optimism/pessimism as a spectrum and investigate procedures that actively move along that spectrum during learning. TOP formulates the optimism/pessimism dilemma as a k-armed bandit problem. However, the arm values and the number of arms must be chosen for each environment, so finding a good set of arms becomes a hyperparameter search in its own right. In this work, we propose learning the degree of optimism/pessimism while the agent interacts online with the environment.
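To make the k-armed bandit formulation concrete, the following is a minimal sketch, not the implementation from [1] or from this work. It assumes each arm is a fixed optimism level beta, that the value estimate is shifted by beta times an uncertainty term, and that the bandit is updated with an importance-weighted exponential-weights rule. The names `OptimismBandit` and `value_estimate`, the arm values, and the feedback signal are all hypothetical choices made for illustration.

```python
import math
import random


class OptimismBandit:
    """Exponential-weights bandit over a fixed, hand-picked set of optimism levels.

    Assumption: each arm is a scalar beta; negative values are pessimistic,
    positive values are optimistic.
    """

    def __init__(self, arms, learning_rate=0.1, seed=0):
        self.arms = list(arms)
        self.learning_rate = learning_rate
        self.log_weights = [0.0] * len(self.arms)
        self.rng = random.Random(seed)

    def probabilities(self):
        # Softmax over log-weights, shifted by the maximum for numerical stability.
        shift = max(self.log_weights)
        unnormalized = [math.exp(w - shift) for w in self.log_weights]
        total = sum(unnormalized)
        return [u / total for u in unnormalized]

    def sample(self):
        # Draw an arm index; also return its probability for the later update.
        probs = self.probabilities()
        index = self.rng.choices(range(len(self.arms)), weights=probs, k=1)[0]
        return index, probs[index]

    def update(self, index, prob, feedback):
        # Importance-weighted update: only the pulled arm changes.
        # Assumption: feedback is a bounded scalar, e.g. the change in
        # episode return since the previous episode.
        self.log_weights[index] += self.learning_rate * feedback / prob


def value_estimate(mean, std, beta):
    # Assumed form of the belief: mean shifted by beta standard deviations.
    return mean + beta * std


if __name__ == "__main__":
    bandit = OptimismBandit(arms=[-1.0, 0.0])
    for _ in range(20):
        index, prob = bandit.sample()
        # Toy feedback: pretend the less pessimistic arm improves return.
        feedback = 1.0 if bandit.arms[index] == 0.0 else 0.0
        bandit.update(index, prob, feedback)
    print(bandit.probabilities())
```

In this sketch the set `arms` has to be chosen by hand for each environment, which is the limitation the abstract points to: a continuous alternative would learn the degree of optimism/pessimism directly instead of selecting among preset values.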