Abstract: Reinforcement learning techniques are increasingly used in cyber-physical systems and traditional control systems, since they allow the controlling logic to learn through interaction with its environment. However, reinforcement learning techniques have been found to be vulnerable to malicious influence in the form of so-called adversarial examples, which can lead, for example, to destabilization of the system. In this paper, an optimization method is proposed to provide a directed attack on a system, resulting in destabilization. The attack differs from previous adversarial work against machine learning algorithms in that it focuses on cyber-physical systems and, in contrast to false-data injection or actuator attacks, assumes that the adversary is able to directly influence the state(s) of the system to some degree. Furthermore, it is assumed that the system is controlled using a pre-learned optimal policy; i.e., the attack does not poison the learning process but rather exploits imperfections in the learned policy. This means the reinforcement learning algorithm can be vulnerable even while operating under an optimal policy. The optimization approach increases the feasibility of the attack by reducing the overall cost expended by the adversary. This paper describes the theory supporting the attack by proposing an algorithm and its corresponding proof. The attack is validated using OpenAI's Gym and the MuJoCo physics simulator to simulate the attack on a cyber-physical system trained using a deep reinforcement learning method.
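The threat model described above (a fixed pre-learned policy, an adversary that directly perturbs the system state, and a cumulative cost on those perturbations) can be illustrated with a toy sketch. The example below is an illustrative assumption, not the paper's method: a saturated linear controller stands in for the pre-learned policy on an inverted pendulum, and a greedy sign-directed state perturbation stands in for the proposed optimization-based attack.

```python
import math

def simulate(adversary_eps=0.0, steps=500, dt=0.02):
    """Inverted pendulum controlled by a fixed, saturated linear policy.

    The policy u = -kp*theta - kd*theta_dot stands in for a pre-learned
    controller; torque saturation gives it only a finite region of
    attraction around the upright equilibrium (theta = 0).
    """
    g_over_l, kp, kd, u_max = 9.8, 40.0, 8.0, 5.0
    theta, theta_dot = 0.05, 0.0        # start near the upright equilibrium
    theta_max, attack_cost = abs(theta), 0.0
    for _ in range(steps):
        # The fixed policy acts on the (possibly perturbed) state.
        u = max(-u_max, min(u_max, -kp * theta - kd * theta_dot))
        theta_dot += dt * (g_over_l * math.sin(theta) + u)
        theta += dt * theta_dot
        # Adversary: a small greedy nudge applied directly to the state,
        # always directed away from the equilibrium; its cumulative
        # magnitude is the cost the adversary expends.
        delta = adversary_eps * (1.0 if theta >= 0.0 else -1.0)
        theta += delta
        attack_cost += abs(delta)
        theta_max = max(theta_max, abs(theta))
    return theta_max, attack_cost

clean_max, _ = simulate(adversary_eps=0.0)
attacked_max, cost = simulate(adversary_eps=0.04)
print(f"worst |theta| without attack: {clean_max:.3f}")
print(f"worst |theta| with attack:    {attacked_max:.3f} (adversary cost {cost:.2f})")
```

Without perturbations the policy keeps the pendulum near upright, while the repeated small state perturbations drive it outside the saturated controller's region of attraction, after which the system destabilizes on its own: a minimal instance of the abstract's claim that a system can be attacked even while operating under a well-trained policy.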