Abstract: With the advent of deep reinforcement learning, we witness the spread of novel autonomous driving agents that learn to drive safely among humans. However, skilled attackers can steer the decision-making process of these agents through minimal perturbations applied to the readings of their hardware sensors. Such perturbations force the behavior of the victim agent to change unexpectedly, increasing the likelihood of crashes by inhibiting its braking capability or coercing it into constantly changing lanes. To counter these attacks, we propose a detector that can be mounted on autonomous vehicles to spot ongoing attacks. The detector first profiles the agent's behavior in the absence of attacks by looking at the representation learned during training. Once deployed, the detector discards all decisions that deviate from this regular driving pattern. We empirically demonstrate the detection capabilities of our approach by testing it against unseen attacks deployed with increasing strength.
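As a rough illustration of the detection idea summarized above (profile the agent's learned representation on attack-free driving, then reject decisions whose representation deviates from that profile), the following is a minimal sketch. It assumes the "representation" is a feature vector extracted from the policy network (e.g., penultimate-layer activations) and uses a Mahalanobis-distance anomaly score with a percentile threshold; the class name, the Gaussian profiling choice, and the threshold calibration are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

class RepresentationDetector:
    """Profiles attack-free activations, then flags deviating decisions.

    Hypothetical sketch: assumes each decision yields a feature vector
    taken from the agent's learned representation.
    """

    def __init__(self, percentile=99.0):
        self.percentile = percentile  # calibration quantile on clean data
        self.mean = None
        self.cov_inv = None
        self.threshold = None

    def fit(self, clean_features):
        # clean_features: (n_samples, d) representations recorded while
        # the agent drives without attacks. Fit a Gaussian profile of
        # the regular driving pattern.
        self.mean = clean_features.mean(axis=0)
        cov = np.cov(clean_features, rowvar=False)
        self.cov_inv = np.linalg.pinv(cov)
        scores = [self._mahalanobis(f) for f in clean_features]
        # Calibrate the rejection threshold on the clean distribution.
        self.threshold = np.percentile(scores, self.percentile)

    def _mahalanobis(self, f):
        d = f - self.mean
        return float(np.sqrt(d @ self.cov_inv @ d))

    def is_attack(self, features):
        # At deployment: discard the decision if its representation
        # lies outside the profiled driving pattern.
        return self._mahalanobis(features) > self.threshold
```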