Abstract: Reinforcement learning (RL) has emerged as a strong candidate for implementing complex controls in energy systems, such as energy pricing in microgrids. But what happens when some of the microgrid controllers are compromised by a malicious entity? We demonstrate a novel attack in RL. Our attack perturbs each trajectory to reverse the direction of the estimated gradient. We demonstrate that if data from a small fraction of microgrid controllers is adversarially perturbed, the learning of the RL agent can be significantly slowed or (with larger perturbations) caused to operate at a loss. Prosumers also face higher energy costs, use their batteries less, and suffer from higher peak demand when the pricing aggregator is adversarially poisoned. We address this vulnerability with a “defense” module; i.e., a ``robustification'' of RL algorithms against this attack. Our defense identifies the trajectories with the largest influence on the gradient and removes them from the training data.