Keywords: Reinforcement Learning, Immersion Hot Water Heater, Energy Management Optimization
TL;DR: Deadline-aware PPO cuts immersion-heater energy 26–69% vs bang-bang/MCTS (3.23 kWh at 2 h) under identical physics.
Abstract: Typical domestic immersion water heater systems are left on continuously during the winter: they heat quickly rather than efficiently, ignoring predictable demand windows and ambient losses. We study deadline-aware control, where the aim is to reach a target temperature at a specified time while minimising energy. We introduce an efficient Gymnasium environment that models an immersion hot-water heater with first-order thermal losses and discrete on/off actions $\{0, 6000\}$ W applied every 120 s. Methods include a time-optimal bang-bang baseline, a zero-shot Monte Carlo Tree Search (MCTS) planner, and a Proximal Policy Optimization (PPO) policy. We report total energy (Wh) under identical physics. Across sweeps of initial temperature (10–30 °C), deadline (30–90 steps), and target temperature (40–80 °C), PPO is the most energy-efficient: at a 60-step horizon (2 h) it uses 3.23 kWh, versus bang-bang's 4.37–10.45 kWh and MCTS's 4.18–6.46 kWh, yielding savings of 26% at 30 steps and 69% at 90 steps. In a representative trajectory (50 kg, 20 °C ambient, 60 °C target), PPO consumes 54% less energy than bang-bang and 33% less than MCTS. These results show that learned, deadline-aware control reduces energy under identical physics: planners provide partial savings without training, while trained policies offer near-zero-cost inference.
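The environment described in the abstract can be sketched as follows. This is a minimal illustration of the stated physics (first-order thermal losses, 50 kg of water, discrete $\{0, 6000\}$ W actions every 120 s) using a Gymnasium-style `reset()`/`step()` interface but without the dependency; the loss coefficient `k_loss` and the reward shaping are illustrative assumptions, not the authors' exact values.

```python
SPECIFIC_HEAT = 4186.0   # J/(kg*K), water
DT = 120.0               # control interval in seconds (from the abstract)
POWERS = (0.0, 6000.0)   # discrete on/off actions in watts (from the abstract)

class HeaterEnv:
    """Immersion-heater model with first-order ambient losses (sketch)."""

    def __init__(self, mass_kg=50.0, ambient_c=20.0, target_c=60.0,
                 deadline_steps=60, k_loss=5.0):
        self.mass = mass_kg
        self.ambient = ambient_c
        self.target = target_c
        self.deadline = deadline_steps
        self.k_loss = k_loss  # W/K first-order loss coefficient (assumed value)

    def reset(self, start_c=20.0):
        self.temp = start_c
        self.t = 0
        self.energy_wh = 0.0
        return (self.temp, self.deadline - self.t)

    def step(self, action):
        power = POWERS[action]
        # First-order thermal model: heater input minus ambient losses.
        net_w = power - self.k_loss * (self.temp - self.ambient)
        self.temp += net_w * DT / (self.mass * SPECIFIC_HEAT)
        self.energy_wh += power * DT / 3600.0
        self.t += 1
        done = self.t >= self.deadline
        # Illustrative reward: penalise energy use each step; at the
        # deadline, penalise any shortfall from the target temperature.
        reward = -power * DT / 3600.0
        if done and self.temp < self.target:
            reward -= 100.0 * (self.target - self.temp)
        return (self.temp, self.deadline - self.t), reward, done, {}
```

Under this interface, the bang-bang baseline is simply the policy that selects action 1 whenever the temperature is below the target, while PPO or MCTS can choose when to defer heating so that the target is met exactly at the deadline with fewer losses.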
Submission Number: 31