Keywords: In-context reinforcement learning, Value uncertainty estimation, Thompson sampling, HVAC control, Zero-shot transfer, Building automation, Energy efficiency, Ensemble learning, Transformers, Reinforcement Learning
TL;DR: SPICE enables zero-shot HVAC control by combining value uncertainty and Thompson sampling with in-context learning, eliminating per-building training while maintaining energy efficiency and comfort.
Abstract: Urban buildings consume 40\% of global energy, yet most rely on inefficient rule-based HVAC systems due to the impracticality of deploying advanced controllers across diverse building stock. In-context reinforcement learning (ICRL) offers promise for rapid deployment without per-building training, but standard supervised learning objectives that maximise likelihood of training actions inherit behaviour-policy bias and provide weak exploration under the distribution shifts common when transferring across buildings and climates. We present SPICE (Sampling Policies In-Context with Ensemble uncertainty), a novel ICRL method specifically designed for zero-shot building control that addresses these fundamental limitations. SPICE introduces two key methodological innovations: (i) a propensity-corrected, return-aware training objective that prioritises high-advantage, high-uncertainty actions to enable improvement beyond suboptimal training demonstrations, and (ii) lightweight value ensembles with randomised priors that provide explicit uncertainty estimates for principled episode-level Thompson sampling. At deployment, SPICE samples one value head per episode and acts greedily, resulting in temporally coherent exploration without test-time gradients or building-specific models. We establish a comprehensive experimental protocol using the HOT dataset to evaluate SPICE across diverse building types and climate zones, focusing on the energy efficiency, occupant comfort, and zero-shot transfer capabilities that are critical for urban-scale deployment.
Submission Number: 62
Loading