DomusMind: A Benchmark for Evaluating Lifelong Smart Home Agents Under Drift

Published: 02 Mar 2026, Last Modified: 10 Apr 2026, LLA 2026 Poster, CC BY 4.0
Keywords: Theory of Mind, Lifelong agent, Benchmark, Smart Home, Long-horizon evaluation, Continual alignment, Online adaptation
TL;DR: DomusMind benchmarks smart-home agents under changing user preferences and unreliable devices. Agents do best when they keep updating their user model and ask for confirmation when uncertain.
Abstract: Smart-home agents must operate continuously in non-stationary environments where human preferences and device reliability keep evolving. However, dominant evaluation protocols remain episodic and reset-based, so they fail to capture the degradation and recovery dynamics essential to long-term deployment. To address this gap, we introduce DomusMind, a benchmark for evaluating lifelong agents under two sources of non-stationarity: preference drift and tool drift. DomusMind instantiates a persistent smart-home control loop in which agents must balance autonomous execution against user burden. Tracking time-resolved metrics across preference, tool, and mixed drift scenarios, we find that online Theory of Mind (ToM) with uncertainty-gated confirmation provides the most robust adaptation. Notably, ORACLE persona access alone does not eliminate failures under tool drift, which identifies execution reliability as a distinct bottleneck. By sweeping a confirmation threshold, DomusMind characterizes a success–annoyance frontier that enables principled selection of operating points for long-horizon alignment.
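To make the idea of uncertainty-gated confirmation and a threshold sweep concrete, here is a minimal toy sketch. It is not DomusMind's implementation; every name (`Step`, `run_episode`, `sweep`, `pareto_frontier`, `STEPS`) and the toy data are hypothetical. It assumes the agent asks the user whenever its uncertainty exceeds a threshold, that a confirmation always yields the right action, and that annoyance is simply the fraction of steps with a confirmation. It ignores tool failures, which the abstract identifies as a separate bottleneck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    # Hypothetical per-step record: agent's uncertainty about the user's current preference.
    uncertainty: float
    # Whether the agent's autonomous action would have satisfied the user.
    correct: bool


def run_episode(steps, threshold):
    """Return (success_rate, annoyance_rate) for one confirmation threshold.

    Assumption: asking the user always yields the right action, at the cost of
    one interruption; tool failures are not modeled.
    """
    successes = 0
    confirmations = 0
    for step in steps:
        if step.uncertainty > threshold:
            # Uncertain: ask for confirmation (counts as annoyance).
            confirmations += 1
            successes += 1
        elif step.correct:
            # Confident and right: act autonomously without bothering the user.
            successes += 1
    n = len(steps)
    return successes / n, confirmations / n


def sweep(steps, thresholds):
    """Evaluate every threshold, returning (threshold, success, annoyance) triples."""
    return [(t, *run_episode(steps, t)) for t in thresholds]


def pareto_frontier(points):
    """Keep points that no other point beats on success without more annoyance."""
    frontier = []
    for t, s, a in points:
        dominated = any(
            s2 >= s and a2 <= a and (s2 > s or a2 < a) for _, s2, a2 in points
        )
        if not dominated:
            frontier.append((t, s, a))
    return frontier


# Hypothetical toy trajectory, not data from the benchmark.
STEPS = [
    Step(0.1, True), Step(0.2, True), Step(0.4, False),
    Step(0.5, True), Step(0.7, False), Step(0.9, False),
]

if __name__ == "__main__":
    points = sweep(STEPS, [0.0, 0.3, 0.6, 1.0])
    for t, s, a in pareto_frontier(points):
        print(f"threshold={t:.1f} success={s:.2f} annoyance={a:.2f}")
```

In this toy run, a threshold of 0.0 asks about every step and is dominated by 0.3, which reaches the same success with fewer interruptions. The remaining thresholds trade success for lower annoyance, which is the kind of frontier from which an operating point would be chosen.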
Submission Number: 225