Optimizing Large Language Model-Assisted Smart Home Assistant Systems at the Edge: An Empirical Study

23 Nov 2024 (modified: 30 Dec 2024) · AAAI 2025 Workshop AI4WCN Submission · CC BY 4.0
Keywords: Smart Home Automation, Home Assistant, Large Language Model, Fine-Tuning
TL;DR: In this paper, we present a pilot study evaluating the real-time performance of LLMs deployed at the edge on practical, resource-constrained devices.
Abstract: The last decade has witnessed the widespread adoption of AI-assisted smart home applications at the network edge, supported by improvements in edge hardware acceleration and AI computing algorithms. In particular, the surge of Large Language Models (LLMs) in 2022 has pushed smart home applications toward more complicated and varied tasks, such as chatbots, video surveillance, signal sensing, and voice control. However, running LLM services in resource-constrained edge environments with limited computation power raises new challenges in response accuracy, latency, and power consumption. To this end, we develop a testbed to evaluate the efficacy and latency of real-time responses and actions produced directly by on-device models in smart home environments. On this testbed, we deploy lightweight, fine-tuned LLMs optimized for seamless integration with Home Assistant, a popular open-source smart home automation platform, on resource-constrained edge devices such as Raspberry Pis. In addition, we optimize the search over devices configured in the system, which further shortens response delay. In our evaluation, we measure the real-time on-device performance of four models: a pre-trained baseline, Home-1B, and three customized, fine-tuned models, TinyHome, TinyHome-Qwen, and StableHome, trained on a medium-sized synthetic dataset tailored to smart home environments. Evaluation results show that our optimized models maintain high accuracy in understanding and executing user commands. More importantly, our optimizations reduce the average response time across the four models by around 82%, from 45.1 seconds to 7.9 seconds. Our demo video is available at: https://youtu.be/zukPKLNWR54.
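To make the described pipeline concrete, the sketch below illustrates one plausible way an on-device model could map a natural-language command to a Home Assistant service call: a quantized model served via llama-cpp-python emits a constrained JSON action, which is then executed through Home Assistant's REST API. This is not the authors' implementation; the model path, host, token, and prompt format are all illustrative assumptions.

```python
import json
import requests
from llama_cpp import Llama  # assumes llama-cpp-python and a GGUF model build

HA_URL = "http://homeassistant.local:8123"  # hypothetical Home Assistant host
HA_TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"   # placeholder credential

# Load a quantized on-device model (path and context size are illustrative).
llm = Llama(model_path="tinyhome-q4.gguf", n_ctx=2048)

def command_to_service_call(user_command: str) -> dict:
    """Ask the local model to translate a natural-language command
    into a Home Assistant service call expressed as JSON."""
    prompt = (
        "Translate the command into JSON with keys 'domain', 'service', "
        f"and 'entity_id'.\nCommand: {user_command}\nJSON:"
    )
    out = llm(prompt, max_tokens=64, stop=["\n"])
    # A production system would validate the model output before parsing.
    return json.loads(out["choices"][0]["text"])

def execute(call: dict) -> None:
    """Invoke Home Assistant's REST API for the parsed service call."""
    requests.post(
        f"{HA_URL}/api/services/{call['domain']}/{call['service']}",
        headers={"Authorization": f"Bearer {HA_TOKEN}"},
        json={"entity_id": call["entity_id"]},
        timeout=10,
    )

execute(command_to_service_call("Turn on the living room light"))
```

Constraining the model to a small JSON schema keeps parsing deterministic and bounds generation length, which matters for latency on devices like a Raspberry Pi.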
Submission Number: 6