Neuro‑Symbolic Data Collection Automata for Training Language Models on Edge Devices

Published: 29 Aug 2025, Last Modified: 29 Aug 2025, NeSy 2025 Phase 2 Poster, CC BY 4.0
Keywords: language models, edge computing, low-resource computing, neuro-symbolic artificial intelligence
TL;DR: Untrained neural language models, scaled down to run on microprocessors, use data collected by a finite-state machine to train a voice-controlled smart lamp from verbal commands and button pushes.
Abstract: Language models (LMs) have achieved significant success in centralized settings, but their utility in localized, real-time applications on edge devices remains constrained. These environments—where direct interaction between users and devices occurs—lack the vast training resources available to general-purpose cloud-based models. The typical development pipeline for LMs involves (1) large-scale unsupervised pretraining to develop generalist behaviors before (2) supervised fine-tuning on small, task-specific datasets. The second step remains a bottleneck for edge deployment, as it requires labeled data, which is rarely available or easily collected in situ. We address this challenge by introducing a neuro-symbolic framework for data collection and learning on edge devices. At the core of our approach is a finite-state machine (FSM), called a Data Collection Automaton (DCA), that supervises an LM through interaction with the environment. This FSM enables automatic labeling of user inputs by tracking conversational and physical interactions, transforming them into usable training data. Our implementation focuses on a voice-controlled smart lamp that learns from its user without external data—only through spoken commands and switch toggles.
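The abstract describes an FSM that turns conversational and physical interactions into labeled training data. A minimal sketch of one way such a DCA could work, under the assumption that a switch toggle shortly after an utterance reveals the user's intent (all class, state, and label names here are hypothetical, not taken from the paper):

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    AWAIT_FEEDBACK = auto()

class DataCollectionAutomaton:
    """Hypothetical DCA sketch: an FSM that labels spoken commands
    using the user's subsequent switch toggle as supervision."""

    def __init__(self):
        self.state = State.IDLE
        self.pending_utterance = None
        self.dataset = []  # (utterance, label) pairs for fine-tuning

    def on_utterance(self, text):
        # A spoken command arrives; its meaning is not yet verified,
        # so hold it until a physical interaction confirms the intent.
        self.pending_utterance = text
        self.state = State.AWAIT_FEEDBACK

    def on_switch(self, lamp_on):
        # A switch toggle right after an utterance reveals the user's
        # intent, yielding a label for the pending utterance.
        if self.state is State.AWAIT_FEEDBACK:
            label = "turn_on" if lamp_on else "turn_off"
            self.dataset.append((self.pending_utterance, label))
            self.pending_utterance = None
            self.state = State.IDLE

# Example interaction: the user speaks, then toggles the lamp on,
# producing one automatically labeled training pair.
dca = DataCollectionAutomaton()
dca.on_utterance("lights please")
dca.on_switch(True)
```

This is only an illustration of the automatic-labeling idea; the paper's actual DCA states and transitions are not specified in the abstract.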
Track: Main Track
Paper Type: Industry Abstract
Resubmission: No
Changes List: Additional details for each concept introduced in the overview of the methods are provided to make the submission self-contained.
Publication Agreement: pdf
Submission Number: 92