Keywords: Physical Safety, Instructional AI Assistant, LLM
TL;DR: A new physical safety task for LLM chat assistants, a new dataset, and strong alignment results.
Abstract: While large language models (LLMs) excel in language generation and conversational abilities, their broader utility hinges on meeting additional requirements to ensure reliability and safety. Recent research has explored areas such as minimizing hallucinations, grounding outputs in credible sources, and safeguarding user privacy. However, the critical aspect of physical safety has received limited attention, an oversight that becomes increasingly consequential as LLMs are integrated into multimodal voice assistants (e.g., smart glasses) capable of guiding users through complex, safety-critical tasks such as automotive repair. In this work, we investigate the limitations of current LLMs in generating effective and contextually appropriate safety warnings during complex repair tasks. We introduce SafetyChat, a multi-domain dataset designed to evaluate LLMs' ability to model and prioritize safety awareness. We further improve model alignment by post-training on this data, comparing the performance of several techniques. Through this process, we identify key challenges and establish robust baselines, paving the way for future research on integrating physical safety considerations into LLM-driven instructional systems. We will release data and code to reproduce our results upon publication.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 18539