Keywords: Physical Safety, Instructional AI Assistant, LLM
TL;DR: A new physical safety task for LLM chat assistants, a new dataset, and strong alignment results.
Abstract: While large language models (LLMs) excel in language generation and conversational abilities, their broader utility hinges on meeting additional requirements to ensure reliability and safety. Recent research has explored areas such as minimizing hallucinations, grounding outputs in credible sources, and safeguarding user privacy. However, the critical aspect of physical safety has received limited attention, an oversight that becomes increasingly consequential as LLMs are integrated into multimodal voice assistants (e.g., smart glasses) capable of guiding users through complex, safety-critical tasks such as automotive repair. In this work, we investigate the limitations of current LLMs in generating effective and contextually appropriate safety warnings during complex repair tasks. We introduce SafetyChat, a multi-domain dataset designed to evaluate LLMs' ability to model and prioritize safety awareness. We further improve model alignment by post-training on this data, comparing the performance of several techniques. Through this process, we identify key challenges and establish robust baselines, paving the way for future research on integrating physical safety considerations into LLM-driven instructional systems. We will release data and code to reproduce our results upon publication.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 18539