Embodied Safety Alignment: Combining RLAF and Rule-Based Memory for Trustworthy Robots

06 Nov 2025 (modified: 27 Nov 2025) · Submitted to E-SARS · CC BY 4.0
Keywords: Vision–Language Models (VLMs); Reinforcement Learning from Action Feedback (RLAF); Contrastive Learning; Human–Robot Trust; Safety Alignment; Multimodal Integration; Prompt-Based Reward Modeling; Rule-Based Governance; Safety Memory; Explainable Robotics
TL;DR: This paper introduces a unified framework that aligns vision, language, and action through reflective reinforcement learning and rule-based safety memory to ensure trustworthy, safe, and interpretable human–robot interaction.
Abstract: Ensuring both human trust and robotic safety remains a central challenge in deploying embodied AI systems in real-world environments. We present a unified framework that integrates vision–language–action alignment, reinforcement learning from action feedback (RLAF), and rule-based safety memory for interpretable, self-improving human–robot interaction. Our approach first learns visual trust and safety representations from synthetic, prompt-generated data using contrastive learning with counterfactual captions, enabling reasoning about confidence, hesitation, and risk purely from visual input. A language-driven alignment module then employs large vision–language models (VLMs) to generate explanations, evaluate decisions through reflective prompting, and provide continuous feedback rewards for RLAF optimization. To guarantee safe operation, a safety critic and a shield mechanism constrain actions to verified physical limits, while a persistent safety memory maintains hierarchical rules, adaptive sub-roles, and an updatable safety manual, ensuring accountability and continual adaptation. Together, these components establish a safety-assured, multimodal architecture that perceives, reasons about, and communicates trust, offering a path toward transparent and reliable embodied intelligence.
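
As a rough illustration of the shield and safety-memory idea described in the abstract, the minimal Python sketch below clips a proposed command to a stored velocity limit and vetoes motion when a person enters a keep-out radius, returning a log entry that a persistent safety memory could record. All names (`SAFETY_RULES`, `shield_action`), the rule format, and the numeric limits are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Hypothetical rule format for the persistent safety memory: each rule maps a
# named condition to a hard constraint on the commanded action (values are
# illustrative, not from the paper).
SAFETY_RULES = {
    "max_joint_velocity": 1.0,   # rad/s, verified physical limit
    "min_human_distance": 0.5,   # metres, keep-out radius around people
}

def shield_action(action, human_distance, rules=SAFETY_RULES):
    """Project a proposed action back into the verified-safe set.

    `action` is a joint-velocity command; the shield clips it to the stored
    limit and vetoes motion entirely when a person is inside the keep-out
    radius. Returns the safe action and a log entry for the safety memory.
    """
    safe = np.clip(action, -rules["max_joint_velocity"], rules["max_joint_velocity"])
    if human_distance < rules["min_human_distance"]:
        safe = np.zeros_like(safe)            # hard veto: stop the robot
        event = "veto: human inside keep-out radius"
    elif not np.allclose(safe, action):
        event = "clipped: joint-velocity limit exceeded"
    else:
        event = "pass"
    return safe, event

# Example: a policy proposes an over-speed command while a person is nearby.
proposed = np.array([1.6, -0.2, 0.8])
safe_cmd, log_entry = shield_action(proposed, human_distance=0.4)
print(safe_cmd, log_entry)   # -> [0. 0. 0.] veto: human inside keep-out radius
```

In the paper's framework the logged events would feed the rule-based safety memory and the RLAF feedback loop; this sketch only shows the constraint-and-log step in isolation.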
Submission Number: 5