Keywords: Embodied AI, FAIR data, Healthcare robotics, Interoperability, Provenance, Safety assurance, Trustworthy AI, Voice biomarkers
TL;DR: We evaluate the FAIRness of voice biomarker datasets across health domains and show how data transparency and governance form a foundation for safety assurance in embodied health AI systems.
Abstract: Voice-based signals such as speech, breathing, and coughing form a primary perceptual channel for embodied health AI systems, from assistive robots and diagnostic companions to multimodal therapeutic agents. The safety and trustworthiness of such systems depend not only on model behavior but also on the quality, traceability, and interoperability of the sensory data that drive their perception. Yet current voice biomarker resources exhibit uneven data governance and limited adherence to the FAIR principles (Findability, Accessibility, Interoperability, and Reusability), creating hidden risks for safety assurance and regulatory compliance. In this work, we perform a comprehensive FAIRness evaluation of publicly available voice biomarker datasets spanning five major disease domains. Using a priority-weighted rubric aligned with the FAIR Data Maturity Model, we benchmark expert manual assessments against three automated tools (F-UJI, FAIR Evaluator, and FAIR-Checker) and quantify agreement and reliability through human–tool comparison and test–retest analysis. Results reveal strong Findability and Accessibility but persistent weaknesses in Interoperability and Reusability, notably in controlled vocabularies, qualified references, licensing, and provenance, all of which are essential for traceable and verifiable embodied perception pipelines. Automated tools often underestimated compliance because of their rigid, domain-agnostic logic, underscoring the need for contextual evaluation. We introduce a data-level safety-assurance perspective that positions FAIRness as a foundation for verifiable, policy-aligned embodied AI. By providing actionable recommendations (domain-specific metadata standards, machine-readable licensing, and FAIR-supportive repositories), we outline a practical pathway toward trustworthy, reproducible, and safety-assured embodied health AI systems.
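The abstract names a priority-weighted rubric and human–tool agreement analysis but does not give the weights or the agreement statistic. The following is a minimal sketch of how such scoring could work, under stated assumptions. It weights each indicator by its RDA FAIR Data Maturity Model priority level (essential, important, useful) using hypothetical weights of 3, 2, and 1. It then measures agreement between two sets of pass/fail judgments with Cohen's kappa, which is a swapped-in choice, not necessarily the statistic the authors used. The tuple layout and function names are illustrative only.

```python
from collections import Counter

# Hypothetical weights per FAIR Data Maturity Model priority level;
# the paper's actual rubric weights are not stated in the abstract.
PRIORITY_WEIGHTS = {"essential": 3.0, "important": 2.0, "useful": 1.0}


def weighted_fair_score(indicators):
    """Return a 0-1 priority-weighted score for each FAIR principle.

    `indicators` is a list of (principle, priority, passed) tuples,
    e.g. ("I", "essential", False). This layout is an illustrative assumption.
    """
    earned, possible = Counter(), Counter()
    for principle, priority, passed in indicators:
        weight = PRIORITY_WEIGHTS[priority]
        possible[principle] += weight
        if passed:
            earned[principle] += weight
    return {p: earned[p] / possible[p] for p in possible}


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' binary pass/fail judgments on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must judge the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_a = sum(rater_a) / n  # share of items rater A marked as passing
    p_b = sum(rater_b) / n  # share of items rater B marked as passing
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if expected == 1:
        return 1.0  # both raters gave one identical constant label; treat as perfect agreement
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    indicators = [
        ("F", "essential", True),
        ("F", "important", True),
        ("I", "essential", False),
        ("I", "useful", True),
        ("R", "essential", False),
        ("R", "important", True),
    ]
    print(weighted_fair_score(indicators))  # {'F': 1.0, 'I': 0.25, 'R': 0.4}

    human = [True, True, False, True, False, True]
    tool = [True, False, False, False, False, True]
    print(round(cohens_kappa(human, tool), 2))  # 0.4
```

The same `cohens_kappa` call could also serve as a test–retest check by comparing two runs of one tool on the same datasets. In the example, the tool misses two indicators that the human expert passed, which mirrors the underestimation pattern the abstract reports.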
Submission Number: 17