Curate, Connect, Inquire: A System for Findable Accessible Interoperable and Reusable (FAIR) Human-Robot Centered Datasets

Published: 07 May 2025, Last Modified: 07 May 2025
Venue: ICRA Workshop Human-Centered Robot Learning
License: CC BY 4.0
Workshop Statement: Our work presents a practical system that enhances the human-centered learning pipeline in robotics by addressing a foundational yet often overlooked aspect: data curation and accessibility. The system integrates standardized metadata templates, a robotics-specific data model, a semantic knowledge graph, and a large language model (LLM)-powered conversational interface to make datasets Findable, Accessible, Interoperable, and Reusable (FAIR). These tools facilitate the responsible publication, discovery, and reuse of human-robot interaction (HRI) datasets, which are often multimodal, large-scale, and derived from ethically complex experimental settings involving human subjects. By enabling users to query and compare datasets through natural language (for example, “What robot model was used in the Vid2Real real-world study?” or “Which datasets involve autonomous navigation?”), our system introduces a human-centered approach to accessing machine-centered data. In doing so, we contribute to embodied AI and robot learning by reducing friction in dataset discovery, enhancing reproducibility, and supporting interdisciplinary collaboration. Our efforts align closely with the goals of HCRL, especially in promoting transparency, data ethics, and research equity in robot learning systems. Including human subject protocols, behavioral metrics, and sensor details makes both social context and technical performance available for learning tasks, supporting a more holistic and socially aware robot learning ecosystem.
Keywords: Human-Centered Robot Learning, Dataset Curation, Robotics Metadata Standards, Human-Robot Interaction
TL;DR: We present a system that makes human-robot interaction datasets FAIR and accessible through standardized curation, a knowledge graph, and a natural language chatbot.
Abstract: The rapid growth of AI in robotics has amplified the need for high-quality, reusable datasets, particularly in human-robot interaction (HRI) and AI-embedded robotics. Although more robotics datasets are being created, the landscape of open data in the field is uneven. This unevenness stems from a lack of curation standards and inconsistent publication practices, which make existing robotics data difficult to find, understand, access, and reuse. To address these challenges, we introduce a curation and access system developed through our experience curating and publishing datasets with researchers from Texas Robotics. The system integrates a data reporting template, a domain-specific knowledge graph, and a ChatGPT-powered conversational interface that enables users to explore, compare, and access robotics datasets published in an institutional data repository. Our evaluation showed that the system provides consistent and correct information about the data, along with access to it, underscoring the importance of curation for enhancing the FAIRness (Findability, Accessibility, Interoperability, and Reusability) of human-centered robotics datasets. Importantly, the best practices developed in this work can inform the community on how to curate and publish robotics datasets. This work directly aligns with the goals of the HCRL @ ICRA 2025 workshop and represents a step towards more human-centered access to data for embodied AI.
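Illustrative sketch (not the authors' implementation): the idea of answering dataset questions from a knowledge graph can be shown with curated metadata stored as subject-predicate-object triples and queried by pattern. Every dataset identifier, robot name, predicate, and function name below is a hypothetical placeholder chosen for illustration; none of it comes from the paper's data model or repository.

```python
from typing import Optional

# A triple is (subject, predicate, object).
# All values are made-up placeholders, not real dataset metadata.
Triple = tuple[str, str, str]

TRIPLES: set[Triple] = {
    ("dataset:example-nav", "usesRobot", "robot:example-quadruped"),
    ("dataset:example-nav", "hasTask", "autonomous navigation"),
    ("dataset:example-nav", "hasHumanSubjects", "yes"),
    ("dataset:example-handover", "usesRobot", "robot:example-arm"),
    ("dataset:example-handover", "hasTask", "object handover"),
}


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def datasets_with_task(task: str) -> list[str]:
    """Answer questions of the form 'Which datasets involve <task>?'."""
    return [subject for subject, _, _ in match(p="hasTask", o=task)]


def robot_for(dataset: str) -> Optional[str]:
    """Answer questions of the form 'What robot was used in <dataset>?'."""
    hits = match(s=dataset, p="usesRobot")
    return hits[0][2] if hits else None


if __name__ == "__main__":
    print(datasets_with_task("autonomous navigation"))
    print(robot_for("dataset:example-handover"))
```

In a system like the one described, a conversational interface would presumably translate a natural-language question into a structured lookup of this kind, so that answers stay grounded in the curated metadata rather than in the language model alone.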
Submission Number: 18