Keywords: Wearable sensors, multimodal dataset, multimodal recording, activities of daily living, kitchen activities, robot assistants, learning pipelines, motion tracking, body tracking, eye tracking, gaze, tactile sensing, muscle activity, EMG, cameras, microphones
TL;DR: A multimodal dataset and recording framework that uses wearable sensors and synchronized ground-truth data to record humans performing kitchen tasks, with the goal of enabling insights into manipulation and task planning and supporting more capable robot assistants.
Abstract: This paper introduces ActionNet, a multimodal dataset and recording framework with an emphasis on wearable sensing in a kitchen environment. It provides rich, synchronized data streams along with ground truth data to facilitate learning pipelines that could extract insights about how humans interact with the physical world during activities of daily living, and help lead to more capable and collaborative robot assistants. The wearable sensing suite captures motion, force, and attention information; it includes eye tracking with a first-person camera, forearm muscle activity sensors, a body-tracking system using 17 inertial sensors, finger-tracking gloves, and custom tactile sensors on the hands that use a matrix of conductive threads. This is coupled with activity labels and with externally captured data from multiple RGB cameras, a depth camera, and microphones. The specific tasks recorded in ActionNet are designed to highlight both lower-level physical skills and higher-level scene reasoning or action planning. They include simple object manipulations (e.g., stacking plates), dexterous actions (e.g., peeling or cutting vegetables), and complex action sequences (e.g., setting a table or loading a dishwasher). The resulting dataset and underlying experiment framework are available at https://action-net.csail.mit.edu. Preliminary networks and analyses explore modality subsets and cross-modal correlations. ActionNet aims to support applications including learning from demonstrations, dexterous robot control, cross-modal predictions, and fine-grained action segmentation. It could also help inform the next generation of smart textiles that may one day unobtrusively send rich data streams to in-home collaborative or autonomous robot assistants.
Supplementary Material: pdf
Open Credentialized Access: N/A
Dataset Url: https://action-net.csail.mit.edu
Dataset Embargo: N/A
License: Creative Commons; in particular, we plan to use a CC BY-NC-SA 4.0 license
Author Statement: Yes
Contribution Process Agreement: Yes
In Person Attendance: Yes