ActionSense: A Multimodal Dataset and Recording Framework for Human Activities Using Wearable Sensors in a Kitchen Environment

Joseph DelPreto; Chao Liu; Yiyue Luo; Michael Foshey; Yunzhu Li; Antonio Torralba; Wojciech Matusik; Daniela Rus

Published: 17 Sept 2022, Last Modified: 23 May 2023. NeurIPS 2022 Datasets and Benchmarks. Readers: Everyone

Keywords: Wearable sensors, multimodal dataset, multimodal recording, activities of daily living, kitchen activities, robot assistants, machine learning, neural networks, learning pipelines, human subjects, experimental design, open-source, recording software, motion tracking, body tracking, joint angles, eye tracking, gaze, attention, tactile sensing, muscle activity, EMG, video, depth, RGBD, cameras, audio, microphones

Abstract: This paper introduces ActionSense, a multimodal dataset and recording framework with an emphasis on wearable sensing in a kitchen environment. It provides rich, synchronized data streams along with ground truth data to facilitate learning pipelines that could extract insights about how humans interact with the physical world during activities of daily living, and help lead to more capable and collaborative robot assistants. The wearable sensing suite captures motion, force, and attention information; it includes eye tracking with a first-person camera, forearm muscle activity sensors, a body-tracking system using 17 inertial sensors, finger-tracking gloves, and custom tactile sensors on the hands that use a matrix of conductive threads. This is coupled with activity labels and with externally-captured data from multiple RGB cameras, a depth camera, and microphones. The specific tasks recorded in ActionSense are designed to highlight lower-level physical skills and higher-level scene reasoning or action planning. They include simple object manipulations (e.g., stacking plates), dexterous actions (e.g., peeling or cutting vegetables), and complex action sequences (e.g., setting a table or loading a dishwasher). The resulting dataset and underlying experiment framework are available at https://action-sense.csail.mit.edu. Preliminary networks and analyses explore modality subsets and cross-modal correlations. ActionSense aims to support applications including learning from demonstrations, dexterous robot control, cross-modal predictions, and fine-grained action segmentation. It could also help inform the next generation of smart textiles that may one day unobtrusively send rich data streams to in-home collaborative or autonomous robot assistants.

Author Statement: Yes

TL;DR: A multimodal dataset and recording framework that use wearable sensors and synchronized ground-truth data to record humans performing kitchen tasks, with the goal of yielding insights into manipulation and task planning and enabling more capable robot assistants.

URL: https://action-sense.csail.mit.edu

Open Credentialized Access: N/A

Dataset Url: https://action-sense.csail.mit.edu

Dataset Embargo: N/A

License: Creative Commons CC BY-NC-SA 4.0. Code is open-source under an MIT License.

Supplementary Material: pdf

Contribution Process Agreement: Yes

In Person Attendance: Yes
