OctoNet: A Large-Scale Multi-Modal Dataset for Human Activity Understanding Grounded in Motion-Captured 3D Pose Labels
Keywords: multi-modal dataset, human activity understanding, human pose estimation, non-intrusive, wireless sensing
TL;DR: A multimodal human activity and pose dataset covering diverse sensing modalities, including acoustic, RF-based, vision-based, inertial, and physiological sensors
Abstract: We introduce OctoNet, a large-scale, multi-modal, multi-view human activity dataset designed to advance human activity understanding and multi-modal learning. OctoNet comprises 12 heterogeneous modalities (including RGB, depth, and thermal cameras, infrared arrays, audio, millimeter-wave radar, Wi-Fi, IMU, and more) recorded from 41 participants under multi-view sensor setups, yielding over 67.72M synchronized frames. The data encompass 62 daily activities spanning structured routines, freestyle behaviors, human-environment interactions, healthcare tasks, and more. Critically, all modalities are annotated with high-fidelity 3D pose labels captured via a professional motion-capture system, allowing precise alignment and rich supervision across sensors and views. OctoNet is one of the most comprehensive datasets of its kind, enabling a wide range of learning tasks such as human activity recognition, 3D pose estimation, multi-modal fusion, cross-modal supervision, and sensor foundation models. Extensive experiments with various baselines demonstrate the sensing capabilities of the individual modalities. OctoNet offers a unique and unified testbed for developing and benchmarking generalizable, robust models for human-centric perceptual AI.
Croissant File: json
Dataset URL: https://huggingface.co/datasets/hku-aiot/OctoNet
Code URL: https://github.com/aiot-lab/OctoNet/tree/main
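A minimal sketch of how one might fetch the dataset files from the Hugging Face Hub linked above, using the standard `huggingface_hub` client. This is only an illustrative access example, not the authors' official loading pipeline; the repository ID is taken from the Dataset URL, and the actual directory layout and recommended loaders should be checked in the Code URL above.

```python
# Illustrative sketch: download the OctoNet dataset repository from the Hugging Face Hub.
# Assumes the repo at the Dataset URL is publicly accessible; the on-disk layout
# (per-modality folders, annotation files) is not specified here and should be
# verified against the official documentation in the Code URL.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="hku-aiot/OctoNet",   # repository ID from the Dataset URL
    repo_type="dataset",          # it is hosted as a dataset repo
)
print("Dataset files downloaded to:", local_dir)
```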
Supplementary Material: pdf
Primary Area: Other
Submission Number: 894