Focusing on What Matters: Fine-grained Medical Activity Recognition for Trauma Resuscitation via Actor Tracking

Published: 01 Jan 2024, Last Modified: 06 Mar 2025CVPR Workshops 2024EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Trauma is a leading cause of mortality worldwide, with about 20% of these deaths being preventable. Most of these preventable deaths result from errors during the initial resuscitation of injured patients. Decision support has been evaluated as an approach to support teams during this phase to reduce errors. Existing systems require manual data entry and monitoring, which makes tasks challenging to accomplish in a time-critical setting. This paper identified the specific challenges of achieving effective decision support in trauma resuscitation based on computer vision techniques, including complex backgrounds, crowded scenes, fine-grained activities, and a scarcity of labeled data. To address the first three challenges, the proposed system involved an actor tracker that identifies individuals, allowing the system to focus on actor-specific features. Video Masked Autoencoder (Video-MAE) was used to overcome the issue of insufficient labeled data. This approach enables self-supervised learning using unlabeled video content, improving feature representation for medical activities. For more reliable performance, an ensemble fusion method was introduced. This technique combines predictions from consecutive video clips and different actors. Our method outperformed existing approaches in identifying fine-grained activities, providing a solution for activity recognition in trauma resuscitation and similar complex domains.
Loading