Abstract: This work advances automatic scene analysis and ambient assisted living systems that support people requiring special care, such as the elderly or people with visual impairments. The study surveys the most effective techniques in Video Captioning and Object Detection and proposes a Deep Learning pipeline for Risk Assessment in home environments. The pipeline integrates SwinBERT for Video Captioning with YOLOv7 for Object Recognition. Its effectiveness and limitations are evaluated across several architectures on the Charades dataset, which is known for its natural and spontaneous depiction of household activities. The experiments show that integrating both models improves results on the Object Detection task by up to 7%, a task that is fundamental to correctly identifying potential risks. This comprehensive approach aims to develop more accurate, human-aligned systems for assisting vulnerable populations in their daily lives.