Abstract: With the aging population, health and senior care are becoming crucial issues worldwide. Because the number of healthcare professionals falls far short of the growing needs of patients and seniors, services from non-professional healthcare staff, such as home caregivers, are indispensable. Methods that locate video moments from natural language queries can help non-professional healthcare staff perform operations in a more standardized way and reduce the time they spend retrieving specific action moments. To address this problem, we propose a cross-modal neural network model for effective health and senior care video localization. Our model learns the procedures in a reference video and uses this procedure knowledge to improve its localization performance. We conduct experiments on our health and senior care video localization dataset and on a publicly accessible dataset of medical instruction videos. Experimental results show that procedure knowledge can markedly improve the model's video moment localization. We hope our dataset and method will promote the development of cross-modal research and applications for health and senior care.