Early Stopping for Two-Stream Fusion Applied to Action Recognition

2020 (modified: 05 Nov 2022), VISIGRAPP (Revised Selected Papers) 2020
Abstract: Various information streams, such as scene appearance and the estimated movement of the objects involved, can help characterize actions in videos. Each modality performs better in different scenarios, and complementary features can be combined to achieve results superior to those of the individual streams. As important as defining representative and complementary feature streams is choosing combination strategies that exploit the strengths of each one. In this work, we analyze different fusion approaches for combining complementary modalities. To define the best parameters of our fusion methods using the training set, we have to reduce overfitting in the individual modalities; otherwise, their 100%-accurate outputs on the training data would not offer a realistic and relevant representation for the fusion method. Thus, we analyze an early stopping technique for training the individual networks. In addition to reducing overfitting, this method also reduces the training cost, since it usually requires fewer epochs to complete training. Experiments are conducted on the UCF101 and HMDB51 datasets, two challenging benchmarks for action recognition.
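The two ideas in the abstract, stopping each stream's training early and then fusing the streams' outputs, can be illustrated with a minimal sketch. The abstract does not specify the stopping criterion or the fusion methods, so everything here is an assumption: a generic patience-based rule on validation loss, and a weighted average of per-class scores as one possible late-fusion strategy. The names `run_epoch`, `patience`, `min_delta`, `fuse_scores`, and `weight` are hypothetical and chosen for illustration only.

```python
from typing import Callable, List, Sequence, Tuple


def train_with_early_stopping(
    run_epoch: Callable[[int], float],
    max_epochs: int = 100,
    patience: int = 5,
    min_delta: float = 0.0,
) -> Tuple[int, float]:
    """Train until the validation loss stops improving.

    run_epoch is a hypothetical callback that trains one epoch and
    returns the validation loss for that epoch. Generic patience rule,
    not necessarily the criterion used in the paper.
    Returns (best_epoch, best_loss).
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        val_loss = run_epoch(epoch)
        if val_loss < best_loss - min_delta:
            # Improvement: remember it and reset the patience counter.
            best_loss, best_epoch = val_loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop before the network fits the training set perfectly
    return best_epoch, best_loss


def fuse_scores(
    spatial: Sequence[float], temporal: Sequence[float], weight: float = 0.5
) -> List[float]:
    """Weighted average of per-class scores from two streams.

    weight is the (assumed) fusion parameter that would be tuned on
    training-set outputs; 0.5 weighs both streams equally.
    """
    if len(spatial) != len(temporal):
        raise ValueError("both streams must score the same classes")
    return [weight * s + (1.0 - weight) * t for s, t in zip(spatial, temporal)]


def predict(scores: Sequence[float]) -> int:
    """Index of the highest-scoring class."""
    return max(range(len(scores)), key=lambda i: scores[i])
```

The sketch also shows why the two steps are linked: if each stream were trained until it classified every training sample correctly, both score vectors would agree on the training set and a tuned `weight` would carry little information about how the streams behave on unseen videos.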