Abstract: Highlights
• We propose an end-to-end approach with a novel temporal-spatial pooling block (named STP) for action classification, which learns to pool discriminative frames and pixels within a clip. Our method achieves better performance than other state-of-the-art methods.
• We propose an STP loss function that aims to learn a sparse importance score along the temporal dimension, discarding redundant or invalid frames.
• We present a ferryboat video database (named Ferryboat-4) for ferry action recognition. The database includes four action categories: Inshore, Offshore, Traffic, and Negative. We evaluate the proposed STP and other state-of-the-art models on this database.