Abstract: Highlights•We develop a novel spatio-temporal CNN architecture for feature learning from video frames.•CRF is coupled with CNN to achieve feature learning and structured prediction simultaneously.•We explore different combinations of feature functions for sequence labeling.•We validate our framework on segmented and unsegmented action datasets, respectively.
Loading