Abstract: Highlights
• Proposed a video engagement model mirroring human observational techniques.
• Improved interpretability and application scope over traditional recognition methods.
• Developed a regional spatiotemporal modeling method for better engagement recognition.
• Introduced a bipartite matching based set prediction method for behavior capture.
• Achieved SOTA results on benchmark datasets with our new approach.
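
The highlights mention a set prediction method based on bipartite matching but do not describe its details. As a rough illustration of the general idea only, the sketch below pairs each ground-truth behavior with exactly one predicted behavior so that the total matching cost is minimal. Everything in it is an assumption made for illustration and not the authors' formulation: the behavior labels, the (label, start, end) representation, the cost function, and the exhaustive search, which stands in for the Hungarian algorithm normally used for larger sets.

```python
from itertools import permutations

# Hypothetical predictions: (label probabilities, start, end), times normalized to [0, 1].
preds = [
    ({"writing": 0.1, "raising_hand": 0.8}, 0.5, 0.9),
    ({"writing": 0.7, "raising_hand": 0.2}, 0.0, 0.4),
    ({"writing": 0.3, "raising_hand": 0.3}, 0.2, 0.6),
]
# Hypothetical ground-truth behaviors: (label, start, end).
targets = [("writing", 0.0, 0.5), ("raising_hand", 0.5, 1.0)]


def pair_cost(pred, target):
    """Assumed cost: classification term plus L1 distance between time spans."""
    probs, p_start, p_end = pred
    label, t_start, t_end = target
    class_cost = 1.0 - probs.get(label, 0.0)
    span_cost = abs(p_start - t_start) + abs(p_end - t_end)
    return class_cost + span_cost


def bipartite_match(preds, targets):
    """Return the minimum-cost one-to-one assignment as (pred_idx, target_idx) pairs.

    Brute force over all assignments; assumes len(preds) >= len(targets)
    and is only practical for small sets.
    """
    best_cost, best_assignment = float("inf"), None
    for pred_indices in permutations(range(len(preds)), len(targets)):
        total = sum(
            pair_cost(preds[p], targets[t]) for t, p in enumerate(pred_indices)
        )
        if total < best_cost:
            best_cost, best_assignment = total, pred_indices
    return [(p, t) for t, p in enumerate(best_assignment)]


matching = bipartite_match(preds, targets)
print(matching)  # [(1, 0), (0, 1)]
```

In a set prediction setup of this kind, predictions left unmatched (index 2 here) are typically trained toward a "no behavior" class, which lets the model output a fixed-size set without duplicate detections.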