Abstract: At this blooming age of social media and mobile platform, mass consumers are migrating from horizontal video to vertical contents delivered on hand-held devices. Accordingly, revitalizing the exposure of horizontal video becomes vital and urgent, which is hereby tackled by our automated horizontal-to-vertical (abbreviated as <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">H2V</b> ) video conversion framework. Essentially, the <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">H2V</b> framework performs subject-preserving video cropping instantiated in the proposed Rank-SS module. Rank-SS incorporates object detection to discover the candidate subjects, from which we select the primary subject-to-preserve leveraging location, appearance, and salient cues in a convolutional neural network. In addition to converting horizontal videos vertically by cropping around the selected subject, automatic shot detection and multi-object tracking are integrated into the <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">H2V</b> framework to accommodate long and complex videos. To develop <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">H2V</b> systems, we collect an <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">H2V-142 K</b> dataset containing 125 videos (132 K frames) and 9500 cover images annotated with primary subject bounding boxes. On <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">H2V-142 K</b> and public object detection datasets, our method demonstrates promising results on the subject selection comparing to the related solutions. Furthermore, our <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">H2V</b> framework is industrially deployed hosting millions of daily active users and exhibits favorable <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">H2V</b> conversion performance. By making this dataset as well as our approach publicly available, we wish to pave the way for more horizontal-to-vertical video conversion research. Our collected <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">H2V-142 K</b> dataset is available at <uri xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">https://tianchi.aliyun.com/dataset/dataDetail?dataId=93339</uri> .
0 Replies
Loading