Horizontal-to-Vertical Video Conversion

Tun Zhu, Daoxin Zhang, Yao Hu, Tianran Wang, Xiaolong Jiang, Jianke Zhu, Jiawei Li

2022 (modified: 12 Nov 2022) | IEEE Trans. Multim. 2022 | Readers: Everyone

Abstract: In this booming age of social media and mobile platforms, mass consumers are migrating from horizontal video to vertical content delivered on hand-held devices. Accordingly, revitalizing the exposure of horizontal video becomes vital and urgent, which we tackle with our automated horizontal-to-vertical (abbreviated as H2V) video conversion framework. Essentially, the H2V framework performs subject-preserving video cropping, instantiated in the proposed Rank-SS module. Rank-SS incorporates object detection to discover candidate subjects, from which we select the primary subject to preserve by leveraging location, appearance, and saliency cues in a convolutional neural network. In addition to converting horizontal videos to vertical ones by cropping around the selected subject, automatic shot detection and multi-object tracking are integrated into the H2V framework to accommodate long and complex videos. To develop H2V systems, we collect the H2V-142K dataset containing 125 videos (132K frames) and 9500 cover images annotated with primary subject bounding boxes. On H2V-142K and public object detection datasets, our method demonstrates promising results on subject selection compared with related solutions. Furthermore, our H2V framework is industrially deployed, hosting millions of daily active users, and exhibits favorable H2V conversion performance. By making this dataset as well as our approach publicly available, we wish to pave the way for more horizontal-to-vertical video conversion research. Our collected H2V-142K dataset is available at https://tianchi.aliyun.com/dataset/dataDetail?dataId=93339.
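
To make the cropping step concrete, here is a minimal sketch of how a vertical window might be placed around an already-selected subject box. It is not the authors' implementation: the function name `vertical_crop_window`, the 9:16 target aspect ratio, and the choice of a full-height crop are all assumptions made for illustration, and the subject selection itself (Rank-SS), shot detection, and tracking are not modeled.

```python
def vertical_crop_window(frame_w, frame_h, subject_box, aspect_w=9, aspect_h=16):
    """Return (left, top, right, bottom) of a full-height vertical crop.

    Hypothetical helper: the crop keeps the full frame height, has a
    width/height ratio of aspect_w:aspect_h (9:16 is an assumed target),
    is centered horizontally on the subject box, and is shifted back
    inside the frame if it would overrun either edge.
    """
    x1, _, x2, _ = subject_box  # only the horizontal extent is needed

    # Crop width implied by the target aspect ratio, capped at the frame width.
    crop_w = min(frame_w, frame_h * aspect_w // aspect_h)

    # Center the window on the subject, then clamp it to the frame.
    center_x = (x1 + x2) / 2
    left = int(center_x - crop_w / 2)
    left = max(0, min(left, frame_w - crop_w))

    return left, 0, left + crop_w, frame_h


if __name__ == "__main__":
    # A 1280x720 frame with the subject right of center.
    print(vertical_crop_window(1280, 720, (900, 100, 1100, 600)))
```

In a video setting the per-frame window would normally also be smoothed over time so the crop does not jitter; that step is omitted from this sketch.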
