Video Generation Using 3D Convolutional Neural Network

Published: 01 Jan 2016, Last Modified: 24 Oct 2024, ACM Multimedia 2016, CC BY-SA 4.0
Abstract: Recently, content generation using neural networks has been widely studied. Motivated by this progress, we study the generation of videos using only a class label as input. Our method iteratively minimizes two objective functions simultaneously: one that evaluates how close the video is to the target class and another that evaluates how natural the video appears. For the former, we use the cross-entropy error between the target label and the output of a 3D convolutional neural network (C3D); for the latter, we use the Euclidean distance between the input video and the video decoded by our temporal convolutional auto-encoder ("tempCAE"). We evaluated the generated videos through a crowdsourcing service and confirmed the utility of our method.
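The joint optimization described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tiny `TinyC3D` and `TinyTempCAE` modules, the loss weight `lam`, the learning rate, and all tensor sizes are assumptions standing in for the paper's pretrained C3D and tempCAE networks. The key idea shown is that the video tensor itself is the optimization variable, updated by gradient descent on the sum of the classification loss and the reconstruction ("naturalness") loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-in for the pretrained C3D classifier.
class TinyC3D(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Conv3d(3, 8, kernel_size=3, padding=1)
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):  # x: (N, 3, T, H, W)
        h = F.relu(self.conv(x)).mean(dim=(2, 3, 4))  # global average pooling
        return self.fc(h)  # class logits

# Hypothetical stand-in for the temporal convolutional auto-encoder (tempCAE).
class TinyTempCAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv3d(3, 8, kernel_size=3, padding=1)
        self.dec = nn.Conv3d(8, 3, kernel_size=3, padding=1)

    def forward(self, x):
        return self.dec(F.relu(self.enc(x)))  # reconstructed video

c3d, cae = TinyC3D(), TinyTempCAE()
target = torch.tensor([3])  # desired class label (example value)

# The video is the variable being optimized, starting from noise.
video = torch.randn(1, 3, 8, 16, 16, requires_grad=True)
opt = torch.optim.Adam([video], lr=0.05)
lam = 0.1  # relative weight of the naturalness term (assumed)

for _ in range(100):
    opt.zero_grad()
    # Objective 1: cross-entropy between target label and C3D output.
    class_loss = F.cross_entropy(c3d(video), target)
    # Objective 2: Euclidean distance between video and its tempCAE decoding.
    natural_loss = F.mse_loss(cae(video), video)
    (class_loss + lam * natural_loss).backward()
    opt.step()
```

In practice the two networks would be trained in advance and frozen; only the video tensor receives gradient updates, so each step pushes the video toward the target class while keeping it near the auto-encoder's manifold of natural videos.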