Abstract: Affective analysis of movies depends heavily on a causal understanding of the story, with long-time dependencies. Limited by existing sequence models such as LSTM and Transformer, current works generally split a movie into independent clips and predict the affective impact (Valence/Arousal) of each clip independently, ignoring long historical influences across clips. In this paper, we introduce a novel Reinforcement learning based Memory Net (RMN) for this task, which allows the prediction for the current clip to draw on potentially related historical clips of the same movie. Compared with LSTM, the proposed method addresses long-time dependencies in two ways. First, we introduce a readable and writable memory bank to store useful historical information, which overcomes the restricted memory capacity of the LSTM cell. However, the traditional parameter-update scheme of memory networks, when applied to long-sequence prediction, still needs to store gradients over long sequences; it therefore suffers from gradient vanishing and exploding, similar to backpropagation through time (BPTT). To address this, we introduce a reinforcement learning framework for the memory write operation. The memory updating scheme is optimized via one-step temporal difference, modeling long-time dependencies with both a policy network and a value network. Experiments on the LIRIS-ACCEDE dataset show that our method achieves significant performance gains over existing methods. We also apply our method to other long-sequence prediction tasks, such as music emotion recognition and video summarization, and achieve state-of-the-art performance on those tasks as well.
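The core mechanism described above — a readable/writable memory bank whose write operation is trained with a one-step temporal-difference update rather than BPTT — can be illustrated with a minimal toy sketch. All names and shapes here are illustrative assumptions, not taken from the paper: the memory is read by softmax attention, the "policy" greedily picks a slot to overwrite, and the "value network" is reduced to a linear function for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (assumptions, not from the paper).
M, D = 8, 16             # number of memory slots, clip-feature dimension
memory = np.zeros((M, D))  # the readable/writable memory bank
value_w = np.zeros(D)      # linear stand-in for the value network
gamma, alpha = 0.9, 0.1    # TD discount and learning rate

def read(query):
    """Attention read: softmax over memory-query similarities."""
    scores = memory @ query
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ memory  # weighted sum of memory slots

def write(policy_logits, clip_feat):
    """Policy (greedy here) picks one slot and overwrites it."""
    slot = int(np.argmax(policy_logits))
    memory[slot] = clip_feat
    return slot

def td_update(r, q, q_next):
    """One-step TD update of the value parameters.

    Only the current transition is needed, so no gradients have to
    be stored across the whole sequence (unlike BPTT).
    """
    global value_w
    feat, feat_next = read(q), read(q_next)
    td_error = r + gamma * (value_w @ feat_next) - value_w @ feat
    value_w += alpha * td_error * feat
    return td_error

# Process a toy "movie" of random clip features.
clips = rng.normal(size=(5, D))
for t in range(4):
    write(memory @ clips[t], clips[t])                # store current clip
    td_update(r=1.0, q=clips[t], q_next=clips[t + 1])  # one-step TD step
```

The key design point being sketched is that each update touches only a single (state, next-state) pair, so the memory-write policy can be trained on arbitrarily long movies without unrolling the whole sequence.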