Abstract: In this paper, we consider the following activity recognition task: given a video, infer the set of activities being performed in the video along with an assignment of activities to each frame. Although this task can be solved accurately using existing deep learning systems, their use is problematic in interactive settings. In particular, deep learning models are black boxes: it is difficult to understand how and why the system assigned a particular activity to a frame. This reduces users' trust in the system, especially for end-users who need to use it on a regular basis. We address this problem by feeding the output of deep learning into a tractable, interpretable probabilistic graphical model and then performing joint learning over the two. The key benefit of our proposed approach is that deep learning helps achieve high accuracy while the interpretable probabilistic model makes the system explainable. We demonstrate the power of our approach using a visual interface that provides explanations of model outputs for queries about videos.
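To make the pipeline concrete, the following is a minimal sketch of how per-frame predictions from a deep network could be combined with a simple interpretable temporal model. This is not the authors' model: the stand-in here is a plain Markov chain over activities decoded with Viterbi, and all names (`frame_probs`, `transition`, `viterbi_decode`) are illustrative assumptions.

```python
import numpy as np

def viterbi_decode(frame_probs, transition, prior):
    """Assign one activity label to each frame (hypothetical sketch).

    frame_probs : (T, K) per-frame activity probabilities from a deep model
    transition  : (K, K) activity transition probabilities (interpretable part)
    prior       : (K,)   initial activity probabilities
    """
    T, K = frame_probs.shape
    log_emit = np.log(frame_probs + 1e-12)
    log_trans = np.log(transition + 1e-12)
    score = np.log(prior + 1e-12) + log_emit[0]
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans      # (prev, current) scores
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):              # trace back the best path
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Example: 5 frames, 3 activities; random probabilities stand in for
# the deep model's per-frame output.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=5)
trans = np.full((3, 3), 0.1) + 0.7 * np.eye(3)  # favour staying in an activity
labels = viterbi_decode(probs, trans, np.ones(3) / 3)
print(labels)
```

In such a setup the transition matrix and the decoded path can be inspected directly, which is the kind of explanation a purely end-to-end deep model does not expose.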