# TIEM: Enhancing Explanation of Video Prediction via Temporal Dynamics-Focused Dual Perturbation
Python implementation of the time importance score-aware extremal perturbation masks for video interpretation (TIEM).
This code reproduces the TIEM results reported in the paper.
In particular, it reproduces Figs. 5, 6, and 9; the other results can be obtained with slight modifications.

## Requirements
This implementation is based on Python 3.8.8 (requirements.txt is included).
Due to upload size limitations, the UCF101 dataset [1,[url](https://www.crcv.ucf.edu/data/UCF101.php)] and the trained models are not included.
To reproduce any result other than Fig. 5, download the dataset and train the model before applying the methods.
Data preprocessing and model training functions are provided; see Usage below.
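
Since the dataset is not bundled, a quick check that it is in place can save a failed preprocessing run. The sketch below is illustrative only: the helper name `dataset_ready` is hypothetical and not part of this repository, and it assumes only that the downloaded files sit somewhere under `dataset/UCF101/` (the location the `--data_preprocessing` option expects), without assuming any particular internal layout.

```python
from pathlib import Path


def dataset_ready(root="dataset/UCF101"):
    """Return True if `root` is an existing directory with at least one entry.

    Hypothetical helper (not part of this repository). It does not validate
    the dataset contents, only that something has been placed there.
    """
    root = Path(root)
    return root.is_dir() and any(root.iterdir())


if not dataset_ready():
    print("UCF101 not found under dataset/UCF101/; download it before preprocessing.")
```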

## Usage

```bash
python main.py --figure [figure number] --model_arch [model_type] --ex_type [video_type] --train_model --data_preprocessing
```
Required argument (when reproducing a figure):

*  --figure: Figure whose experiment you want to run {5, 6, 9}

Optional arguments:

*  --model_arch: Black-box model to be interpreted {R2p1d, R50LSTM}
*  --ex_type: Sample video to be interpreted {breaststroke, frontcrawl, floorgymnetics}
*  --data_preprocessing: Preprocess the UCF101 dataset [1,[url](https://www.crcv.ucf.edu/data/UCF101.php)]
    * Before using this option, place the downloaded dataset files in the 'dataset/UCF101/' directory within the project.
*  --train_model: Train the black-box model {True,False}
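
For orientation, the options above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the repository's actual `main.py`: it assumes the two boolean options are plain flags (as in the usage line), leaves `--figure` optional because the preprocessing and training commands are run without it, and sets no defaults for the model or video type.

```python
import argparse


def build_parser():
    """Sketch of a parser for the documented options (assumed, not the real main.py)."""
    parser = argparse.ArgumentParser(description="TIEM experiments")
    parser.add_argument("--figure", type=int, choices=[5, 6, 9],
                        help="figure whose experiment should be run")
    parser.add_argument("--model_arch", choices=["R2p1d", "R50LSTM"],
                        help="black-box model to be interpreted")
    parser.add_argument("--ex_type",
                        choices=["breaststroke", "frontcrawl", "floorgymnetics"],
                        help="sample video to be interpreted")
    # Assumption: both options act as on/off flags.
    parser.add_argument("--train_model", action="store_true",
                        help="train the black-box model")
    parser.add_argument("--data_preprocessing", action="store_true",
                        help="preprocess the UCF101 dataset")
    return parser


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(
    ["--figure", "6", "--model_arch", "R2p1d", "--ex_type", "frontcrawl"]
)
```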
   

Example

```python
#Figure 5 in paper
python main.py --figure 5
```
Before using a black-box model, you need to train it:
```bash
# Generate the training data
python main.py --data_preprocessing
# Train the R2p1d model
python main.py --model_arch R2p1d --train_model
```

```bash
# Figure 6 in the paper (train the model first if you don't have a trained model)
python main.py --figure 6 --model_arch R2p1d --ex_type breaststroke
# and
python main.py --figure 6 --model_arch R2p1d --ex_type frontcrawl

# Figure 9 in the paper
python main.py --figure 9 --model_arch R50LSTM --ex_type floorgymnetics
```

### Applying the STEP/EP-3D Method

You can also apply the **STEP** and **EP-3D** methods for generating visual explanations in your project.
Please download the code from the following repository [2,[url](https://github.com/shinkyo0513/Towards-Visually-Explaining-Video-Understanding-Networks-With-Perturbation)].

```python
mask = video_perturbation(
    model, input, target, method="STEP", areas=[0.1], 
    sigma=13, max_iter=2000, variant="preserve",
    num_devices=1, print_iter=200, perturb_type="blur"
)[0]

mask = video_perturbation(
    model, input, target, method="3d_ep", areas=[0.1], 
    sigma=13, max_iter=2000, variant="preserve",
    num_devices=1, print_iter=200, perturb_type="blur"
)[0]
```
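
To give a feel for what a "preserve" mask does, the sketch below blends a video with a perturbed reference: regions where the mask is 1 are kept and regions where it is 0 are replaced. It is a simplified illustration, not the code from the repository above. The `(C, T, H, W)` layout, the single-channel mask, the helper name `blend_preserve`, and the use of per-frame spatial means as a crude stand-in for blurring are all assumptions.

```python
import numpy as np


def blend_preserve(video, mask, reference):
    """Keep `video` where mask is 1 and fall back to `reference` where mask is 0."""
    mask = np.clip(mask, 0.0, 1.0)
    return mask * video + (1.0 - mask) * reference


rng = np.random.default_rng(0)
video = rng.random((3, 4, 8, 8)).astype(np.float32)  # assumed layout: (C, T, H, W)

# Crude stand-in for a blurred video: each frame replaced by its spatial mean.
reference = np.broadcast_to(video.mean(axis=(2, 3), keepdims=True), video.shape)

mask = np.zeros((1, 4, 8, 8), dtype=np.float32)  # one mask channel, broadcast over C
mask[:, 1:3, 2:6, 2:6] = 1.0                     # preserve a small spatio-temporal box

perturbed = blend_preserve(video, mask, reference)
kept_fraction = float(mask.mean())               # fraction of the video that is preserved
```

Here `kept_fraction` is the kind of quantity an area constraint such as `areas=[0.1]` would bound, although the exact definition used by STEP and EP-3D may differ.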

### References
1. Khurram Soomro, Amir Roshan Zamir, and Mubarak Shah. UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402, 2012.
2. Zhenqiang Li, Weimin Wang, Zuoyue Li, Yifei Huang, and Yoichi Sato. Towards visually explaining video understanding networks with perturbation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 1120–1129, 2021.