# Multimodal Subtask Graph Generation from Instructional Videos

- In the ablation study, we measure completion-prediction performance against manually annotated labels. These labels are stored in the file '**ProceL_completion_validation_labels.pkl.npy**', whose format is described below:

  ```python
  import numpy as np

  # allow_pickle=True is needed because the file stores a pickled dict; .item() unwraps it.
  data_by_task = np.load('ProceL_completion_validation_labels.pkl.npy', allow_pickle=True).item()
  # data_by_task: dict
  # data = data_by_task[task_name]: dict    (the key is the name of a task)
  # - data['num_subtask']: int
  # - data['subtask_labels']: list[str]
  # - trajectories = data['trajectories']: list
  # -- trajectories = [trajectory1, trajectory2, ...]
  # -- trajectory = trajectories[i]: dict
  # --- trajectory['subtask_indices'] = np.array([0, 1, 3, 3, 0, 1])  (int, 0-indexed)
  # --- trajectory['start_flags'] = np.array([T, T, T, F, F, F])  (bool; T marks the start of a subtask, F marks its end)
  # --- trajectory['30fps_frame_numbers'] = np.array([6044, 6044, 6044, 6176, 6512, 6670])  (int; frame number of each point after converting the video to 30 fps)
  # --- trajectory['name']: str  (filename of the video matched with this sequence)
  ```

- We evaluate graph generation against manually drawn subtask graphs for each task. These ground-truth graphs are in '**gt_graph**'. We also include our generated graphs in '**our_graph**'.

- For both the [ProceL](https://www.khoury.northeastern.edu/home/eelhami/procel.htm) and [CrossTask](https://github.com/DmZhukov/CrossTask) datasets, we provide the train and test splits used for next state prediction in the '**data_split**' folder.

- We also include the codebase for subtask state prediction in the '**state_prediction**' folder.
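
As an illustration of the completion-label format described above, the following minimal sketch converts one trajectory into per-subtask frame intervals. The helper name `trajectory_to_intervals` and the pairing rule (each end event closes the most recent unmatched start of the same subtask) are assumptions made for this example and are not part of the released code. The example trajectory reuses the values from the format description; its `name` is a placeholder.

```python
def trajectory_to_intervals(trajectory):
    """Pair start and end events of one trajectory (hypothetical helper).

    Assumes each end event closes the most recent unmatched start event of
    the same subtask. Returns (subtask_index, start_frame, end_frame) tuples
    in the order in which the subtasks end.
    """
    open_starts = {}  # subtask index -> frame number of its unmatched start
    intervals = []
    events = zip(trajectory['subtask_indices'],
                 trajectory['start_flags'],
                 trajectory['30fps_frame_numbers'])
    for subtask, is_start, frame in events:
        subtask, frame = int(subtask), int(frame)
        if is_start:
            open_starts[subtask] = frame
        elif subtask in open_starts:
            intervals.append((subtask, open_starts.pop(subtask), frame))
    return intervals


# Example trajectory with the values from the format description.
# Plain lists keep the sketch dependency-free; the NumPy arrays stored in the
# label file can be iterated the same way.
example = {
    'subtask_indices': [0, 1, 3, 3, 0, 1],
    'start_flags': [True, True, True, False, False, False],
    '30fps_frame_numbers': [6044, 6044, 6044, 6176, 6512, 6670],
    'name': 'example_video',  # placeholder, not a real filename
}

intervals = trajectory_to_intervals(example)
for subtask, start, end in intervals:
    print(f"subtask {subtask}: frames {start} to {end}")
```

To apply this to the real labels, load `data_by_task` as shown in the format description and call the helper on each entry of `data['trajectories']`; `data['subtask_labels']` can then map each subtask index to its name.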
