Supplementary Materials for "Learning Object-centric Latent Dynamics for Reinforcement Learning from Pixels"

This folder contains supplementary materials to support our submission. The materials are organized into two main
categories: behavior videos and open-loop predictions.

Folder Structure:

supplementary_material/
├── behavior_videos/
├── open_loop_predictions/
└── README.txt (this file)

1. Behavior Videos

The "behavior_videos" folder contains videos demonstrating the learned behavior of our model across various environments.
Each environment has its own subfolder containing multiple video files. The video filenames follow this format:

[experiment_name]_[seed]_[reward]_[success].mp4

Where:
- experiment_name: Name of the specific experiment or environment, followed by a consecutive training-run number
  if several runs were carried out.
- seed: Random seed used for the episode.
- reward: Total reward obtained in the episode, rounded to an integer.
- success: Flag indicating whether the task was completed successfully (True/False or 1/0). This does not apply to
  the DeepMind Control experiments, since no success criterion is defined for them.
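As an illustration, the filename convention above could be parsed as follows. This is a minimal sketch: the helper
name and the example filename are hypothetical, and since experiment names may themselves contain underscores, the
seed/reward/success fields are split off from the right.

```python
# Hypothetical helper for parsing supplementary video filenames of the form
# [experiment_name]_[seed]_[reward]_[success].mp4
def parse_video_filename(filename):
    stem = filename.rsplit(".", 1)[0]      # drop the .mp4 extension
    # Split from the right, since experiment names may contain underscores.
    experiment, seed, reward, success = stem.rsplit("_", 3)
    return {
        "experiment_name": experiment,
        "seed": int(seed),
        "reward": int(reward),             # reward is rounded to an integer
        "success": success,                # "True"/"False" or "1"/"0"
    }

print(parse_video_filename("button_press_3_421_True.mp4"))
```

Note that for the DeepMind Control videos no success flag is defined, so a parser like this would need to be
adapted for those filenames.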

Environments included:
- DeepMind Control Suite tasks (dm_cartpole_balance, dm_finger_spin)
- Meta-World tasks (button_press, hammer)
- Custom robotics tasks (reach, push, pick-and-place) with various configurations

2. Open-Loop Predictions

The "open_loop_predictions" folder contains videos showcasing our model's open-loop prediction capabilities.
These videos demonstrate how well our model can predict future states/observations. The folder structure mirrors that of
the "behavior_videos" folder, with each training run having its own subfolder.

Each video in this section shows:
1. Ground truth frames (leftmost video, with a green border).
2. Reconstructed frames (second video from the left). While the border is green, the SAVi reconstruction is shown;
   as soon as the border turns red, frames generated by the predictor are shown. The slots associated with the SAVi
   reconstructions serve as seed slots for the predictor.
3. Slot segmentation frames (third video from the left); the green/red border logic is the same as for the
   reconstructed frames.
4. Individual object/slot representations; again, the green/red border logic is the same as for the
   reconstructed frames.

Video filenames in this section are numbered according to the random seed used for the episode. In contrast to the
behavior videos, actions are sampled from the action model rather than taking the mean action.
