# IntPhys2


## Evaluation code for prediction-based models

We provide the code to run prediction-based evaluations in the `prediction_evals` subfolder.

### Running the code

For algorithmic clarity and reproducibility, we provide a version of our code which can be used to extract surprise metrics from models. It is compatible with V-JEPA models and VideoMAEv2. The code is based on [github.com/facebookresearch/jepa-intuitive-physics](https://github.com/facebookresearch/jepa-intuitive-physics).

For requirements to run the code, see `requirements.txt`.

We provide a single evaluation:
- `intphys2`: This evaluation runs through the dataset and extracts surprises for all models. These surprises can then be used to compute accuracy.
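As an illustration of how surprises might be turned into an accuracy, here is a minimal sketch. It assumes a pairwise protocol in which each possible video is matched with an impossible one, and a pair counts as correct when the impossible video receives the higher surprise. The function name `pairwise_accuracy` and the example values are hypothetical and are not part of the provided code; the exact protocol used for reported results may differ.

```python
from typing import Sequence


def pairwise_accuracy(possible: Sequence[float], impossible: Sequence[float]) -> float:
    """Fraction of pairs where the impossible video is more surprising.

    Hypothetical helper: assumes possible[i] and impossible[i] belong to
    the same matched pair of videos.
    """
    if len(possible) != len(impossible):
        raise ValueError("expected one impossible surprise per possible surprise")
    if not possible:
        raise ValueError("need at least one pair")
    # A pair is correct when the impossible video has strictly higher surprise.
    correct = sum(imp > pos for pos, imp in zip(possible, impossible))
    return correct / len(possible)


# Made-up surprise values, for illustration only.
possible_surprises = [0.12, 0.30, 0.25, 0.40]
impossible_surprises = [0.35, 0.28, 0.50, 0.45]
accuracy = pairwise_accuracy(possible_surprises, impossible_surprises)
print(f"pairwise accuracy: {accuracy:.2f}")
```

Ties are counted as errors here; that is a design choice of this sketch, not something the evaluation code prescribes.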

To run the evaluation code, the file `evaluation_code/evals/intphys2/utils.py` must first be adapted to your setup.

The code is meant to be reusable across clusters where data doesn't share a common path, so you need to specify `CLUSTER` as well as the dataset paths.
If you intend to use only a single cluster, the `get_cluster()` function can simply be replaced by:
```python
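from functools import lru_cache  # skip if utils.py already imports it

@lru_cache()
def get_cluster() -> str:
    # Always return the single cluster name set in CLUSTER.
    return CLUSTER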
```
Then, just update the dataset paths in `DATASET_PATHS_BY_CLUSTER`.
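For a single-cluster setup, the result might look like the sketch below. The layout of `DATASET_PATHS_BY_CLUSTER` (a dictionary keyed by cluster name, then by dataset name), the cluster name `my_cluster`, the path, and the helper `get_dataset_path` are all assumptions made for illustration; check the actual structure in `utils.py` before copying anything.

```python
from functools import lru_cache

# Hypothetical cluster name; use whatever identifies your machine.
CLUSTER = "my_cluster"

# Assumed layout: cluster name -> dataset name -> path on that cluster.
DATASET_PATHS_BY_CLUSTER = {
    "my_cluster": {
        "intphys2": "/path/to/IntPhys2",  # placeholder path
    },
}


@lru_cache()
def get_cluster() -> str:
    # Single-cluster setup: no detection logic, just return the constant.
    return CLUSTER


def get_dataset_path(name: str) -> str:
    """Hypothetical lookup helper, not part of the provided code."""
    cluster = get_cluster()
    try:
        return DATASET_PATHS_BY_CLUSTER[cluster][name]
    except KeyError as err:
        raise KeyError(f"no path for dataset {name!r} on cluster {cluster!r}") from err


print(get_dataset_path("intphys2"))
```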

From the `evaluation_code` folder, evaluations can either be run locally, e.g.:
```bash
python -m evals.main --devices cuda:0 cuda:1 cuda:2 cuda:3 cuda:4 cuda:5 --fname evals/intphys2/configs/vjepa_rope.yaml
```

or through submitit, e.g.:

```bash
python -m evals.main_distributed --fname evals/intphys2/configs/vjepa_rope.yaml --folder ./logs --partition PARTITION 
```

### Configurations

We provide default configurations in the evaluations folder that should be adapted depending on the model that you are using.

The *model_kwargs* section contains the information needed to load the pretrained model. The most important entries are *checkpoint*, which is the model path, and *module_name*, which is the wrapper to use.

The parameters *tasks_per_node* and *nodes* are only used with submitit, where they control the number of GPUs used. Due to the computational cost of COSMOS, we recommend running it on 8 nodes with 8 tasks per node. Other models can be run on 1 node.
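To make the relationship between these parameters concrete, the sketch below represents a configuration as a plain Python dictionary and computes the total number of tasks. The flat layout of the dictionary, the placeholder values, and the helper names are assumptions for illustration; the real YAML files may nest these keys differently. It also assumes one GPU per task.

```python
# Hypothetical configuration mirroring the parameters described above.
config = {
    "nodes": 8,
    "tasks_per_node": 8,
    "model_kwargs": {
        "checkpoint": "/path/to/checkpoint",  # placeholder model path
        "module_name": "placeholder_wrapper",  # placeholder wrapper name
    },
}


def total_tasks(cfg: dict) -> int:
    """Number of tasks launched through submitit (one GPU each, by assumption)."""
    return cfg["nodes"] * cfg["tasks_per_node"]


def missing_model_kwargs(cfg: dict, required=("checkpoint", "module_name")) -> list:
    """Return the required model_kwargs entries that are absent."""
    model_kwargs = cfg.get("model_kwargs", {})
    return [key for key in required if key not in model_kwargs]


print(f"total tasks: {total_tasks(config)}")
print(f"missing model_kwargs: {missing_model_kwargs(config)}")
```

With the recommended COSMOS setting of 8 nodes and 8 tasks per node, this gives 64 tasks.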