# Generalizing Denoising to Non-Equilibrium Structures Improves Equivariant Force Fields


This repository provides the code for training [EquiformerV2](https://arxiv.org/abs/2306.12059) with DeNS on the OC20 and OC22 datasets.


<p align="center">
	<img src="fig/denoising_structures_overview.png" alt="photo not available" width="98%" height="98%">
</p>

<p align="center">
	<img src="fig/dens_training_process.png" alt="photo not available" width="98%" height="98%">
</p>

<p align="center">
	<img src="fig/dens_oc20_all+md.png" alt="photo not available" width="98%" height="98%">
</p>

<p align="center">
	<img src="fig/dens_oc22.png" alt="photo not available" width="98%" height="98%">
</p>

<p align="center">
	<img src="fig/dens_md17.png" alt="photo not available" width="98%" height="98%">
</p>


## Content ##
1. [Environment Setup](#environment-setup)
2. [File Structure](#file-structure)
3. [Training](#training)



## Environment Setup ##


### Environment 

See [here](docs/env_setup.md) for setting up the environment.


### OC20

Please first set up the environment and file structure (place this repository under `ocp` and rename it to `experimental`) by following the [Environment](#environment) section above.

The OC20 S2EF dataset can be downloaded by following the instructions in the Open Catalyst Project [GitHub repository](https://github.com/Open-Catalyst-Project/ocp/blob/5a7738f9aa80b1a9a7e0ca15e33938b4d2557edd/DATASET.md#download-and-preprocess-the-dataset).

For example, we can download the OC20 S2EF-2M dataset by running:
```bash
    cd ocp
    python scripts/download_data.py --task s2ef --split "2M" --num-workers 8 --ref-energy
```
Training also requires the `val_id` validation split, which can be downloaded with the same script by passing `--split "val_id"`.

After downloading, the datasets should be under `ocp/data`.
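
As a quick sanity check after downloading, a small shell function can confirm that the data landed where training expects it. This is a minimal sketch: the function name `check_oc20_data` is hypothetical, and the sub-paths `s2ef/2M/train` and `s2ef/all/val_id` are an assumption about how the download script lays out the 2M and `val_id` splits, so adjust them if your layout differs.

```bash
# Sketch: report whether the expected OC20 S2EF directories exist.
# The sub-paths below are assumptions about the download layout.
check_oc20_data() {
    root="${1:-data}"   # dataset root; defaults to ./data (run from ocp/)
    missing=0
    for sub in s2ef/2M/train s2ef/all/val_id; do
        if [ -d "$root/$sub" ]; then
            echo "found   $root/$sub"
        else
            echo "missing $root/$sub"
            missing=1
        fi
    done
    # Non-zero exit status if any directory is absent.
    return "$missing"
}
```

Calling `check_oc20_data data` from inside `ocp` prints one line per directory and returns a non-zero status if either one is absent.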

To train on other splits such as All and All+MD, download the corresponding datasets by following the same instructions linked above.


### OC22

Please first set up the environment and file structure (place this repository under `ocp` and rename it to `experimental`) by following the [Environment](#environment) section above.

As with OC20, the OC22 dataset can be downloaded by following the instructions in the FAIR-Chem [GitHub repository](https://github.com/FAIR-Chem/fairchem/blob/5a7738f9aa80b1a9a7e0ca15e33938b4d2557edd/DATASET.md#open-catalyst-2022-oc22).



## File Structure ##

1. [`configs`](configs) contains config files for training with DeNS on different datasets.
2. [`datasets`](datasets) contains the LMDB dataset class that distinguishes whether structures in OC20 come from the All split or the MD split.
3. [`model`](model) contains EquiformerV2 and eSCN models capable of training with DeNS.
4. [`scripts`](scripts) contains the scripts for launching training based on config files.
5. [`trainers`](trainers) contains the code for training models on S2EF, including training with DeNS.
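
Since the training scripts are launched from `ocp` and refer to this repository as `experimental`, it can help to confirm that the five directories listed above are where the scripts will look for them. The following is a minimal sketch; the function name `check_layout` is hypothetical and not part of this repository.

```bash
# Sketch: verify that the repository sits under ocp/ as `experimental`
# and contains the directories listed in the File Structure section.
check_layout() {
    ocp_root="${1:-.}"   # path to the ocp checkout; defaults to the current directory
    missing=0
    for d in configs datasets model scripts trainers; do
        if [ ! -d "$ocp_root/experimental/$d" ]; then
            echo "missing $ocp_root/experimental/$d"
            missing=1
        fi
    done
    # Non-zero exit status if any directory is absent.
    return "$missing"
}
```

Running `check_layout .` from inside `ocp` prints nothing and returns zero when the layout matches.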


## Training ##

### OC20

1. Modify the dataset paths before launching training. For example, we need to modify the path to the training set as [here](configs/oc20/2M/equiformer_v2/equiformer_dens_v2_N%4012_L%406_M%402_lr%402e-4_epochs%4012_std%400.1_gpus%4016.yml#L7) and the validation set as [here](configs/oc20/2M/equiformer_v2/equiformer_dens_v2_N%4012_L%406_M%402_lr%402e-4_epochs%4012_std%400.1_gpus%4016.yml#L20) before training EquiformerV2 with DeNS on the OC20 S2EF-2M dataset for 12 epochs.

2. We train EquiformerV2 with DeNS on the **OC20 S2EF-2M dataset** for **12 epochs** by running:
    ```bash
        cd ocp/
        sh experimental/scripts/train/oc20/s2ef/equiformer_v2/equiformer_dens_v2_N@12_L@6_M@2_epochs@12_splits@2M_g@multi-nodes.sh
    ```
    Note that, following the [Environment](#environment) section above, we run the script from the `ocp` directory.
    This script will use 2 nodes with 8 GPUs on each node.
    
    We can also run training on 8 GPUs on 1 node:
    ```bash
        cd ocp/
        sh experimental/scripts/train/oc20/s2ef/equiformer_v2/equiformer_dens_v2_N@12_L@6_M@2_epochs@12_splits@2M_g@8.sh
    ```
    Note that this script only demonstrates that training on a single node is possible; its results will differ from those obtained with 16 GPUs.

    Similarly, we train EquiformerV2 with DeNS on the **OC20 S2EF-2M dataset** for **30 epochs** by running:
    ```bash
        cd ocp/
        sh experimental/scripts/train/oc20/s2ef/equiformer_v2/equiformer_dens_v2_N@12_L@6_M@2_epochs@30_splits@2M_g@multi-nodes.sh
    ```
    This script will use 4 nodes with 8 GPUs on each node.

3. We train EquiformerV2 with DeNS on the **OC20 S2EF-All+MD dataset** by running:
    ```bash
        cd ocp/
        sh experimental/scripts/train/oc20/s2ef/equiformer_v2/equiformer_dens_v2_N@20_L@6_M@3_splits@all-md_g@multi-nodes.sh
    ```
    This script will use 16 nodes with 8 GPUs on each node.
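
For step 1 above, one way to find the lines that need editing is to list every dataset path entry in a config together with its line number. This is a minimal sketch: the function name `list_dataset_paths` is hypothetical, and it assumes that dataset paths are stored under keys named `src`, which should be checked against the config itself.

```bash
# Sketch: print each dataset path entry in a config with its line number.
# Assumes the paths live under keys named `src` (an assumption, check your config).
list_dataset_paths() {
    # Match optional indentation and list dashes, then the `src:` key.
    grep -n '^[[:space:]-]*src:' "$1"
}
```

For example, from inside `ocp`, `list_dataset_paths "experimental/configs/oc20/2M/equiformer_v2/equiformer_dens_v2_N@12_L@6_M@2_lr@2e-4_epochs@12_std@0.1_gpus@16.yml"` would show the training and validation entries to update. Quoting the path keeps the `@` characters in the file name from causing surprises in other shells.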


### OC22

1. Modify the paths to datasets before launching training. Specifically, we need to modify the path to the training set as [here](configs/oc22/equiformer_v2/equiformer_v2_dens_N%4018_L%406_M%402_e%404_f%40100_std%400.15.yml#L7) and the validation set as [here](configs/oc22/equiformer_v2/equiformer_v2_dens_N%4018_L%406_M%402_e%404_f%40100_std%400.15.yml#L22). 

    In addition, we need to download the linear reference file from [here](https://github.com/FAIR-Chem/fairchem/tree/be0f727a515582b01e6c51672a08f5b693f015e9/configs/oc22/linref) and then add the path to the linear reference file as [here](configs/oc22/equiformer_v2/equiformer_v2_dens_N%4018_L%406_M%402_e%404_f%40100_std%400.15.yml#L19) and [here](configs/oc22/equiformer_v2/equiformer_v2_dens_N%4018_L%406_M%402_e%404_f%40100_std%400.15.yml#L24). 
    
    Finally, we download the OC20 reference information file from [here](https://github.com/FAIR-Chem/fairchem/blob/be0f727a515582b01e6c51672a08f5b693f015e9/DATASET.md#oc20-reference-information) and add the path to that file as [here](configs/oc22/equiformer_v2/equiformer_v2_dens_N%4018_L%406_M%402_e%404_f%40100_std%400.15.yml#L20) and [here](configs/oc22/equiformer_v2/equiformer_v2_dens_N%4018_L%406_M%402_e%404_f%40100_std%400.15.yml#L25).

2. We train EquiformerV2 with DeNS on the OC22 dataset by running:
    ```bash
        cd ocp/
        sh experimental/scripts/train/oc22/s2ef/equiformer_v2/equiformer_dens_v2_N@18_L@6_M@2_epochs@6_g@multi-nodes.sh
    ```
    This script will use 4 nodes with 8 GPUs on each node.
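
Because step 1 for OC22 involves several paths (datasets, the linear reference file, and the OC20 reference information file), a quick check that each configured path exists can catch typos before a multi-node job is launched. This is a minimal sketch: the function name `check_config_paths` is hypothetical, and the key names `src`, `lin_ref`, and `oc20_ref` are assumptions about the config format that should be verified against the config file.

```bash
# Sketch: check that every path referenced in a config exists on disk.
# The key names (src, lin_ref, oc20_ref) are assumptions about the config format.
# Limitation: handles only unquoted paths without spaces or trailing comments.
check_config_paths() {
    config="$1"
    status=0
    # Strip indentation, list dashes, and the key, leaving only the path value.
    for p in $(sed -nE 's/^[[:space:]-]*(src|lin_ref|oc20_ref):[[:space:]]*//p' "$config"); do
        if [ -e "$p" ]; then
            echo "ok      $p"
        else
            echo "missing $p"
            status=1
        fi
    done
    # Non-zero exit status if any path is absent.
    return "$status"
}
```

Relative paths are resolved against the current directory, so the check should be run from `ocp`, for example with `check_config_paths "experimental/configs/oc22/equiformer_v2/equiformer_v2_dens_N@18_L@6_M@2_e@4_f@100_std@0.15.yml"`.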