# Data generation

First, add the current directory (the repository root) to `PYTHONPATH`:
```bash
export PYTHONPATH="$(pwd)"
```
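If `PYTHONPATH` already contains other entries you want to keep, you may prefer to prepend the current directory instead of overwriting the variable. A minimal sketch, assuming it is run from the repository root like the command above:

```bash
# Prepend the repository root while keeping any existing PYTHONPATH entries.
# ${PYTHONPATH:+:$PYTHONPATH} expands to ":<old value>" only when PYTHONPATH
# is already set and non-empty, so no stray ":" is left behind.
export PYTHONPATH="$(pwd)${PYTHONPATH:+:$PYTHONPATH}"
```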

## Navier-Stokes 2D

```bash
export seed=42;

python scripts/generate_data.py base=pdedatagen/configs/navierstokes2dsmoke.yaml \
    experiment=smoke mode=train samples=256 seed=$seed pdeconfig.sample_rate=4 \
    dirname=/mnt/data/navierstokes;

python scripts/generate_data.py base=pdedatagen/configs/navierstokes2dsmoke.yaml \
    experiment=smoke mode=valid samples=32 seed=$seed pdeconfig.sample_rate=4 \
    dirname=/mnt/data/navierstokes;

python scripts/generate_data.py base=pdedatagen/configs/navierstokes2dsmoke.yaml \
    experiment=smoke mode=test samples=32 seed=$seed pdeconfig.sample_rate=4 \
    dirname=/mnt/data/navierstokes;
```
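The three commands above differ only in `mode` and `samples`. If you prefer, they can be wrapped in a small shell function. The sketch below is a hypothetical convenience wrapper (`generate_splits` is our name, not part of the repository) that reproduces the 256/32/32 split sizes used here and falls back to `seed=42` when `seed` is unset:

```bash
# Hypothetical helper: run scripts/generate_data.py for the train/valid/test
# splits with 256/32/32 samples, forwarding any extra key=value overrides.
# Usage: generate_splits <base-config> <experiment> <dirname> [overrides...]
generate_splits() {
    local base="$1" experiment="$2" dirname="$3"
    shift 3
    local mode samples
    for mode in train valid test; do
        if [ "$mode" = train ]; then samples=256; else samples=32; fi
        # Stop at the first split that fails instead of continuing silently.
        python scripts/generate_data.py base="$base" \
            experiment="$experiment" mode="$mode" samples="$samples" \
            seed="${seed:-42}" "$@" dirname="$dirname" || return 1
    done
}
```

For example, `generate_splits pdedatagen/configs/navierstokes2dsmoke.yaml smoke /mnt/data/navierstokes pdeconfig.sample_rate=4` issues the same three commands as above; the Shallow water and Maxwell runs below follow the same pattern without the extra override.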


### Data normalization

The data is reasonably well bounded, so we did not need any normalization.

## Shallow water 2D

```bash
export seed=42;

python scripts/generate_data.py base=pdedatagen/configs/shallowwater.yaml \
    experiment=shallowwater mode=train samples=256 seed=$seed \
    dirname=/mnt/data/shallowwater;

python scripts/generate_data.py base=pdedatagen/configs/shallowwater.yaml \
    experiment=shallowwater mode=valid samples=32 seed=$seed \
    dirname=/mnt/data/shallowwater;

python scripts/generate_data.py base=pdedatagen/configs/shallowwater.yaml \
    experiment=shallowwater mode=test samples=32 seed=$seed \
    dirname=/mnt/data/shallowwater;
```

### Convert to [`zarr`](https://zarr.dev/)
We found data loading to be considerably faster with the `zarr` format than with the original [`NetCDF`](https://www.unidata.ucar.edu/software/netcdf/) format, especially from cloud storage. After data generation, you can convert each split via:

```bash
for mode in train valid test; do
    python scripts/convertnc2zarr.py "/mnt/data/shallowwater/$mode";
done
```
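Before converting, it can help to check that every split actually contains NetCDF files. A minimal sketch, assuming the generated files use the `.nc` extension (an assumption, not something the scripts above guarantee); `count_nc_files` is a hypothetical helper:

```bash
# Hypothetical helper: print the number of NetCDF (*.nc) files in each
# split directory under the given root, e.g. /mnt/data/shallowwater.
count_nc_files() {
    local root="$1" mode n
    for mode in train valid test; do
        if [ -d "$root/$mode" ]; then
            n=$(find "$root/$mode" -type f -name '*.nc' | wc -l)
        else
            n=0  # split directory does not exist (yet)
        fi
        # wc -l may pad its output with spaces; $((n)) normalizes it.
        echo "$mode: $((n))"
    done
}
```

Running `count_nc_files /mnt/data/shallowwater` prints one line per split; a count of 0 usually means that split was not generated or was written elsewhere.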

### Data normalization

```bash
python scripts/compute_normalization.py \
    --dataset shallowwater /mnt/data/shallowwater
```

## Maxwell 3D

```bash
export seed=42

python scripts/generate_data.py base=pdedatagen/configs/maxwell3d.yaml \
    experiment=maxwell mode=train samples=256 seed=$seed dirname=/mnt/data/maxwell3d;

python scripts/generate_data.py base=pdedatagen/configs/maxwell3d.yaml \
    experiment=maxwell mode=valid samples=32 seed=$seed dirname=/mnt/data/maxwell3d;

python scripts/generate_data.py base=pdedatagen/configs/maxwell3d.yaml \
    experiment=maxwell mode=test samples=32 seed=$seed dirname=/mnt/data/maxwell3d;
```

### Data normalization

```bash
python scripts/compute_normalization.py \
    --dataset maxwell /mnt/data/maxwell3d
```


# Data download

First, make sure you have [`azcopy`](https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10) installed.

On Linux (x86-64), you can install it with:

```bash
wget https://aka.ms/downloadazcopy-v10-linux
tar -xvf downloadazcopy-v10-linux
# move the binary to a directory on your PATH (created if it does not exist)
mkdir -p "$HOME/.local/bin"
mv ./azcopy_linux_amd64_*/azcopy "$HOME/.local/bin/"
```
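On some systems `$HOME/.local/bin` is not on `PATH` by default. A minimal sketch that adds it for the current shell session (add the `export` line to your shell startup file, e.g. `~/.bashrc`, to make it permanent) and then reports whether `azcopy` can be found:

```bash
# Add ~/.local/bin to PATH for this session if it is not already there.
case ":$PATH:" in
    *":$HOME/.local/bin:"*) ;;  # already present, nothing to do
    *) export PATH="$HOME/.local/bin:$PATH" ;;
esac

# Report whether the azcopy binary is now visible.
if command -v azcopy >/dev/null 2>&1; then
    echo "azcopy found at $(command -v azcopy)"
else
    echo "azcopy not found on PATH" >&2
fi
```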

## Navier-Stokes - 2D

Generated using [Φ<sub>Flow</sub>](https://github.com/tum-pbs/PhiFlow/).

```bash
azcopy copy "https://pdearenarelease.blob.core.windows.net/datasets/NavierStokes2D_smoke" \
            "/mnt/data/" --recursive
```


## Shallow water - 2D

Generated using [SpeedyWeather.jl](https://github.com/milankl/SpeedyWeather.jl).

```bash
azcopy copy "https://pdearenarelease.blob.core.windows.net/datasets/ShallowWater2D" \
            "/mnt/data/" --recursive
```

## Maxwell - 3D

Generated using [Python 3D FDTD Simulator](https://github.com/flaport/fdtd).

```bash
azcopy copy "https://pdearenarelease.blob.core.windows.net/datasets/Maxwell2D" \
            "/mnt/data/" --recursive
```
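To fetch several datasets in one go, the `azcopy` call can be wrapped in a small function. This is a hypothetical helper (`download_dataset` and `PDEARENA_BLOB` are our names), assuming the container layout used in the commands above:

```bash
# Hypothetical helper: copy one dataset folder from the PDEArena release
# container into a local directory (default /mnt/data/), as in the commands above.
PDEARENA_BLOB="https://pdearenarelease.blob.core.windows.net/datasets"

download_dataset() {
    local name="$1" dest="${2:-/mnt/data/}"
    mkdir -p "$dest"  # make sure the destination exists
    azcopy copy "$PDEARENA_BLOB/$name" "$dest" --recursive
}

# Example: fetch the Navier-Stokes and shallow-water datasets.
# for name in NavierStokes2D_smoke ShallowWater2D; do download_dataset "$name"; done
```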
