
# Looking Into the Water by Unsupervised Learning of the Surface Gradient

This repository contains the code, datasets, and scripts for reproducing the experiments presented in our paper, which is currently under review:

**Looking Into the Water by Unsupervised Learning of the Surface Gradient**  
_(Under review, NeurIPS 2025)_

## Overview

We address the problem of looking into the water from the air, where we seek to remove image distortions caused by refraction at the water surface. Our approach models the different water surface structures at various points in time, assuming the underlying image is constant. To this end, we propose a model that consists of two neural-field networks. The first network predicts the height of the water surface at each spatial position and time, and the second network predicts the image color at each position. Using both networks, we reconstruct the observed sequence of images and can therefore train without supervision. We show that implicit neural representations with periodic activation functions (SIREN) lead to effective modeling of the spatio-temporal surface height signal and its derivative, as required for image reconstruction. Using both simulated and real data, we show that our method outperforms the latest unsupervised image restoration approach. In addition, it provides an estimate of the water surface.

Key contributions:
- A two-network neural field model: one predicts surface height over time, and the other predicts the underlying image.
- A differentiable refraction model based on Snell's law to relate surface gradients to pixel distortions.
- Superior performance over existing unsupervised and supervised methods on both real and simulated datasets.
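
As a rough illustration of why periodic activations suit the first contribution, the sketch below evaluates a single sine layer as a toy height field together with its closed-form spatial gradient. The weights, the `W0` value, and the function names are made up for illustration; the model in this repository is a multi-layer SIREN implemented in PyTorch, not this toy.

```python
import math

# Toy single-layer sine network (all values are illustrative assumptions):
# h(x, y, t) = sum_k a_k * sin(W0 * (w_k . (x, y, t) + b_k))
W0 = 15.0  # frequency scale, playing the role of --w0_first_grid
WEIGHTS = [(0.30, -0.20, 0.10), (-0.15, 0.25, 0.05), (0.10, 0.10, -0.30)]
BIASES = [0.0, 0.5, -0.3]
AMPLITUDES = [0.02, 0.015, 0.01]


def height(x, y, t):
    """Surface height at position (x, y) and time t."""
    return sum(
        a * math.sin(W0 * (wx * x + wy * y + wt * t + b))
        for a, (wx, wy, wt), b in zip(AMPLITUDES, WEIGHTS, BIASES)
    )


def height_gradient(x, y, t):
    """Closed-form spatial gradient (dh/dx, dh/dy); the derivative of sin is cos."""
    gx = gy = 0.0
    for a, (wx, wy, wt), b in zip(AMPLITUDES, WEIGHTS, BIASES):
        c = a * W0 * math.cos(W0 * (wx * x + wy * y + wt * t + b))
        gx += c * wx
        gy += c * wy
    return gx, gy
```

Because the derivative of a sine is again a smooth periodic function, the gradient is as well behaved as the height itself, which is the property the refraction model depends on.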

![Teaser Image](./figures/teaser.png)  
*Distorted input → Restored scene + Estimated surface*


## Method Overview
![Architecture Image](./figures/arch.png)

**Architecture Summary**

*Surface Estimation Network*

- **Inputs**: 2D spatial coordinates `x_reg` and time `t`  
- **Output**: Surface height at each frame  
- The temporal gradient and its average over time are computed from the predicted surface heights.  
- These gradients are used to compute spatial distortions using Snell's law (see Equation in the paper).
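
The link between a surface gradient and a pixel shift can be sketched with the vector form of Snell's law. This is a minimal sketch, not the paper's exact equation: it assumes an orthographic camera looking straight down, a locally planar surface, a flat bottom at a fixed `depth`, and a refractive index of 1.33 for water. The function name and parameters are hypothetical.

```python
import math


def refraction_displacement(grad_x, grad_y, depth, n_air=1.0, n_water=1.33):
    """Lateral shift of a vertical camera ray after refraction at a tilted surface.

    Assumptions: orthographic camera looking straight down, locally planar
    surface with slope (grad_x, grad_y), flat bottom `depth` below the surface.
    """
    eta = n_air / n_water
    norm = math.sqrt(1.0 + grad_x ** 2 + grad_y ** 2)
    # Upward unit normal of the surface z = h(x, y).
    nx, ny, nz = -grad_x / norm, -grad_y / norm, 1.0 / norm
    # Incident ray d = (0, 0, -1), so cos(theta_i) = -n . d = nz.
    cos_i = nz
    cos_t = math.sqrt(1.0 - eta ** 2 * (1.0 - cos_i ** 2))
    # Vector form of Snell's law: t = eta * d + (eta * cos_i - cos_t) * n.
    k = eta * cos_i - cos_t
    tx, ty, tz = k * nx, k * ny, -eta + k * nz
    # Follow the refracted ray down to the bottom plane.
    scale = depth / -tz
    return tx * scale, ty * scale
```

For small slopes the result approaches `depth * (1 - n_air / n_water) * gradient`, so the distortion is close to linear in the surface gradient.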

*Image Reconstruction Network*

- **Inputs**: Distorted coordinates derived from the surface gradient  
- **Outputs**:  
  - The reconstructed clean image `I_phi(x_reg)`  
  - A set of re-distorted images `I^t_{theta, phi}` that simulate the distortions caused by the predicted surface

The network is trained by minimizing the loss between the predicted distorted images `I^t_{theta, phi}` and the observed distorted inputs `I^t`.
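
The unsupervised objective can be sketched as follows: a re-distorted frame is obtained by sampling the clean image at displaced coordinates and is then compared with the observed frame. In this sketch an analytic function stands in for the image network, the displacement fields are toy functions, and mean squared error is an assumed choice of loss. None of these names come from the repository.

```python
import math


def clean_image(x, y):
    """Stand-in for the image field I_phi(x_reg); a made-up smooth pattern."""
    return 0.5 + 0.5 * math.sin(3.0 * x) * math.cos(2.0 * y)


def render_frame(image_fn, coords, displacement_fn):
    """Re-distorted frame: sample the image at x_reg + displacement(x_reg)."""
    frame = []
    for x, y in coords:
        dx, dy = displacement_fn(x, y)
        frame.append(image_fn(x + dx, y + dy))
    return frame


def reconstruction_loss(predicted_frames, observed_frames):
    """Mean squared error over all frames and pixels (MSE is an assumption)."""
    total, count = 0.0, 0
    for predicted, observed in zip(predicted_frames, observed_frames):
        for p, o in zip(predicted, observed):
            total += (p - o) ** 2
            count += 1
    return total / count


# A 4x4 grid of regular coordinates and two frames with different toy distortions.
coords = [(i / 3.0, j / 3.0) for i in range(4) for j in range(4)]
true_warps = [
    lambda x, y: (0.05 * math.sin(4.0 * x), 0.0),
    lambda x, y: (0.0, 0.05 * math.cos(4.0 * y)),
]
observed = [render_frame(clean_image, coords, warp) for warp in true_warps]

# Predicting a flat surface (no displacement) leaves a residual;
# predicting the true displacement reproduces the observations.
flat = [render_frame(clean_image, coords, lambda x, y: (0.0, 0.0)) for _ in true_warps]
loss_flat_surface = reconstruction_loss(flat, observed)
loss_true_surface = reconstruction_loss(observed, observed)
```

The loss only vanishes when the image and the per-frame displacements jointly explain every observed frame, which is what allows training without clean ground truth.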


---

## Installation

Dependencies:
- Python 3.8+
- PyTorch ≥ 1.10
- NumPy
- OpenCV
- Matplotlib
- Pandas
- SciPy
- scikit-learn

To install:
```bash
conda env create -f environment.yml
```
Activate the environment:
```bash
conda activate unsugrd_refrem
```
Download datasets following the links in this README and place them in the `data/` directory.

---

## Usage

Note: [wandb](https://wandb.ai/) is used for logging. To run without it, remove the wandb logging calls from `unsugrd_refrem.py`.

All experiments can be run using:
```bash
python unsugrd_refrem.py \
  --environment "local" \
  --exp_name "test" \
  --pname james_real1_bricks \
  --w0_first_imgen 45.0 \
  --w0_first_grid 15.0 \
  --width_layers_grid "[128,128]" \
  --num_iter_initialize 700 \
  --num_iter_optim 1800 \
  --scale_factor 0.5 \
  --lr_init 0.001 \
  --lr_optim 0.0001 \
  --batch_size 10 \
  --start_f 0
```
This command runs the model on the **Bricks** sequence from the **Real1** dataset. The parameters can be adjusted as needed.

The default values should match those used in the paper. To run with the default parameters:
```bash
python unsugrd_refrem.py --environment "local" --exp_name "test" --pname james_real1_bricks
```

Options include:
- `--environment` (default: "Cluster"): Specify the environment (e.g., "local" or "remote").
- `--exp_name` (default: "No_exp_name"): Name of the experiment.
- `--pname` (default: None): Run only on the specified sequence name.
- `--batch_size` (default: 10): Batch size for training.
- `--scale_factor` (default: 0.5): Scaling factor for the input.
- `--siren`: Use the SIREN architecture (default: True).
- `--dim_in_imgen` (default: 256): Input dimension for the image generation network.
- `--w0_first_imgen` (default: 30.0): Initial frequency for the first layer of the image generation network.
- `--num_layers_imgen` (default: 3): Number of layers in the image generation network.
- `--width_layers_imgen` (default: [256, 256, 256]): Width of each layer in the image generation network.
- `--w0_first_grid` (default: 30.0): Initial frequency for the first layer of the grid network.
- `--num_layers_grid` (default: 2): Number of layers in the grid network.
- `--width_layers_grid` (default: [128, 64]): Width of each layer in the grid network.
- `--num_iter_initialize` (default: 1000): Number of initialization iterations.
- `--lr_init` (default: 1e-3): Learning rate for initialization.
- `--num_iter_optim` (default: 1700): Number of optimization iterations.
- `--lr_optim` (default: 1e-3): Learning rate for optimization.
- `--bandwidth_img` (default: 8): Frequency bandwidth for the turbulence field.
- `--start_f` (default: 0): Index of the first image in the batch.
- `--debug`: Print debug images (default: False).
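
List-valued options such as `--width_layers_grid "[128,128]"` are passed as strings. The sketch below shows one way such a value could be parsed and checked against the layer count. It is a hypothetical parser covering three of the options above, not the parser in `unsugrd_refrem.py`, and the rule that the width list must have one entry per layer is an assumption.

```python
import argparse
import ast


def int_list(text):
    """Parse a string such as "[128,128]" into a list of ints."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        raise argparse.ArgumentTypeError(f"cannot parse {text!r} as a list")
    if not isinstance(value, list) or not all(isinstance(v, int) for v in value):
        raise argparse.ArgumentTypeError(f"expected a list of ints, got {text!r}")
    return value


def build_parser():
    """Hypothetical subset of the options listed above, with the listed defaults."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--num_layers_grid", type=int, default=2)
    parser.add_argument("--width_layers_grid", type=int_list, default=[128, 64])
    parser.add_argument("--w0_first_grid", type=float, default=30.0)
    return parser


def check_widths(args):
    """Assumed consistency rule: one width per layer."""
    if len(args.width_layers_grid) != args.num_layers_grid:
        raise ValueError("--width_layers_grid must have --num_layers_grid entries")
    return args


args = check_widths(
    build_parser().parse_args(["--width_layers_grid", "[128,128]", "--w0_first_grid", "15.0"])
)
```

Using `ast.literal_eval` rather than `eval` keeps the parsing restricted to Python literals.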

Reconstruction results, surface maps, and evaluation metrics will be saved in `./results/`.

---

## Datasets

We evaluate on the following datasets (follow links to download):

- **Real1** from James *et al.* [CompressiveFlows (ICCV 2019)](https://github.com/jeringeo/CompressiveFlows) – challenging due to motion blur and high distortions
- **TianSet** from Tian and Narasimhan [Seeing through Water: Image Restoration using Model-based Tracking (ICCV 2009)](https://www.cs.cmu.edu/~ILIM/projects/IM/water/research_water.html)
- **Synthetic** wave dataset generated using methods from Thapa *et al.* [Dynamic Fluid Surface Reconstruction Using Deep Neural Network (CVPR 2020)](https://github.com/SimronThapa/FSRN-CVPR2020)

Download links and preprocessing instructions are in `datasets/README.md`.

---

## Reproducibility

- Code and configurations are included to reproduce all results in the paper.
- Random seeds are fixed.
- Memory usage: ~15GB on a GeForce RTX 4090.
- Runtime: ~5 minutes per sequence (10 frames).

The supplementary material includes:
- Evaluation scripts with PSNR, SSIM, LPIPS
- Visualization scripts for surface reconstructions
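
For reference, PSNR follows directly from the mean squared error between a restored image and its ground truth. The function below is a minimal sketch on flat pixel sequences with values in `[0, max_value]`; it is not the repository's evaluation script, and SSIM and LPIPS need dedicated implementations.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A uniform error of 0.1 on images scaled to `[0, 1]` gives an MSE of 0.01 and therefore a PSNR of 20 dB.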

---


## 🧪 Quantitative Results

We evaluate our method on two real-world datasets: **Real1** from James *et al.* [9] and **TianSet** from Tian and Narasimhan [21]. Our method is compared against:

- **NDIR** [Li et al., CVPR 2021] – unsupervised baseline
- **Li et al.** [WACV 2018] – supervised single-image method

Our approach achieves the highest average **PSNR** and **SSIM** and the lowest average **LPIPS** across both datasets.

### 📊 Results on Real1 Dataset

| Sequence  | PSNR ↑           | SSIM ↑         | LPIPS ↓        |
|-----------|------------------|----------------|----------------|
| Bricks    | **21.34 ± 0.56** | **0.59 ± 0.05**| **0.16 ± 0.02**|
| Cartoon   | **22.37 ± 0.70** | **0.79 ± 0.03**| **0.12 ± 0.01**|
| Checker   | **14.27 ± 1.20** | **0.58 ± 0.09**| **0.10 ± 0.02**|
| Dices     | **19.15 ± 0.35** | **0.57 ± 0.04**| **0.09 ± 0.01**|
| Elephant  | **15.95 ± 0.44** | **0.33 ± 0.05**| **0.17 ± 0.01**|
| Eye       | **21.42 ± 0.23** | **0.83 ± 0.02**| **0.10 ± 0.01**|
| Math      | 23.98 ± 0.74     | 0.60 ± 0.07    | 0.11 ± 0.06    |

**Average (Real1):**  
**PSNR:** 19.78 ± 0.67  
**SSIM:** 0.613 ± 0.055  
**LPIPS:** 0.121 ± 0.026
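
The reported averages are consistent with taking the mean of the per-sequence means and the root mean square of the per-sequence standard deviations. This is an inference from the numbers in the table, not a documented procedure. The sketch below reproduces the PSNR average that way.

```python
import math

# Per-sequence PSNR (mean, standard deviation), copied from the Real1 table.
real1_psnr = {
    "Bricks": (21.34, 0.56),
    "Cartoon": (22.37, 0.70),
    "Checker": (14.27, 1.20),
    "Dices": (19.15, 0.35),
    "Elephant": (15.95, 0.44),
    "Eye": (21.42, 0.23),
    "Math": (23.98, 0.74),
}


def aggregate(per_sequence):
    """Mean of the means and root mean square of the standard deviations."""
    means = [m for m, _ in per_sequence.values()]
    stds = [s for _, s in per_sequence.values()]
    mean_of_means = sum(means) / len(means)
    rms_of_stds = math.sqrt(sum(s * s for s in stds) / len(stds))
    return mean_of_means, rms_of_stds


average, spread = aggregate(real1_psnr)
print(f"{average:.2f} ± {spread:.2f}")  # prints "19.78 ± 0.67"
```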

---

### 📊 Results on TianSet Dataset

| Sequence | PSNR ↑           | SSIM ↑            | LPIPS ↓           |
|----------|------------------|-------------------|-------------------|
| Small    | **19.90 ± 0.48** | **0.364 ± 0.064** | **0.22 ± 0.013**  |
| Middle   | **17.10 ± 1.07** | **0.432 ± 0.082** | **0.126 ± 0.017** |

**Average (TianSet):**  
**PSNR:** 18.50 ± 0.83  
**SSIM:** 0.398 ± 0.074  
**LPIPS:** 0.174 ± 0.015


---


## License

This work is released for academic research use only. [CC BY 4.0 license applies](https://creativecommons.org/licenses/by/4.0/).

---

## Acknowledgments

This work builds upon [NDIR (CVPR 2021)](https://github.com/NDIRproject) and [SIREN (NeurIPS 2020)](https://vsitzmann.github.io/siren/).
It also uses datasets from prior works:
- [CompressiveFlows (ICCV 2019)](https://github.com/jeringeo/CompressiveFlows)
- [Seeing through Water: Image Restoration using Model-based Tracking (ICCV 2009)](https://www.cs.cmu.edu/~ILIM/projects/IM/water/research_water.html)

We thank the authors for their contributions.