# Dataset Card for InverseBench

- [Dataset Card for InverseBench](#dataset-card-for-inversebench)
  - [Details of Data in Blackhole Imaging](#details-of-data-in-blackhole-imaging)
  - [Details of Data in Full Waveform Inversion](#details-of-data-in-full-waveform-inversion)
  - [Details of Data in Inverse Scattering](#details-of-data-in-inverse-scattering)
  - [Acknowledgement](#acknowledgement)



## Details of Data in Blackhole Imaging

- The training set contains 50,000 samples from general relativistic magnetohydrodynamic (GRMHD) simulations.
- The test set is synthetic, sampled from a pretrained diffusion model.

| Data split | Number of entries | Statistics (mean/std/min/max) | Unit | Download Link                                                |
| ---------- | ----------------- | ----------------------------- | ---- | ------------------------------------------------------------ |
| Train      | 50,000            | -/-/-/-                       | /    | Private                                                      |
| Test       | 100               | -/-/-/-                       | /    | [Download](https://inversebench.s3.us-east-2.amazonaws.com/bh-5m.pt) |


## Details of Data in Full Waveform Inversion
- Adapted from the velocity maps of the CurveFault dataset in OpenFWI (Deng et al., 2022) [1].
- License: Creative Commons BY-NC-SA 4.0
- Adaptation: we resize each original velocity map from 70x70 to 128x128 using bilinear interpolation with anti-aliasing.
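
The resizing step can be illustrated with a minimal sketch. It is written in plain Python so that it has no dependencies, and it is not the code used to build the dataset. The function name `resize_bilinear` and the half-pixel-center coordinate convention are assumptions. Anti-aliasing mainly matters when downsampling, so this upsampling sketch omits it; in practice a library resize routine would likely be used instead.

```python
def resize_bilinear(grid, out_h, out_w):
    """Resize a 2D list of floats with bilinear interpolation.

    Assumes the half-pixel-center convention: output pixel centers are
    mapped into input coordinates and clamped to the grid boundary.
    """
    in_h, in_w = len(grid), len(grid[0])

    def source_index(dst, scale, size):
        # Map an output index to a (lower, upper, fraction) triple in the input.
        src = (dst + 0.5) * scale - 0.5
        src = min(max(src, 0.0), size - 1.0)
        lo = int(src)
        hi = min(lo + 1, size - 1)
        return lo, hi, src - lo

    rows = [source_index(i, in_h / out_h, in_h) for i in range(out_h)]
    cols = [source_index(j, in_w / out_w, in_w) for j in range(out_w)]

    out = []
    for y0, y1, wy in rows:
        row = []
        for x0, x1, wx in cols:
            # Interpolate along x on the two neighboring rows, then along y.
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# Toy 70x70 velocity map (km/s) that increases linearly with depth.
velocity = [[1.5 + 3.0 * i / 69 for _ in range(70)] for i in range(70)]
resized = resize_bilinear(velocity, 128, 128)
print(len(resized), len(resized[0]))  # 128 128
```

Because bilinear interpolation is a convex combination of neighboring values, the resized map stays within the original value range (1.50 to 4.50 km/s for this dataset).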

| Data split | Number of entries | Statistics (mean/std/min/max) | Unit | Download Link |
|------------|-------------------|-------------------------------|------| ------|
| Train      | 50,000            | 3.04/0.88/1.50/4.50           | km/s | [Download](https://inversebench.s3.us-east-2.amazonaws.com/fwi-train.zip) |
| Test       | 4,000             | 3.00/0.89/1.50/4.50           | km/s | [Download](https://inversebench.s3.us-east-2.amazonaws.com/fwi-test.zip) |

## Details of Data in Inverse Scattering
- Generated using the online simulator [CytoPacq](https://cbia.fi.muni.cz/simulator/index.php) [2].
- License: Creative Commons BY-NC-SA 4.0
- Configuration used to generate data: 
  - VOI: `42x42x12`
  - Cover the whole CCD data: unchecked
  - Subpixel precision: 1x
  - Type of phantom: HL60 nucleus (static)
  - Position: random
  - Amount: uniformly random integer from 1 to 6
  - Optical System: default value
  - Acquisition Device: default value
- Data post-processing: 
  - We crop each image to `128x128`.
  - Test and validation samples are selected so that each has a cosine similarity below 0.6 to its most similar sample in the training set.
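
The similarity-based selection described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the script used to build the splits: the names `cosine_similarity` and `filter_novel` are hypothetical, and it assumes each image is flattened to a vector of raw pixel values before comparison, which the description above does not specify.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two flattened images (sequences of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Treat an all-zero image as dissimilar to everything (assumption).
    return dot / norm if norm > 0 else 0.0


def filter_novel(candidates, train, threshold=0.6):
    """Keep candidates whose most similar training sample scores below threshold."""
    kept = []
    for cand in candidates:
        best = max((cosine_similarity(cand, t) for t in train), default=0.0)
        if best < threshold:
            kept.append(cand)
    return kept


# Toy example with 4-pixel "images".
train = [[1, 0, 0, 0]]
candidates = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 0], [1, 2, 0, 0]]
print(filter_novel(candidates, train))  # [[0, 1, 0, 0], [1, 2, 0, 0]]
```

The loop compares every candidate against every training sample, which is quadratic in the dataset sizes; for 10,000 training images a vectorized implementation would be preferable.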


| Data Split | Number of Entries | Statistics (min/max) | Unit | Download Link                                                |
| ---------- | ----------------- | -------------------- | ---- | ------------------------------------------------------------ |
| Train      | 10,000            | 0.0 / 1.0            | F/m  | [Download](https://inversebench.s3.us-east-2.amazonaws.com/inv-scatter-train.zip) |
| Test       | 100               | 0.0 / 1.0            | F/m  | [Download](https://inversebench.s3.us-east-2.amazonaws.com/inv-scatter-test.zip) |
| Validation | 10                | 0.0 / 1.0            | F/m  | [Download](https://inversebench.s3.us-east-2.amazonaws.com/inv-scatter-val.zip) |
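
For convenience, the archives listed in the table can be fetched with a short script. The sketch below uses only the Python standard library. The helper names `split_url` and `download_split` and the local `data` directory are hypothetical choices, and the internal layout of the archives is not assumed here.

```python
from pathlib import Path
from urllib.request import urlretrieve

BASE_URL = "https://inversebench.s3.us-east-2.amazonaws.com"

# Archive names taken from the download links in the table above.
FILES = {
    "train": "inv-scatter-train.zip",
    "test": "inv-scatter-test.zip",
    "val": "inv-scatter-val.zip",
}


def split_url(split):
    """Return the download URL for a split name ('train', 'test', or 'val')."""
    return f"{BASE_URL}/{FILES[split]}"


def download_split(split, dest_dir="data"):
    """Download one split's archive unless it already exists; return its local path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / FILES[split]
    if not target.exists():
        urlretrieve(split_url(split), str(target))
    return target


# Example (requires network access):
# path = download_split("val")
```

Skipping files that already exist makes the script safe to re-run after an interrupted session, although it does not verify that an existing file is complete.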


## Acknowledgement
[1]: Deng, Chengyuan, et al. "OpenFWI: Large-scale multi-structural benchmark datasets for full waveform inversion." Advances in Neural Information Processing Systems 35 (2022): 6007-6020.

[2]: Wiesner D, Svoboda D, Maška M, Kozubek M. "CytoPacq: A web-interface for simulating multi-dimensional cell imaging." Bioinformatics, Oxford University Press, 2019. ISSN 1367-4803. doi:10.1093/bioinformatics/btz417.