# 🌀 OpenFake: An Open Dataset and Platform Toward Large-Scale Deepfake Detection

**OpenFake** is a continually updated benchmark for detecting AI-generated images. This repository contains:

- The [OpenFake dataset](https://huggingface.co/datasets/Anonymous460/OpenFake)
- Multiple detection baselines (CLIP-based, SwinV2, InternVL, DRCT, GenImage, Semi-Truths)
- Scripts to generate synthetic images using Stable Diffusion 3 and Flux

- Model weights are available on an [anonymous Google Drive](https://drive.google.com/drive/folders/1S-nFqyKJoFS43OBW8KIx8qPiixfL7wNs?usp=share_link)
- A small in-the-wild dataset is available on an [anonymous Google Drive](https://drive.google.com/drive/folders/125mIkM7hl2s5A7hv1p49esuUZsAMXQAj?usp=share_link)

---

## 📁 Repository Structure

- `datasets/`: Tools and notebooks to build and label datasets (e.g., `get_label_files.ipynb`)
- `baselines/`: SwinV2 (trainable) and the other baselines. Each baseline has its own README with instructions to reproduce its results.

---

## Baseline Results

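**Performance comparison on _OpenFake_ across detectors trained on different datasets.** The `Real` row reports the true-negative rate (TNR) on real images; each generator row reports the true-positive rate (TPR) on images from that generator. Finetuned (FT) and LoRA variants are grouped under their respective base generators. Generators marked `(OOD)` are out-of-distribution for all detectors.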

| Generator / Metric | **OpenFake (SwinV2)** | GenImage (SwinV2) | Semi-Truths (SwinV2) | DRCT (ConvNeXt) | FF++ (EffNet-B4) | CLIP-D-10k+ | DMD (Corvi'23) | InternVL-3 (zero-shot) |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| **Real (TNR)** | **0.995** | 0.955 | 0.689 | 0.777 | 0.516 | 0.703 | 0.998 | 0.431 |
|  |  |  |  |  |  |  |  |  |
| SD 1.5 | 1.000 | 0.936 | 1.000 | 0.447 | 0.529 | 0.579 | 0.000 | 0.849 |
| SD 2.1 | 1.000 | 0.998 | 0.999 | 0.482 | 0.453 | 0.717 | 0.011 | 0.900 |
| SD XL | 1.000 | 0.956 | 1.000 | 0.426 | 0.507 | 0.438 | 0.001 | 0.814 |
| SD 3.5 | 1.000 | 0.982 | 1.000 | 0.324 | 0.466 | 0.406 | 0.000 | 0.796 |
|  |  |  |  |  |  |  |  |  |
| Flux 1.0 Dev | 1.000 | 0.967 | 0.999 | 0.290 | 0.450 | 0.401 | 0.005 | 0.748 |
| Flux-1.1-Pro | 1.000 | 0.315 | 0.975 | 0.319 | 0.467 | 0.596 | 0.000 | 0.722 |
| Flux-1.0-Schnell | 0.999 | 1.000 | 0.998 | 0.289 | 0.476 | 0.503 | 0.000 | 0.803 |
|  |  |  |  |  |  |  |  |  |
| Midjourney 6 | 1.000 | 0.090 | 0.949 | 0.166 | 0.486 | 0.100 | 0.000 | 0.884 |
| Midjourney 7 | 0.994 | 0.952 | 0.997 | 0.264 | 0.484 | 0.404 | 0.001 | 0.961 |
| DALL·E 3 | 0.995 | 0.238 | 0.927 | 0.461 | 0.543 | 0.394 | 0.000 | 0.983 |
| GPT Image 1 | 0.998 | 0.772 | 0.983 | 0.402 | 0.442 | 0.384 | 0.005 | 0.932 |
| Ideogram 3.0 | 1.000 | 0.993 | 1.000 | 0.254 | 0.481 | 0.414 | 0.001 | 0.844 |
| Imagen 3.0 | 0.999 | 0.962 | 0.998 | 0.237 | 0.461 | 0.286 | 0.005 | 0.784 |
| Imagen 4.0 | 0.996 | 0.948 | 0.996 | 0.228 | 0.459 | 0.359 | 0.003 | 0.796 |
| Grok 2 | 1.000 | 0.142 | 0.963 | 0.383 | 0.463 | 0.303 | 0.000 | 0.805 |
| HiDream-I1 Full | 1.000 | 0.976 | 0.993 | 0.332 | 0.440 | 0.485 | 0.000 | 0.789 |
| Chroma | 0.992 | 0.980 | 0.995 | 0.451 | 0.435 | 0.298 | 0.003 | 0.726 |
|  |  |  |  |  |  |  |  |  |
| *Ideogram 2.0 (OOD)* | 0.993 | 0.997 | 1.000 | 0.234 | 0.482 | 0.777 | 0.000 | 0.865 |
| *Lumina (OOD)* | 1.000 | 1.000 | 1.000 | 0.494 | 0.355 | 0.720 | 0.028 | 0.983 |
| *Frames (OOD)* | 0.968 | 0.816 | 1.000 | 0.368 | 0.392 | 0.920 | 0.000 | 0.912 |
| *Halfmoon (OOD)* | 0.995 | 0.953 | 1.000 | 0.263 | 0.353 | 0.632 | 0.000 | 0.832 |
| *Recraft v2 (OOD)* | 0.972 | 0.699 | 1.000 | 0.379 | 0.443 | 0.248 | 0.004 | 0.929 |
| *Recraft v3 (OOD)* | 0.701 | 0.288 | 0.997 | 0.364 | 0.497 | 0.430 | 0.002 | 0.912 |
|  |  |  |  |  |  |  |  |  |
| **Average TPR** | **0.988** | 0.823 | 0.992 | 0.354 | 0.475 | 0.443 | 0.003 | 0.827 |
|  |  |  |  |  |  |  |  |  |
| **Overall F1** | **0.992** | 0.881 | 0.861 | 0.449 | 0.485 | 0.509 | 0.005 | 0.697 |
| **Overall ROC AUC** | **1.000** | 0.926 | 0.960 | 0.616 | 0.493 | 0.600 | 0.487 | 0.629 |
| **Overall PR AUC** | **1.000** | 0.949 | 0.952 | 0.613 | 0.493 | 0.600 | 0.488 | 0.586 |

**Notes:**  
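SwinV2 trained on **OpenFake** achieves the best overall F1 (0.992), ROC AUC, and PR AUC while keeping a true-negative rate of 0.995 on real images. The Semi-Truths model reaches a slightly higher average TPR (0.992 vs. 0.988), including on the OOD generators, but its TNR is only 0.689. Several other detectors also misclassify many real images: TNR is 0.777 for DRCT, 0.703 for CLIP-D-10k+, 0.516 for FF++, and 0.431 for InternVL-3. Generators marked `(OOD)` were collected from public sources and are out-of-distribution for all trained detectors.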
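
As a rough illustration of how the per-row rates in the table above can be read, the sketch below groups predictions by image source and reports the fraction classified correctly. For the real source this fraction is the TNR; for a generator it is the TPR. The function name, the record layout, and the toy records are hypothetical and are not taken from the repository's evaluation scripts.

```python
from collections import defaultdict


def per_source_rates(records):
    """Return the fraction of correctly classified images per source.

    `records` is an iterable of (source, is_fake, predicted_fake) tuples.
    This layout is an assumption made for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for source, is_fake, predicted_fake in records:
        total[source] += 1
        # A prediction is correct when it matches the ground-truth label.
        correct[source] += int(predicted_fake == is_fake)
    return {source: correct[source] / total[source] for source in total}


# Toy records (hypothetical, not OpenFake results).
toy_records = [
    ("real", False, False),
    ("real", False, True),         # real image flagged as fake: false positive
    ("generator_a", True, True),
    ("generator_a", True, True),
    ("generator_b", True, False),  # missed fake: false negative
    ("generator_b", True, True),
]

rates = per_source_rates(toy_records)
for source, rate in rates.items():
    kind = "TNR" if source == "real" else "TPR"
    print(f"{source}: {kind} = {rate:.3f}")
```

Reporting one rate per source makes it easy to spot a detector that does well on average but fails on a single generator.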


**Generalization of SwinV2 detectors on an _in-the-wild_ social-media set**  
(1,057 real, 163 fake; see the social-media section of the paper)

| Metric | **Train _OpenFake_** | **Train _GenImage_** | **Train _Semi-Truths_** |
|---|---:|---:|---:|
| **TNR (real)** | **0.976** | 0.998 | ⚠️ **0.220** |
| **TPR (fake)** | 0.865 | ⚠️ **0.043** | 0.908 |
| **Accuracy** | **0.962** | 0.871 | 0.312 |
| **F1 Score** | **0.857** | 0.081 | 0.261 |
| **ROC–AUC** | **0.978** | 0.557 | 0.634 |

**Notes:**  
- TNR = true-negative rate (real); TPR = true-positive rate (fake).  
- ⚠️ marks severe class bias: the GenImage model has very low TPR (misses most fakes) while the Semi-Truths model has very low TNR (many false positives).  
- Training on **OpenFake** yields the most balanced generalization on this curated in-the-wild set.
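
To make the relationship between these metrics concrete, the sketch below recomputes accuracy and F1 from the reported TNR and TPR together with the class counts (1,057 real, 163 fake). Because the rates in the table are rounded to three decimals, the recomputed values should only be expected to agree with the table to within rounding. The function name is hypothetical, and this is not the repository's evaluation code.

```python
def metrics_from_rates(tnr, tpr, n_real, n_fake):
    """Approximate accuracy and F1 (fake = positive class) from TNR and TPR."""
    tn = tnr * n_real   # real images correctly kept as real
    tp = tpr * n_fake   # fake images correctly flagged
    fp = n_real - tn    # real images wrongly flagged as fake
    fn = n_fake - tp    # fake images that were missed
    accuracy = (tn + tp) / (n_real + n_fake)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, f1


N_REAL, N_FAKE = 1057, 163

# (TNR, TPR) pairs copied from the table above.
reported_rates = {
    "OpenFake": (0.976, 0.865),
    "GenImage": (0.998, 0.043),
    "Semi-Truths": (0.220, 0.908),
}

results = {
    name: metrics_from_rates(tnr, tpr, N_REAL, N_FAKE)
    for name, (tnr, tpr) in reported_rates.items()
}

# Accuracy of a trivial detector that labels every image as real.
always_real_accuracy = N_REAL / (N_REAL + N_FAKE)

for name, (accuracy, f1) in results.items():
    print(f"{name}: accuracy ~ {accuracy:.3f}, F1 ~ {f1:.3f}")
print(f"always-real baseline accuracy ~ {always_real_accuracy:.3f}")
```

With this class imbalance, a detector that labels everything as real would already score about 0.866 accuracy, which is why the GenImage model's accuracy looks reasonable despite its near-zero TPR. F1 and ROC-AUC are the more informative columns here.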

---

## 🧠 Citation

```bibtex

```

---

## 🛡️ License
