LATTE: Latent Trajectory Embedding for Diffusion-Generated Image Detection

ICLR 2026 Conference Submission3419 Authors

09 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: detection of diffusion-generated images; latent diffusion models
TL;DR: We introduce LATTE, an approach that models the trajectory of latent representations across multiple denoising steps to capture discriminative patterns that differentiate real from generated images.
Abstract: The rapid advancement of diffusion-based image generators has made it increasingly difficult to distinguish generated from real images. This erodes trust in digital media, making it critical to develop generated-image detectors that remain reliable across different generators. While recent approaches leverage diffusion denoising cues, they typically rely on single-step reconstruction errors and overlook the sequential nature of the denoising process. In this work, we propose LATTE (Latent Trajectory Embedding), a novel approach that models the evolution of latent embeddings across multiple denoising steps. Instead of treating each denoising step in isolation, LATTE captures the trajectory of these representations, revealing subtle and discriminative patterns that distinguish real from generated images. Experiments on several benchmarks, including GenImage, Chameleon, and Diffusion Forensics, show that LATTE achieves superior performance, especially in challenging cross-generator and cross-dataset scenarios, highlighting the potential of latent trajectory modeling.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 3419