APEX: One-Step High-Resolution Image Synthesis

ICLR 2026 Conference Submission 482 Authors

01 Sept 2025 (modified: 29 Nov 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Diffusion, T2I
Abstract: The pursuit of efficient text-to-image synthesis has driven the field toward a few-step generation paradigm, yet this endeavor is hampered by a persistent trilemma: achieving high fidelity, inference efficiency, and training efficiency simultaneously remains elusive. Current approaches are forced into a difficult trade-off. Methods employing external discriminators can produce high-fidelity one-step generations, but they suffer from significant drawbacks, including training instability, high GPU memory costs, and slow convergence. Conversely, alternative paradigms such as consistency distillation, though easier to train, often struggle to achieve high quality in one-step generation. These challenges have restricted the scalability and broader application of one-step generative models. In this work, we present APEX, a method that resolves this trilemma. The core innovation is a self-condition-shifting adversarial mechanism that completely obviates the need for an external discriminator. By eliminating the discriminator bottleneck, APEX achieves exceptional training efficiency and stability. This design makes it well-suited for both full-parameter and LoRA-based tuning of large-scale generative models, offering a truly end-to-end solution. Experimentally, APEX demonstrates state-of-the-art (SOTA) performance, delivering high-fidelity synthesis with a single function evaluation (NFE=1) and a 15.33x speedup over the original 20B Qwen-Image model. Our 0.6B model outperforms substantially larger models, such as the 12B FLUX Schnell, in few-step generation. We further showcase APEX's training efficiency by LoRA-tuning the 20B Qwen-Image model in just 6 hours to reach a GenEval score of 0.89 at 1 NFE (the original model scores 0.87 at 50 NFE). APEX effectively reshapes the trade-off between training cost, inference speed, and generation quality in large text-to-image generative models.
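
The abstract does not disclose how the self-condition-shifting mechanism is implemented. Purely as an illustrative sketch of the general idea of a discriminator-free adversarial signal, the snippet below reuses a single network as both generator and implicit critic by running a second forward pass under a shifted condition. Everything here is an assumption: the TinyDenoiser stand-in, the shifted timestep t_shift, and the distribution-matching surrogate loss are hypothetical placeholders, not the paper's actual objective or architecture.

```python
import torch
import torch.nn as nn


class TinyDenoiser(nn.Module):
    """Stand-in for a conditioned one-step generator backbone (hypothetical)."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 2, 128), nn.SiLU(), nn.Linear(128, dim)
        )

    def forward(self, x, t, c):
        # Broadcast the scalar timestep t and condition c onto each sample.
        cond = torch.stack([t, c]).expand(x.size(0), 2)
        return self.net(torch.cat([x, cond], dim=-1))


model = TinyDenoiser()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)


def self_adversarial_step(x_real, c):
    """One hypothetical training step: the same network scores its own
    one-step samples under a shifted condition, so no external
    discriminator (with its memory and instability costs) is trained."""
    z = torch.randn_like(x_real)
    t_gen = torch.tensor(1.0)    # assumed one-step generation condition
    t_shift = torch.tensor(0.5)  # assumed "shifted" condition for the critic pass

    x_fake = model(z, t_gen, c)  # NFE=1 sample

    with torch.no_grad():        # critic pass on real data carries no gradient
        feat_real = model(x_real, t_shift, c)
    feat_fake = model(x_fake, t_shift, c)

    # Surrogate objective: pull fake statistics toward real ones under the
    # shifted condition. The paper's actual adversarial loss is not public.
    loss = (feat_fake.mean(0) - feat_real.mean(0)).pow(2).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


print(self_adversarial_step(torch.randn(8, 64), torch.tensor(0.3)))
```

Because the generator and the implicit critic share one set of weights, this sketch avoids the separate discriminator parameters and optimizer state of a conventional GAN setup, which is consistent with the memory and stability benefits the abstract claims, even if the true loss differs.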
Primary Area: generative models
Submission Number: 482