Keywords: Carbon footprint, Diffusion Transformers, Efficient AI, Sustainable Machine Learning, DiT inference, Lifecycle carbon estimation, FLOP modeling, Hardware efficiency, Carbon-aware deployment
TL;DR: DiTCarbon is the first framework to predict lifecycle carbon for diffusion transformer inference from architecture and generation specifications before execution, covering class-conditional, text-conditional, dual-stream, and video DiT variants.
Abstract: Diffusion Transformers (DiTs) are moving into consumer image and video products, where high per-request inference cost makes carbon emissions a major deployment concern. Existing work either measures carbon at runtime or predicts lifecycle carbon for autoregressive transformers; none estimates full-lifecycle carbon directly from a DiT architecture and generation configuration. We propose DiTCarbon, the first framework that combines parameter, FLOP, hardware-efficiency, and lifecycle accounting models to predict carbon for class-conditional, text-conditional, dual-stream, and video DiT variants. Across 83 V100 configurations, hardware efficiency varies by 6.4$\times$. A pooled saturation fit predicts hardware efficiency within 7.8\% MAPE in the production regime (resolution $\geq 512^2$). In the same regime, predicted operational carbon agrees with measured ground truth (derived from wall-clock runtime and NVML GPU power) within 10.7\% MAPE. For a representative deployment, carbon varies by $45\times$ across regions, making deployment region the largest lever for reducing per-output emissions.
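To make the operational-carbon and MAPE quantities in the abstract concrete, here is a minimal Python sketch. It is not DiTCarbon's implementation. It assumes the common accounting identity (energy in kWh, times a data-center overhead factor, times grid carbon intensity), and the function names, the `pue` parameter, and all numeric inputs are hypothetical illustrations rather than values from the paper.

```python
def operational_carbon_g(runtime_s, avg_power_w, intensity_g_per_kwh, pue=1.0):
    """Operational carbon in gCO2e for one run (assumed accounting identity).

    runtime_s: wall-clock runtime in seconds
    avg_power_w: average GPU power in watts (e.g. sampled via NVML)
    intensity_g_per_kwh: grid carbon intensity in gCO2e per kWh
    pue: power usage effectiveness, a hypothetical overhead multiplier
    """
    energy_kwh = avg_power_w * runtime_s / 3.6e6  # 1 kWh = 3.6e6 joules
    return energy_kwh * pue * intensity_g_per_kwh


def mape(predicted, measured):
    """Mean absolute percentage error of predictions against measurements."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - m) / abs(m) for p, m in zip(predicted, measured))
    return 100.0 * total / len(measured)


if __name__ == "__main__":
    # Illustrative inputs only: a 12 s generation at 250 W average power.
    low = operational_carbon_g(12.0, 250.0, intensity_g_per_kwh=20.0)
    high = operational_carbon_g(12.0, 250.0, intensity_g_per_kwh=900.0)
    print(f"low-intensity grid:  {low:.4f} gCO2e")
    print(f"high-intensity grid: {high:.4f} gCO2e")
    print(f"regional ratio:      {high / low:.1f}x")
```

Under this identity, operational carbon scales linearly with grid carbon intensity when runtime and power are fixed, so a regional spread in intensity translates directly into the same spread in per-output emissions.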
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 466