Disappearance of Timestep Embedding: A Case Study on Neural ODE and Diffusion Models

TMLR Paper3920 Authors

09 Jan 2025 (modified: 31 Mar 2025)Under review for TMLREveryoneRevisionsBibTeXCC BY 4.0
Abstract: Dynamical systems are often time-varying, whose modeling requires a function that evolves with respect to time. Recent studies such as the neural ordinary differential equation proposed a time-dependent neural network, which provides a neural network varying with respect to time. However, we claim that the architectural choice to build a time-dependent neural network significantly affects its time-awareness but still lacks sufficient validation in its current states. In this study, we conduct an in-depth analysis of the architecture of neural ordinary differential equations. Here, we report a vulnerability of vanishing timestep embedding, which disables the time-awareness of a time-dependent neural network. Specifically, we find that the ConcatConv operation, which is widely used in neural ordinary differential equations, causes an additive effect of timestep embedding, which is readily canceled out by the subsequent batch normalization. This vanishing timestep embedding also arises for group normalization and is analyzed thoroughly with respect to the number of channels, groups, and relative variance. Furthermore, we find that this vulnerability can also be observed in diffusion models because they employ a similar architecture that incorporates timestep embedding to discriminate between different timesteps during a diffusion process. Our analysis provides a detailed description of this phenomenon as well as several solutions to address the root cause. Through experiments on neural ordinary differential equations and diffusion models, we observed that ensuring alive time-awareness via proposed solutions boosted their performance, such as classification accuracy, FID, and inception score, which implies that their current implementations lack sufficient time-dependency.
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: In the revised manuscript, we reflected the comments provided by the reviewers, including several clarifications, additional theoretical and empirical analysis, and adding other experiments with different setups. In this version of the manuscript, the added and changed parts are marked with blue color.
Assigned Action Editor: ~Pierre_Ablin2
Submission Number: 3920
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview