Feedforward Mixing is as Sharp as it is Slow in Reverse

Published: 25 May 2026, Last Modified: 25 May 2026CTB@ICML 2026EveryoneRevisionsBibTeXCC BY 4.0
Keywords: information propagation, minimax fidelity, mixing times, computational feedforward graphs
TL;DR: We show that backwards mixing time forms a tight upper bound for normalised minimax fidelity.
Abstract: Deep learning architectures across diverse domains—including language modeling, audio-visual processing, and temporal graph modeling—fundamentally rely on feedforward computational graphs. Consequently, optimising these graph topologies is a major research focus. Recent theoretical work evaluates the efficacy of these graphs using two key metrics: forward mixing time $T_{fwd}(G)$ (the speed of information propagation) and normalised minimax fidelity $F(G)$ (the sharpness of information representation). In this work, we prove that, surprisingly, minimax fidelity is tightly upper-bounded by the \textit{backward} mixing time of computational graphs: $F(G) \le \frac{8}{3} \cdot T_{bwd}(G)$ for all feedforward graphs with a unique source and sink. Practically, this establishes a direct tradeoff, $F(G) \le \frac{8}{3} \cdot T_{fwd}(G)$, for symmetric graphs where $T_{fwd}(G) = T_{bwd}(G)$. In the uniform setting, because most common architectures—including fully-connected and sliding-window graphs—are symmetric, this exposes an unavoidable architectural limit: optimising for rapid information mixing necessitates a degradation in fidelity, and vice versa. More generally, i.e. if $T_{bwd}(G) = f(T_{fwd}(G))$ and consequently $F(G) \le f(T_{fwd}(G))$ for some function $f$, we show that $f$ is well-behaved and at most an $O(n)$ factor of $T_{fwd}(G)$ for both uniform and non-uniform graphs, where $n$ is the number of nodes in $G$. Our results, ultimately, establish backward mixing time as a key metric for evaluating information propagation in deep learning models.
Paper Type: Long (8 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 70
Loading