Abstract: Harnessing parallelism in seemingly sequential models is a central challenge for modern machine learning. Several approaches have been proposed for evaluating sequential processes in parallel using fixed-point methods, like Newton, Picard, and Jacobi iterations. In this work, we show that these methods can be understood within a common framework based on linear dynamical systems (LDSs), where different iteration schemes arise naturally as approximate linearizations of a nonlinear recursion. This unifying view highlights shared principles behind these techniques and clarifies when particular fixed-point methods are most likely to be effective. By bridging diverse algorithms through the language of LDSs, our framework provides a clearer theoretical foundation for parallelizing sequential models and points toward new opportunities for efficient and scalable computation.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We thank the reviewers for their time and helpful feedback!
Our paper presents a unifying framework for parallelizing nonlinear recursions with linear dynamical systems (LDSs). We show that major fixed-point iteration families, such as Jacobi, Picard, and Newton, can all be written as LDSs, which can be parallelized over the sequence length with parallel scans. This contribution brings together advances from across disparate fields, including sequence modeling with recurrent neural networks (RNNs), sampling from diffusion models, and more, unifying them in a common language to serve as a foundation for future work.
We are heartened by the positive response to our unifying framework. Reviewers described our proposed framework as “novel,” “clean,” “effective,” and “elegant.” We appreciate that all reviewers answered “Yes” to whether TMLR’s audience would be interested in knowing the findings of the paper. Two of the three reviewers also responded “Yes” to whether our claims are supported by convincing evidence. We believe the only reviewer who responded “No” did so because of a misunderstanding; we have used that feedback to further clarify and strengthen the submission.
## Revisions thanks to feedback
We have incorporated the reviewers' helpful feedback to strengthen the paper. We have submitted a revised version, with all changes indicated in red, and we highlight the major revisions below.
### Discussion of the parallel scan in main text
The parallel scan is a fundamental ingredient of our unifying framework, as it is the primitive that parallelizes LDSs over the sequence length. For this reason, we wrote an extensive introduction to it in Appendix A. Two reviewers recommended that we also add a brief description in the main text, and we have done so (see page 3).
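For concreteness, here is a minimal sketch of the primitive (our illustration in JAX, not the code from the paper; the helper names `combine` and `lds_parallel_scan` are ours):

```python
import jax
import jax.numpy as jnp

def combine(elem_i, elem_j):
    """Compose two affine maps x -> A x + b, with elem_i applied first."""
    A_i, b_i = elem_i
    A_j, b_j = elem_j
    A = jnp.einsum('...ij,...jk->...ik', A_j, A_i)       # A_j @ A_i, batched
    b = jnp.einsum('...ij,...j->...i', A_j, b_i) + b_j   # A_j b_i + b_j
    return A, b

def lds_parallel_scan(A, b, x0):
    """Evaluate x_{t+1} = A_{t+1} x_t + b_{t+1} with O(log T) parallel depth.

    A: (T, d, d) dynamics matrices, b: (T, d) biases, x0: (d,) initial state.
    Returns the states x_1, ..., x_T as a (T, d) array.
    """
    # Prefix-compose the affine maps; element t then maps x0 directly to x_{t+1}.
    A_pre, b_pre = jax.lax.associative_scan(combine, (A, b))
    return jnp.einsum('tij,j->ti', A_pre, x0) + b_pre
```

Because `combine` is associative, the scan can be evaluated as a balanced tree, giving $O(\log T)$ depth over the sequence length instead of the $O(T)$ depth of sequential evaluation.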
### Clarification that each fixed point iteration is a linear dynamical system (LDS)
Reviewer LPFg was unsure how Eq. (4) constitutes an LDS. We therefore added a further algebraic manipulation to provide, in Eq. (5), a more standard presentation of an LDS:
$x_{t+1}^{(i+1)} = A_{t+1} x_t^{(i+1)} + b_{t+1}.$
Crucially, fixed-point iteration $(i+1)$ takes us from $x^{(i)}$ (already known at every time step) to $x^{(i+1)}$: the dynamics matrix $A_{t+1}$ and bias $b_{t+1}$ can be written as functions of $x_t^{(i)}$, which is already known.
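As a concrete instance (a sketch in our notation, assuming the underlying nonlinear recursion is written $x_{t+1} = f(x_t)$), the Newton iteration linearizes $f$ around the already-known trace $x^{(i)}$:

$x_{t+1}^{(i+1)} = f\big(x_t^{(i)}\big) + J_f\big(x_t^{(i)}\big)\big(x_t^{(i+1)} - x_t^{(i)}\big),$

which matches Eq. (5) with $A_{t+1} = J_f(x_t^{(i)})$ and $b_{t+1} = f(x_t^{(i)}) - J_f(x_t^{(i)})\, x_t^{(i)}$, both functions of the previous iterate alone.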
### Strengthened experiments
Two reviewers asked for additional experiments to further demonstrate our unifying framework.
We particularly appreciated Reviewer N66A's recommendation to rigorously analyze the differences between the algorithms with a controlled experiment: sweeping the discretization time step $\epsilon$ to vary the Jacobian matrix from nearly diagonal to dense.
We added precisely this experiment (see Figure 8). We showed that for the smallest step size considered, $\epsilon = 10^{-5}$ (Jacobian most nearly diagonal), Newton and Picard both converged in a small number of fixed-point iterations, with Picard therefore running faster overall. In contrast, for the largest step size, $\epsilon = 10^{-3}$ (Jacobian closer to dense), Newton continued to converge in a small number of iterations, whereas Picard required considerably more iterations to converge. In this setting, Newton remained faster than sequential evaluation in wall-clock time, whereas Picard did not. This experiment further corroborates our guidance on how Jacobian structure should inform the choice of algorithm.
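For intuition about why $\epsilon$ controls this trade-off (a sketch under the assumption that the recursion is an explicit Euler step of an ODE $\dot{x} = g(x)$): with $x_{t+1} = x_t + \epsilon\, g(x_t)$, the Jacobian of one step is

$\dfrac{\partial x_{t+1}}{\partial x_t} = I + \epsilon\, J_g(x_t),$

which approaches the identity as $\epsilon \to 0$. In that regime a crude (e.g., identity or diagonal) approximation of the Jacobian is nearly exact, so Picard converges quickly; for larger $\epsilon$ the dense term $\epsilon\, J_g(x_t)$ matters, and Newton's full linearization pays off.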
## Conclusion
Thanks to the reviewer feedback, we have strengthened the paper with clearer exposition and additional experiments that further buttress our claims. Our unifying framework is correct and applicable to many settings of interest. We hope that its publication in TMLR will serve the community by providing a common language for understanding fixed-point iterations for parallelizing nonlinear recursions across many fields.
Assigned Action Editor: ~Yaoliang_Yu1
Submission Number: 5999