State Space Models are Comparable to Transformers in Estimating Functions with Dynamic Smoothness

Published: 18 Jun 2024, Last Modified: 09 Jul 2024, TF2M 2024 Poster, License: CC BY 4.0
Keywords: state space models, statistical learning theory
Abstract: While the capabilities of deep neural networks based on state space models (SSMs) have been investigated primarily through experimental comparisons, theoretical understanding remains limited. In particular, statistical, quantitative evaluations of whether SSMs can replace Transformers are lacking. In this paper, we theoretically explore for which tasks SSMs can serve as alternatives to Transformers, from the perspective of estimating sequence-to-sequence functions. We consider the setting in which the target function has direction-dependent smoothness and prove that SSMs can estimate such functions with the same convergence rate as Transformers. Additionally, we prove that SSMs can estimate the target function as effectively as Transformers even when the smoothness changes depending on the input sequence. Our results suggest that SSMs can replace Transformers when estimating functions in certain classes that arise in practice.
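
As a rough illustration of the setting (a sketch only; the exponents and rate below follow standard anisotropic-smoothness conventions and are not reproduced from the paper itself), direction-dependent smoothness can be formalized by giving the target function a separate smoothness exponent along each input direction, with the estimation rate governed by their harmonic-type mean:

% Minimal compileable LaTeX sketch (assumed formulation, not the paper's
% exact function class): anisotropic smoothness with per-direction
% exponents a_i, the resulting effective smoothness, and the minimax rate.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
For exponents $a = (a_1, \dots, a_d)$ with $a_i > 0$, the target $f$ satisfies
\[
  |f(x + h e_i) - f(x)| \le C \, |h|^{\min(a_i, 1)}
  \quad \text{along each coordinate direction } e_i .
\]
The effective smoothness $\tilde{a}$ is given by
\[
  \frac{1}{\tilde{a}} = \sum_{i=1}^{d} \frac{1}{a_i},
\]
and the minimax estimation rate from $n$ samples scales as
\[
  n^{-\frac{2\tilde{a}}{2\tilde{a} + 1}} .
\]
\end{document}

Under this reading, "the same convergence rate as Transformers" means the SSM estimator attains a rate of this form, whose exponent is driven by the direction-dependent smoothness rather than by the raw input dimension.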
Submission Number: 44