Abstract: Most of the work on the Vapnik-Chervonenkis dimension of neural networks has focused on feedforward networks. However, recurrent networks are also widely used in learning applications, in particular when time is a relevant parameter. This paper provides lower and upper bounds for the VC dimension of such networks. Several types of activation functions are discussed, including threshold, polynomial, piecewise-polynomial and sigmoidal functions. The bounds depend on two independent parameters: the number w of weights in the network, and the length k of the input sequence. In contrast, for feedforward networks, VC dimension bounds can be expressed as a function of w only. An important difference between recurrent and feedforward nets is that a fixed recurrent net can receive inputs of arbitrary length. Therefore we are particularly interested in the case k ≫ w. Ignoring multiplicative constants, the main results say roughly the following:

• For architectures with activation σ equal to any fixed nonlinear polynomial, the VC dimension is ≈ wk.
• For architectures with activation σ equal to any fixed piecewise polynomial, the VC dimension is between wk and w²k.
• For architectures with activation σ = H (threshold nets), the VC dimension is between w log(k/w) and min{wk log(wk), w² + w log(wk)}.
• For the standard sigmoid σ(x) = 1/(1 + e^(−x)), the VC dimension is between wk and w⁴k².

An earlier version of this paper appeared in Proc. 3rd European Workshop on Computational Learning Theory, Lecture Notes in Computer Science, vol. 1208, Springer, Berlin, 1997, pp. 223–237.
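To make the setting of these bounds concrete, the following minimal sketch (not taken from the paper; the class name, layer sizes, and weight initialization are hypothetical choices for illustration) shows a recurrent architecture with a fixed number of weights w that can nonetheless classify input sequences of arbitrary length k, which is why the VC dimension bounds above involve both parameters rather than w alone.

```python
import numpy as np

def sigmoid(x):
    # Standard sigmoid activation sigma(x) = 1/(1 + e^(-x)) from the abstract.
    return 1.0 / (1.0 + np.exp(-x))

def heaviside(x):
    # Threshold activation sigma = H (Heaviside) from the abstract.
    return (x >= 0).astype(float)

class SimpleRecurrentNet:
    """Illustrative single-layer recurrent classifier with a fixed weight count w."""

    def __init__(self, n_hidden, sigma=sigmoid, seed=0):
        rng = np.random.default_rng(seed)
        # Recurrent, input, and output weights; their total count is the
        # parameter w appearing in the VC dimension bounds.
        self.W_rec = rng.normal(size=(n_hidden, n_hidden))
        self.w_in = rng.normal(size=n_hidden)
        self.w_out = rng.normal(size=n_hidden)
        self.sigma = sigma

    def num_weights(self):
        return self.W_rec.size + self.w_in.size + self.w_out.size  # = w

    def classify(self, sequence):
        # The same fixed network processes an input sequence of arbitrary
        # length k, one scalar input per time step.
        h = np.zeros(self.W_rec.shape[0])
        for x_t in sequence:
            h = self.sigma(self.W_rec @ h + self.w_in * x_t)
        return 1 if self.w_out @ h >= 0 else 0

# Usage: the same net (fixed w) classifies sequences of different lengths k.
net = SimpleRecurrentNet(n_hidden=3)
print(net.num_weights())          # w = 9 + 3 + 3 = 15
print(net.classify([0.5, -1.0]))  # input sequence of length k = 2
print(net.classify([0.1] * 50))   # input sequence of length k = 50
```

Because the architecture is fixed while k grows freely, the case k ≫ w highlighted in the abstract corresponds to feeding such a net sequences far longer than its weight count.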