Keywords: FLaNN, expressiveness, attention, formal languages
TL;DR: Transformers can express highly nonlinear counting properties
Abstract: Counting properties (e.g., determining whether certain tokens occur more
often than other tokens in a given input text) have played a significant role in
the study of the expressiveness of transformers.
In this paper, we provide a formal
framework for investigating the counting power of transformers. We argue
that all existing results demonstrate transformers' expressivity only for
(semi-)linear counting properties, i.e., those expressible as a
boolean combination of linear inequalities.
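For instance, requiring that some token a occurs at least twice as often as another token b in the input w (i.e., #_a(w) ≥ 2·#_b(w)) is such a linear counting property.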
Our main result is that transformers can express counting properties that
are highly nonlinear. More precisely, we prove that transformers can
capture all semialgebraic counting properties, i.e., those expressible as
a boolean combination of arbitrary multivariate polynomial inequalities (of any degree).
Among other things, these generalize the counting properties that
can be captured by support vector machines with polynomial kernels in the
vector space model.
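For example, requiring that the product of the numbers of occurrences of two tokens a and b equals the square of the number of occurrences of a third token c (i.e., #_a(w)·#_b(w) = #_c(w)^2) is a semialgebraic counting property that no boolean combination of linear inequalities can express.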
To complement this result, we exhibit a natural subclass of (softmax)
transformers that completely characterizes semialgebraic counting
properties.
Through connections with
Hilbert's tenth problem, this expressivity of transformers also
yields a new undecidability result for analyzing an extremely simple
transformer model, surprisingly one with neither positional encodings
(i.e., NoPE transformers) nor masking.
We also experimentally validate the trainability of such counting
properties.
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 20452