Abstract: Taylor polynomials play a central role in optimization and machine learning, in part because they can be
easily computed using automatic differentiation. In many places where Taylor polynomials are used, it would
be advantageous to also have a bound on the remainder series, but techniques for generating such bounds
automatically are not as mature as automatic differentiation, and their potential has been less fully explored.
In this work, we present a new algorithm for automatically bounding the Taylor remainder series. In the
special case of a scalar function f : R → R, our algorithm takes as input a reference point x0, trust region
[a, b], and integer k ≥ 1, and returns an interval I such that $f(x) - \sum_{i=0}^{k-1} \frac{1}{i!} f^{(i)}(x_0)(x - x_0)^i \in I \, (x - x_0)^k$ for
all x ∈ [a, b]. As in automatic differentiation, the function f is provided to the algorithm in symbolic form,
and must be composed of known atomic functions.
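To make the guarantee concrete, here is a minimal, self-contained JAX sketch (illustrative only; the function names are ours, not the paper's released API). It checks such an enclosure for f = exp with x0 = 0 and k = 2: the remainder ratio (exp(x) − 1 − x)/x² is increasing in x, so its values at the trust-region endpoints give a sharp interval I.

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)  # double precision for a clean numerical check

def remainder_ratio(x):
    # (f(x) - sum_{i<k} f^(i)(x0) (x - x0)^i / i!) / (x - x0)^k, with f = exp, x0 = 0, k = 2.
    return (jnp.exp(x) - 1.0 - x) / x**2

a, b = -1.0, 2.0                                      # trust region [a, b]
I_lo, I_hi = remainder_ratio(a), remainder_ratio(b)   # sharp, since the ratio is increasing

# Verify the enclosure f(x) - 1 - x in [I_lo, I_hi] * x^2 on a dense grid.
xs = jnp.linspace(a, b, 10001)
xs = xs[jnp.abs(xs) > 1e-3]          # skip the removable singularity at x0 = 0
ratios = remainder_ratio(xs)
assert bool(jnp.all((ratios >= I_lo) & (ratios <= I_hi)))
print(I_lo, I_hi)                    # ~0.3679 and ~1.0973
```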
At a high level, our algorithm has two steps:
1. For a variety of commonly-used functions (e.g., exp, log, relu, softplus), we use recently-developed
theory [48] to derive sharp polynomial upper and lower bounds on the Taylor remainder series.
2. We recursively combine the bounds for the atomic functions using an interval arithmetic variant of
Taylor-mode automatic differentiation.
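As a rough illustration of step 2 (our sketch, not the paper's actual data structures), a degree-1 "Taylor enclosure" of f on [a, b] around x0 can be stored as a pair (c, I) certifying f(x) ∈ c + I·(x − x0), and such pairs compose under arithmetic operations via interval arithmetic:

```python
import jax.numpy as jnp

# Sketch only: a degree-1 Taylor enclosure of f on [a, b] around x0 is a pair
# (c, I), with I = [lo, hi], certifying f(x) in c + I * (x - x0) for all x in [a, b].

def imul(I, J):
    # Interval product: [min, max] of {i * j : i in I, j in J}.
    p = jnp.array([I[0] * J[0], I[0] * J[1], I[1] * J[0], I[1] * J[1]])
    return jnp.array([p.min(), p.max()])

def iadd(I, J):
    # Interval sum: [lo1 + lo2, hi1 + hi2].
    return I + J

def enclosure_add(f, g):
    (c, I), (d, J) = f, g
    return c + d, iadd(I, J)

def enclosure_mul(f, g, x0, a, b):
    # (c + I*u)(d + J*u) is contained in c*d + (c*J + d*I + (I*J)*U) * u,
    # where u = x - x0 ranges over U = [a - x0, b - x0].
    (c, I), (d, J) = f, g
    U = jnp.array([a - x0, b - x0])
    K = iadd(iadd(imul(jnp.array([c, c]), J), imul(jnp.array([d, d]), I)),
             imul(imul(I, J), U))
    return c * d, K

# Example: enclose h(x) = x * exp(x) on [0, 1] around x0 = 0. The enclosure of
# exp comes from a step-1 bound: (exp(x) - 1)/x lies in [1, e - 1] on [0, 1].
x0, a, b = 0.0, 0.0, 1.0
ident = (x0, jnp.array([1.0, 1.0]))              # x itself: x in x0 + [1, 1]*(x - x0)
exp_enc = (1.0, jnp.array([1.0, jnp.e - 1.0]))
c, K = enclosure_mul(ident, exp_enc, x0, a, b)
print(c, K)                                      # 0.0 [1.0, 2.718...]: x <= x*exp(x) <= e*x on [0, 1]
```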
Our algorithm can make efficient use of machine learning hardware accelerators, and we provide an open
source implementation in JAX.
We then turn our attention to applications. Most notably, in a companion paper [47] we use our new
machinery to create the first universal majorization-minimization optimization algorithms: algorithms that
iteratively minimize an arbitrary loss using a majorizer that is derived automatically, rather than by hand. We
also show that our automatically-derived bounds can be used for verified global optimization and numerical
integration, and to prove sharper versions of Jensen’s inequality.