Optimizing Over All Sequences of Orthogonal Polynomials

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Withdrawn Submission · Readers: Everyone
Keywords: orthogonal polynomials, differentiation, compression, nonconvex optimization
Abstract: Every length-$(n+1)$ sequence of orthogonal polynomials is uniquely represented by two length-$(n+1)$ sequences of coefficients $\alpha$ and $\beta$. We make this representation learnable by gradient-based methods. Operations on orthogonal polynomials can be differentiated automatically, but naive automatic differentiation uses $O(n^2)$ memory and is slow in practice. By exploiting reversibility, we derive a differentiation algorithm that uses $O(n)$ memory and is much faster in practice. Using this algorithm, fixed polynomial transforms (e.g. discrete cosine transforms) can be replaced by learnable layers, which are more expressive yet retain the computational efficiency and analytic tractability of orthogonal polynomials. As another application, we present an algorithm for approximating the minimal value $f(w^*)$ of a general nonconvex objective $f$ without finding the minimizer $w^*$. It follows a scheme recently proposed by Lasserre (2020), whose core algorithmic problem is to find the sequence of polynomials orthogonal with respect to a given probability distribution. Despite the general intractability of this problem, we observe encouraging initial results on some test cases.
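
The abstract does not spell out the parameterization, but orthogonal polynomial sequences are conventionally defined by three-term recurrence coefficients. The sketch below is a minimal illustration of that assumed setup, not the authors' code: it evaluates the sequence from $(\alpha, \beta)$ and differentiates through it with standard reverse-mode autodiff (JAX here), which is the memory-hungry baseline the abstract contrasts with its reversibility-based algorithm. The function name `eval_orthopoly` and the monic recurrence form are illustrative assumptions.

```python
# A minimal sketch (not the authors' code): evaluate the polynomial sequence
# defined by recurrence coefficients alpha, beta, written in JAX so that
# gradients with respect to (alpha, beta) come from automatic differentiation.
# The monic recurrence p_{k+1}(x) = (x - alpha_k) p_k(x) - beta_k p_{k-1}(x)
# is assumed. Naive reverse-mode autodiff of this loop stores every
# intermediate row (the O(n^2)-memory behaviour mentioned in the abstract);
# the paper's algorithm instead exploits the recurrence's reversibility.
import jax
import jax.numpy as jnp

def eval_orthopoly(alpha, beta, x):
    """Rows 0..n of the output are p_0(x), ..., p_n(x), where n = len(alpha)."""
    p_prev = jnp.zeros_like(x)   # p_{-1} = 0
    p_curr = jnp.ones_like(x)    # p_0   = 1
    rows = [p_curr]
    for k in range(alpha.shape[0]):
        p_next = (x - alpha[k]) * p_curr - beta[k] * p_prev
        p_prev, p_curr = p_curr, p_next
        rows.append(p_curr)
    return jnp.stack(rows)

# Example: gradient of a scalar loss with respect to the recurrence
# coefficients -- the basic ingredient for treating (alpha, beta) as
# learnable parameters in a polynomial-transform layer.
alpha = jnp.zeros(4)
beta = 0.25 * jnp.ones(4)
x = jnp.linspace(-1.0, 1.0, 8)
loss = lambda a, b: jnp.sum(eval_orthopoly(a, b, x) ** 2)
grad_alpha, grad_beta = jax.grad(loss, argnums=(0, 1))(alpha, beta)
```

This baseline keeps every row of the recurrence alive for the backward pass; the $O(n)$-memory algorithm claimed in the abstract presumably reconstructs earlier rows during the backward sweep instead of storing them, which the recurrence permits whenever $\beta_k \neq 0$.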
One-sentence Summary: Gradient descent over orthogonal polynomials.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=7szfctGhWc