Optimal Convergence Rates for the Orthogonal Greedy Algorithm

Published: 01 Jan 2022, Last Modified: 12 May 2023. IEEE Trans. Inf. Theory, 2022.
Abstract: We analyze the orthogonal greedy algorithm when applied to dictionaries $\mathbb{D}$ whose convex hull has small entropy. We show that if the metric entropy of the convex hull of $\mathbb{D}$ decays at a rate of $O(n^{-\frac{1}{2}-\alpha})$ for $\alpha > 0$, then the orthogonal greedy algorithm converges at the same rate on the variation space of $\mathbb{D}$. This improves upon the well-known $O(n^{-\frac{1}{2}})$ convergence rate of the orthogonal greedy algorithm in many cases, most notably for dictionaries corresponding to shallow neural networks. These results hold under no additional assumptions on the dictionary beyond the decay rate of the entropy of its convex hull. In addition, they are robust to noise in the target function and can be extended to convergence rates on the interpolation spaces of the variation norm. We show empirically that the predicted rates are obtained for the dictionary corresponding to shallow neural networks with the Heaviside activation function in two dimensions. Finally, we show that these improved rates are sharp and prove a negative result showing that the iterates generated by the orthogonal greedy algorithm cannot in general be bounded in the variation norm of $\mathbb{D}$.
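
For concreteness, below is a minimal finite-dimensional sketch of the orthogonal greedy algorithm: at each step it selects the dictionary element most correlated with the current residual, then replaces the iterate with the orthogonal projection of the target onto the span of all elements selected so far. This is an illustrative sketch, not the paper's code; the names (`orthogonal_greedy`, `D`, `f`, `n_steps`) and the Heaviside ridge dictionary are assumptions made for the example, loosely mirroring the two-dimensional experiment mentioned in the abstract.

```python
import numpy as np

def orthogonal_greedy(f, D, n_steps):
    """Sketch of the orthogonal greedy algorithm (illustrative, not the paper's code).

    f       : target samples, shape (m,)
    D       : dictionary matrix, shape (m, N), one element per column
    n_steps : number of greedy iterations n

    Returns the n-th iterate f_n and the indices of the selected elements.
    """
    residual = f.copy()
    selected = []
    for _ in range(n_steps):
        # Greedy selection: argmax over g in D of |<residual, g>|.
        scores = np.abs(D.T @ residual)
        scores[selected] = -np.inf  # do not re-select an element
        selected.append(int(np.argmax(scores)))
        # Orthogonal projection of f onto span{g_1, ..., g_k} via least squares.
        G = D[:, selected]
        coeffs, *_ = np.linalg.lstsq(G, f, rcond=None)
        residual = f - G @ coeffs
    return f - residual, selected

# Illustrative dictionary of Heaviside ridge functions H(w . x - b) sampled at
# random points in [0,1]^2 (an assumed setup echoing the paper's 2-d experiment).
rng = np.random.default_rng(0)
X = rng.random((500, 2))                       # sample points in [0,1]^2
W = rng.standard_normal((2, 200))
W /= np.linalg.norm(W, axis=0)                 # unit direction vectors w
b = rng.uniform(-1.0, 1.0, size=200)           # offsets b
D = np.heaviside(X @ W - b, 0.0)               # (500, 200) dictionary matrix
D /= np.linalg.norm(D, axis=0) + 1e-12         # normalize; guard all-zero columns
f = np.sin(2 * np.pi * X[:, 0]) * X[:, 1]      # an arbitrary smooth target
f_n, idx = orthogonal_greedy(f, D, n_steps=20)
print(np.linalg.norm(f - f_n))                 # residual norm after 20 steps
```

The projection step is what distinguishes the orthogonal greedy algorithm from the pure greedy algorithm: after each iteration the residual is orthogonal to the span of every selected dictionary element, so previously chosen elements never need their coefficients frozen.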