Solving Tensor Low Cycle Rank Approximation

Published: 01 Jan 2023, Last Modified: 29 Jan 2025. IEEE Big Data 2023. License: CC BY-SA 4.0.
Abstract: Large language models have become ubiquitous in modern life, finding applications in various domains such as natural language processing, language translation, and speech recognition. Recently, a breakthrough work [Zhao, Panigrahi, Ge, and Arora, arXiv 2023] explains the attention model via probabilistic context-free grammars (PCFGs). One of the central computational tasks in computing probabilities in a PCFG can be formulated as a particular tensor low rank approximation problem, which we call tensor cycle rank. Given an $n\times n\times n$ third-order tensor $A$, we say that $A$ has cycle rank-$k$ if there exist three $n\times k^{2}$ matrices $U$, $V$, and $W$ such that \begin{equation*}A_{a,b,c}=\sum_{i=1}^{k}\sum_{j=1}^{k}\sum_{l=1}^{k}U_{a,\,i+k(j-1)}\cdot V_{b,\,j+k(l-1)}\cdot W_{c,\,l+k(i-1)}\end{equation*} for all $a\in[n], b\in[n], c\in[n]$. Low rank approximation for the classical tensor rank, Tucker rank, and tensor train rank has been well studied in [Song, Woodruff, Zhong SODA 2019]. In this paper, we generalize the previous "rotation and sketch" technique of [Song, Woodruff, Zhong SODA 2019] and give an input sparsity time algorithm for cycle rank.
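For intuition (an equivalent reading of the entrywise definition above, not a statement from the paper beyond that definition): a cycle rank-$k$ decomposition is a sum of $k^{3}$ rank-one outer-product terms whose factor columns are indexed cyclically by the pairs $(i,j)$, $(j,l)$, and $(l,i)$, \begin{equation*}A=\sum_{i=1}^{k}\sum_{j=1}^{k}\sum_{l=1}^{k}U_{*,\,i+k(j-1)}\otimes V_{*,\,j+k(l-1)}\otimes W_{*,\,l+k(i-1)},\end{equation*} where $U_{*,m}$ denotes the $m$-th column of $U$. In particular, for $k=1$ this reduces to the classical rank-one decomposition $A=U_{*,1}\otimes V_{*,1}\otimes W_{*,1}$.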