Abstract: A problem for many kernel-based methods is that the amount of computation required to find the solution scales as O(n 3), where n is the number of training examples. We develop and analyze an algorithm to compute an easily-interpretable low-rank approximation to an n × n Gram matrix G such that computations of interest may be performed more rapidly. The approximation is of the form ${\tilde G}_{k} = CW^{+}_{k}C^{T}$ , where C is a matrix consisting of a small number c of columns of G and W k is the best rank-k approximation to W, the matrix formed by the intersection between those c columns of G and the corresponding c rows of G. An important aspect of the algorithm is the probability distribution used to randomly sample the columns; we will use a judiciously-chosen and data-dependent nonuniform probability distribution. Let || ·||2 and || ·|| F denote the spectral norm and the Frobenius norm, respectively, of a matrix, and let G k be the best rank-k approximation to G. We prove that by choosing O(k/ε 4) columns $${\left\|G - CW^{+}_{k}C^{T}\right\|_{\xi}} \leq \|G - G_{k}\|_{\xi} + \sum\limits_{i=1}^{n} G^{2}_{ii},$$ both in expectation and with high probability, for both ξ = 2,F, and for all k : 0 ≤ k ≤ rank(W). This approximation can be computed using O(n) additional space and time, after making two passes over the data from external storage.
0 Replies
Loading