Keywords: Big data, Data averaging, Order statistic, Sampling method, Sketching method.
TL;DR: We propose a new sketching method for large-scale linear models based on data averaging, which can achieve a faster convergence rate than the optimal convergence rate for sampling methods.
Abstract: This work is concerned with the estimation of linear models when the
sample size is extremely large and the data dimension can vary with the sample
size. In this setting, the least squares estimator based on the full data is not feasible
with limited computational resources. Many existing methods for this problem are
based on the sketching technique, which uses the sketched data to perform least
squares estimation. We derive fine-grained lower bounds on the conditional mean
squared error for sketching methods. For sampling methods, our lower bound
provides an attainable optimal convergence rate. Our result implies that when the
dimension is large, hardly any sampling method can have a faster convergence
rate than the uniform sampling method. To achieve better statistical performance,
we propose a new sketching method based on data averaging. The proposed
method reduces the original data to a few averaged observations. These averaged
observations still satisfy the linear model and are used to estimate the regression
coefficients. The asymptotic behavior of the proposed estimation procedure is
studied. Our theoretical results show that the proposed method can achieve a
faster convergence rate than the optimal convergence rate for sampling methods.
Theoretical and numerical results show that the proposed estimator has good
statistical performance as well as low computational cost.
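
To make the data-averaging idea concrete, here is a minimal sketch, not the authors' procedure. The abstract does not say how observations are grouped before averaging, so the consecutive-block grouping, the function names `averaged_sketch` and `sketched_ols`, and the simulated data are all illustrative assumptions. The sketch only shows that block averages of (x, y) pairs still follow the same linear model, so ordinary least squares on the few averaged rows targets the same coefficients.

```python
import numpy as np


def averaged_sketch(X, y, m):
    """Reduce (X, y) to m averaged observations.

    Assumption: rows are grouped into m consecutive blocks. The grouping
    rule used in the paper is not given in the abstract.
    """
    n = X.shape[0]
    # np.array_split also handles the case where n is not divisible by m.
    blocks = np.array_split(np.arange(n), m)
    X_bar = np.vstack([X[idx].mean(axis=0) for idx in blocks])
    y_bar = np.array([y[idx].mean() for idx in blocks])
    return X_bar, y_bar


def sketched_ols(X, y, m):
    """Least squares fit on the m averaged observations only."""
    X_bar, y_bar = averaged_sketch(X, y, m)
    beta_hat, *_ = np.linalg.lstsq(X_bar, y_bar, rcond=None)
    return beta_hat


# Simulated data (hypothetical sizes, chosen only for illustration).
rng = np.random.default_rng(0)
n, p, m = 10_000, 5, 50
X = rng.normal(size=(n, p))
beta = np.arange(1.0, p + 1.0)

# Without noise, averaging preserves y = X @ beta exactly, so the
# sketched fit recovers beta up to floating-point error.
y_noiseless = X @ beta
beta_hat = sketched_ols(X, y_noiseless, m)
```

With equal-sized blocks of i.i.d. errors, each averaged error has variance equal to the original error variance divided by the block size, so an unweighted fit is reasonable; unequal block sizes would call for weighted least squares. How much the averaged covariates vary across blocks depends heavily on the grouping rule, which this sketch does not attempt to optimize.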
Supplementary Material: zip
Submission Number: 12778