Approximate Bayesian Estimation with Subsampled Logistic Regression in Big Data Settings

Published: 01 Jan 2021, Last Modified: 05 Feb 2025 | IEEE BigData 2021 | CC BY-SA 4.0
Abstract: The computational burden of extremely large datasets makes statistical inference with big data challenging. In this paper, we propose a Bayesian approach that uses nonuniformly subsampled data to derive the pseudo-posterior distribution of the parameters of interest. We investigate the use of a Jeffreys prior under the proposed subsampling strategies. In addition, we develop a theoretical framework for analyzing the convergence rate of the pseudo-posterior under subsampling, which relaxes the conditions required by the corresponding theoretical results in the sample survey literature. Extensive simulation studies are carried out to confirm the validity of the estimates and the computational efficiency of our proposed method. Finally, we analyze PUMS data to illustrate the effectiveness of our approach.
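To make the idea of a pseudo-posterior built from a nonuniform subsample concrete, here is a minimal sketch in Python. It is not the authors' method: the sampling probabilities (proportional to the covariate row norm), the Gaussian prior used in place of the Jeffreys prior, and the function names `subsample` and `map_estimate` are all assumptions made for illustration. The sketch draws a subsample with replacement, attaches inverse-probability weights, and finds the mode of the weighted (pseudo) log-posterior by Newton's method.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def subsample(X, y, r, rng):
    """Draw r rows with replacement using nonuniform probabilities.

    Assumption for illustration: probabilities proportional to the
    Euclidean norm of each covariate row (not taken from the paper).
    """
    norms = np.linalg.norm(X, axis=1)
    probs = norms / norms.sum()
    idx = rng.choice(len(y), size=r, replace=True, p=probs)
    # Inverse-probability weights; their expected value is 1.
    weights = 1.0 / (len(y) * probs[idx])
    return X[idx], y[idx], weights


def map_estimate(X, y, w, prior_var=100.0, n_iter=25):
    """Mode of the weighted pseudo-posterior via Newton's method.

    Assumption: an independent N(0, prior_var) prior on each coefficient,
    swapped in for the Jeffreys prior named in the abstract.
    """
    d = X.shape[1]
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        # Gradient and Hessian of the weighted log-likelihood plus log-prior.
        grad = X.T @ (w * (y - p)) - beta / prior_var
        hess = -(X.T * (w * p * (1.0 - p))) @ X - np.eye(d) / prior_var
        beta = beta - np.linalg.solve(hess, grad)
    return beta
```

A typical use would be to simulate or load a full dataset `X`, `y`, call `subsample(X, y, r, rng)` with `r` much smaller than the number of rows, and pass the result to `map_estimate`. The weights correct for the unequal selection probabilities, so the weighted log-likelihood of the subsample approximates a rescaled full-data log-likelihood.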