A scalable sequential principal component analysis algorithm (SeqPCA) with application to user access control analysis

Published: 2017 | Last Modified: 17 May 2023 | IEEE BigData 2017 | Readers: Everyone
Abstract: Principal Component Analysis (PCA) is a powerful tool for data exploration and dimensionality reduction, with broad applications in customer behavior and feedback mining. With recent breakthroughs in big data technology, PCA has become even more prevalent in large-scale data mining and business analytics, especially in online customer behavior analysis. However, as data sets grow rapidly in volume, applying PCA in these areas poses challenges. For example, computing the principal components and determining the best number of components to extract under limited computational resources are two fundamental yet challenging tasks. In this article, we introduce an algorithm called Sequential PCA (SeqPCA), which conducts PCA sequentially on large data sets. With this technique, data analysts can determine the optimal number of components to extract without recomputing PCA many times. The algorithm is applied to user access control analysis of the internal websites of a large company, and numerical results show that it has superior performance and enables real-time analysis of large user behavior data.
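
The abstract does not describe the SeqPCA procedure itself. As a rough illustration of the general idea of extracting components one at a time and stopping once enough variance is explained, the sketch below uses textbook power iteration with deflation. This is a swapped-in standard technique, not the authors' algorithm, and the names `sequential_pca`, `target_ratio`, and `max_components` are hypothetical.

```python
import numpy as np


def sequential_pca(X, target_ratio=0.9, max_components=None,
                   n_iter=500, tol=1e-10, seed=0):
    """Extract principal components one at a time (power iteration + deflation).

    Illustrative only: stops as soon as the cumulative explained-variance
    ratio reaches `target_ratio`, so later components are never computed.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                 # center the data
    n, d = Xc.shape
    cov = Xc.T @ Xc / (n - 1)               # sample covariance matrix
    total_var = float(np.trace(cov))
    if total_var <= 0.0:
        raise ValueError("data has zero variance")
    if max_components is None:
        max_components = d

    components, variances = [], []
    explained = 0.0
    for _ in range(max_components):
        # Power iteration for the leading eigenvector of the deflated matrix.
        v = rng.standard_normal(d)
        v /= np.linalg.norm(v)
        for _ in range(n_iter):
            w = cov @ v
            norm = np.linalg.norm(w)
            if norm < 1e-12:                # no variance left to extract
                break
            w /= norm
            converged = np.linalg.norm(w - v) < tol
            v = w
            if converged:
                break

        lam = float(v @ cov @ v)            # variance along this component
        components.append(v)
        variances.append(lam)
        explained += lam

        # Deflation: remove the extracted direction before the next pass.
        cov = cov - lam * np.outer(v, v)

        if explained / total_var >= target_ratio:
            break

    return np.array(components), np.array(variances), explained / total_var
```

Under these assumptions, a caller would pick a variance threshold once and read the number of components off the length of the returned array, rather than refitting PCA for each candidate count. The sketch forms the full covariance matrix in memory, which would not suit very large feature dimensions.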