CPCA: A Feature Semantics Based Crowd Dimension Reduction Framework

IEEE Access, 2018 (modified: 17 Apr 2023)
Abstract: Dimension reduction plays an important role in practical big data analysis and data mining applications. However, popular dimension reduction techniques, such as principal component analysis (PCA), are known to be computation-intensive and are considered a computational bottleneck in data processing and mining. In this paper, we propose to reduce the computation of PCA via crowdsourcing, a paradigm that tackles hard-to-compute problems by leveraging collective intelligence. We design CPCA (crowd principal component analysis), a novel crowd-based dimension reduction framework. CPCA designs tasks for crowd workers to obtain the relations among features based on their semantics, and formulates a weighted graph from the collected answers to derive the covariance matrix and the principal components. We prove the correctness of CPCA and conduct extensive evaluations on real datasets. Experimental results show that CPCA achieves a significant reduction in computational cost in terms of both time and memory, which lowers the bar for learning.
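The pipeline outlined in the abstract (crowd answers, then a weighted feature graph, then a covariance-like matrix, then principal components) can be illustrated with a minimal sketch. This is not the paper's algorithm: the answer format `(i, j, score)`, the averaging rule, the unit diagonal, and the names `build_relation_matrix` and `top_component` are all assumptions made for illustration, and power iteration stands in for whatever eigen-solver the authors use.

```python
import math
from collections import defaultdict


def build_relation_matrix(num_features, answers):
    """Average crowd answers into a symmetric feature-relation matrix.

    Assumption: each answer is a tuple (i, j, score), where score in [-1, 1]
    is one worker's estimate of how strongly features i and j are related.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i, j, score in answers:
        if i == j:
            continue  # self-relations are fixed to 1.0 below
        key = (min(i, j), max(i, j))  # undirected edge of the weighted graph
        sums[key] += score
        counts[key] += 1

    matrix = [[0.0] * num_features for _ in range(num_features)]
    for k in range(num_features):
        matrix[k][k] = 1.0  # assumed unit variance for every feature
    for (i, j), total in sums.items():
        matrix[i][j] = matrix[j][i] = total / counts[(i, j)]
    return matrix


def top_component(matrix, iterations=200):
    """Approximate the dominant eigenvector with plain power iteration."""
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        nxt = [sum(matrix[r][c] * vec[c] for c in range(n)) for r in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    return vec


# Hypothetical answers: two workers rate features 0 and 1 as closely related,
# one worker rates features 0 and 2 as weakly related.
answers = [(0, 1, 0.9), (1, 0, 0.7), (0, 2, 0.1)]
matrix = build_relation_matrix(3, answers)
component = top_component(matrix)
```

In this toy setup no pass over the raw data is needed to form the matrix, which is the kind of saving the abstract points to. The sketch assumes the averaged matrix has a positive dominant eigenvalue; a matrix assembled from noisy answers need not be positive semidefinite.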