Tensor Train-Based Multiple Clusterings for Big Data in Cyber-Physical-Social Systems and Its Efficient Implementations
Abstract: Multiple clusterings help uncover different patterns hidden in data from different perspectives, so they have considerable value in applications such as community detection, resource recommendation, and gene expression analysis. Existing multiple clustering approaches are mainly oriented to low-dimensional, single-domain data and are not suitable for Big Data in Cyber-Physical-Social Systems (CPSS); tensor-based multiple clustering (TMC) was proposed to address this problem. However, as the scale of data continues to increase, data storage, computing load, and memory overhead grow exponentially, leading to the curse of dimensionality and greatly reducing the efficiency of TMC. Therefore, this paper studies tensor train-based multiple clustering (TTMC) and its parallel computing method. First, a tensor train (TT)-based parallel analytic and service framework for multiple clustering is presented. Then, a TT-based multi-linear attribute combination weight learning algorithm, a selective weighted tensor train distance, and the TTMC algorithm are put forward to improve the accuracy and efficiency of TMC. Furthermore, an efficient distributed parallel computing strategy for TTMC is designed by exploiting TT core parallelism. Experimental results demonstrate that, compared with the original TMC algorithm, TTMC and its parallelization significantly improve computation efficiency and clustering accuracy while reducing running memory.