Implementing dictionary learning in Apache Flink, Or: How I learned to relax and love iterations

Geoffrey Mon; Milad Makkie; Xiang Li; Tianming Liu; Shannon Quinn

Published: 01 Jan 2016, Last Modified: 19 Feb 2025, Venue: IEEE BigData 2016, License: CC BY-SA 4.0

Abstract: We evaluate Apache Flink, a novel data analysis framework that offers optimizations over competitors such as Apache Spark, as a platform for running a rank-1 dictionary learning (r1DL) algorithm to decompose fMRI data. We first extend the Flink Python API to accommodate an implementation of r1DL, a model for decomposing a large matrix. We add support for iterative algorithms, aggregators, and other features to the incomplete Python API, and we describe the experiences and lessons learned. Using these features, we port an existing r1DL implementation from the Python API of Apache Spark to the Python API of Apache Flink. In preliminary testing, this implementation suggests performance gains over Spark for large input files, which merits further research. We conclude that Flink is likely a feasible tool for applying dictionary learning to decompose fMRI data, and we continue to evaluate and apply it.
