Enabling factor analysis on thousand-subject neuroimaging datasets

Michael J. Anderson, Mihai Capota, Javier S. Turek, Xia Zhu, Theodore L. Willke, Yida Wang, Po-Hsuan Chen, Jeremy R. Manning, Peter J. Ramadge, Kenneth A. Norman

2016 (modified: 26 Aug 2022)IEEE BigData 2016Readers: Everyone

Abstract: The scale of functional magnetic resonance image data is rapidly increasing as large multi-subject datasets are becoming widely available and high-resolution scanners are adopted. The inherent low-dimensionality of the information in this data has led neuroscientists to consider factor analysis methods to extract and analyze the underlying brain activity. In this work, we consider two recent multi-subject factor analysis methods: the Shared Response Model and the Hierarchical Topographic Factor Analysis. We perform analytical, algorithmic, and code optimization to enable multi-node parallel implementations to scale. Single-node improvements result in 99χ and 2062x speedups on the two methods, and enables the processing of larger datasets. Our distributed implementations show strong scaling of 3.3x and 5.5χ respectively with 20 nodes on real datasets. We demonstrate weak scaling on a synthetic dataset with 1024 subjects, equivalent in size to the biggest fMRI dataset collected until now, on up to 1024 nodes and 32,768 cores.

0 Replies