Abstract: Particle Swarm Optimization (PSO) has been widely used in various optimization tasks (e.g., neural architecture search and autonomous vehicle navigation), because it can solve non-convex optimization problems with simplicity and efficacy. However, the PSO algorithm is often time-consuming to use, especially for high-dimensional problems, which hinders its applicability in time-critical applications. In this paper, we propose novel techniques to accelerate the PSO algorithm with GPUs. To mitigate the efficiency bottleneck, we formally model the PSO optimization as a process of element-wise operations on matrices. Based on the modeling, we develop an efficient GPU algorithm to perform the element-wise operations in massively parallel using the tensor cores and shared memory. Moreover, we propose a series of novel techniques to improve our proposed algorithm, including (i) GPU resource-aware thread creation to prevent creating too many threads when the number of particles/dimensions is large; (ii) designing parallel techniques to initialize swarm particles with fast random number generation; (iii) exploiting GPU memory caching to manage swarm information instead of allocating new memory and (iv) developing a schema to support customized swarm evaluation functions. We conduct extensive experiments on four optimization applications to study the efficiency of our algorithm called “FastPSO”. Experimental results show that FastPSO consistently outperforms the existing CPU-based PSO libraries by two orders of magnitude, and transcends the existing GPU-based implementation by 5 to 7 times, while achieving better or competitive optimization results.
0 Replies
Loading