Abstract: Throughput is a key performance metric for streaming FFT architectures. However, increasing spatial parallelism to improve throughput introduces complex routing, thus resulting in high power consumption. In this paper, we propose a high throughput energy efficient parallel FFT architecture based on Cooley-Tukey algorithm. Multiple pipeline FFT processors using time-multiplexing are utilized to perform FFT computation tasks in parallel. This design realizes high performance using task-level parallelism and avoids complex routing. Furthermore, to reduce the memory power consumption, a periodic memory activation (PMA) scheme is developed. By analyzing energy efficiency (defined as GOPS/Joule) asymptotically, we show that our design achieves a low energy efficiency complexity while satisfying a high-throughput requirement. For N-point FFT (64 ≤ N ≤ 4096), our proposed architecture achieves 50 ~ 63 GOPS/Joule, i.e., up to 78% of the Peak Energy Efficiency of FFT designs on FPGAs. Compared with a state-of-the-art design, our design improves the energy efficiency (defined as GOPS/Joule) by 17% to 26% with the same throughput.
Loading