Optimizing interconnection complexity for realizing fixed permutation in data and signal processing algorithms

Published: 2016, Last Modified: 30 Sept 2024, FPL 2016, CC BY-SA 4.0
Abstract: In hardware implementations of several widely used data and signal processing algorithms, data permutations must be performed between consecutive computation stages, each consisting of parallel computational units. Recently, highly data-parallel streaming architectures for data permutation have been proposed to achieve high throughput. However, the interconnection complexity of these designs increases dramatically with problem size and data parallelism. In this paper, we develop a hardware structure that performs data permutation with optimized interconnection complexity, defined as the interconnection area per throughput. We propose a novel design technique that substantially reduces the interconnection logic required to realize a fixed permutation on streaming data. Our experimental results show that the proposed design technique reduces interconnection complexity by 27.3% to 75.8%, improves throughput by 5.3% to 129%, and improves energy efficiency by 1.2× to 3.5× compared with the state of the art.
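To make the terms in the abstract concrete, the following is a minimal software sketch, not the paper's hardware design. It assumes a stride permutation (a fixed permutation common in FFT-style algorithms) as the example, models streaming as frames of p elements per cycle, and computes the metric as area divided by throughput. The function names, the buffering model, and the units are all illustrative assumptions.

```python
def stride_permutation(n, stride):
    """Index map for a stride permutation: out[i] = data[perm[i]].

    Assumed example of a fixed permutation; the paper may target others.
    """
    if n % stride != 0:
        raise ValueError("n must be a multiple of stride")
    rows = n // stride
    return [(i % rows) * stride + (i // rows) for i in range(n)]


def stream_permute(frames, perm):
    """Apply a fixed permutation to data arriving p elements per cycle.

    This naive model buffers the whole input before emitting output frames.
    A hardware design would use memories and switching logic instead.
    """
    p = len(frames[0])
    flat = [x for frame in frames for x in frame]
    out = [flat[j] for j in perm]
    return [out[k:k + p] for k in range(0, len(out), p)]


def interconnection_complexity(area, throughput):
    """Interconnection area per unit throughput, as defined in the abstract.

    Units are hypothetical (for example, logic slices per Gbit/s).
    """
    return area / throughput


perm = stride_permutation(8, 2)
frames = [[0, 1], [2, 3], [4, 5], [6, 7]]  # p = 2 elements per cycle
permuted = stream_permute(frames, perm)
```

Here the even-indexed elements are emitted first and the odd-indexed elements second, still two per cycle. Under this metric, a lower value means less interconnection area is spent for the same throughput.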