Abstract: Vectorization has been commonly employed in modern processors. In this work we identify the opportunities to explore customized vector instructions and build an automatic compilation flow to efficiently identify those instructions. A composable vector unit (CVU) is proposed to support a large number of customized vector instructions with small area overhead. The results show that our approach achieves an average 1.41X speedup over the state-of-art vector ISA. We also observe a large area gain (around 11.6X) over the dedicated ASIC-based design.
0 Replies
Loading