TETRIS: On-Device Trainable Energy-Efficient FPGA Accelerator for Trustworthy and Real-Time Instance Segmentation
Abstract: Instance segmentation plays a critical role in high-precision computer vision applications, such as autonomous driving and medical image analysis. As demands for both model accuracy and privacy-preserving solutions continue to rise, on-device training is gaining traction for enabling secure, adaptive learning directly on edge devices. However, the intensive computation and complex dataflows inherent to instance segmentation models pose a major barrier to achieving real-time training in resource-constrained environments. While prior research on on-device training has focused largely on image classification, efficient hardware acceleration for instance segmentation training remains largely unexplored. In this work, we present TETRIS, the first FPGA-based hardware accelerator specifically tailored for training instance segmentation models. TETRIS introduces structural and quantization-aware optimizations, including channel-aware smoothing quantization and distribution-aware tie quantization, to significantly reduce both computational and memory overhead. Moreover, TETRIS adopts a heterogeneous hardware architecture, incorporating a reconfigurable convolution processing unit (RC-PU) that supports variable kernel sizes and a fully pipelined auxiliary processing unit to handle specialized operations efficiently. Experimental evaluation on YOLACT-based instance segmentation tasks demonstrates that TETRIS achieves a peak performance of 805.9 GOPS and an energy efficiency of 69.5 GOPS/W, confirming its ability to support real-time training even in resource-constrained edge environments.
External IDs: dblp:conf/iccad/LeePKLJPK25