PreVV: Eliminating Store Queue via Premature Value Validation for Dataflow Circuit on FPGA

Kuangjie Zou, Yifan Zhang, Zicheng Zhang, Guoyu Li, Jianli Chen, Kun Wang, Jun Yu

Published: 2025, Last Modified: 23 Jul 2025, DATE 2025, CC BY-SA 4.0

Abstract: Dynamic scheduling in high-level synthesis (HLS) maximizes pipeline performance by enabling out-of-order scheduling of load and store requests at runtime. However, this method introduces unpredictable memory dependencies, leading to data disambiguation challenges. Load-store queues (LSQs), commonly used in superscalar CPUs, offer a potential solution for HLS. However, LSQs in dynamically scheduled HLS implementations often suffer from high resource overhead and scalability limitations. In this paper, we introduce PreVV, an architecture based on premature value validation designed to address memory disambiguation with minimal resource overhead. Our approach replaces the LSQ with several PreVV components and a straightforward premature queue. We prevent potential deadlocks by incorporating a specific tag that can send ‘fake’ tokens to prevent the accumulation of outdated data. Furthermore, we demonstrate that our design has scalability potential. We implement our design using several hardware templates and an LLVM pass to generate targeted dataflow circuits with PreVV. Experimental results on various benchmarks with data hazards show that, compared to state-of-the-art dynamic HLS, PreVV16 (a version with a premature queue depth of 16) reduces LUT usage by 43.91% and FF usage by 33.09%, with minimal impact on timing performance. Meanwhile, PreVV64 (a version with a premature queue depth of 64) reduces LUT usage by 27.21% and FF usage by 33.10%, without affecting timing performance.
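
The memory dependency problem described in the abstract can be illustrated with a small software model. The sketch below is not the PreVV design and does not model an LSQ; it only shows why issuing loads ahead of earlier stores is unsafe when addresses are known only at runtime. The function names, the `window` parameter, and the histogram-style loop are assumptions chosen for illustration.

```python
def run_in_order(idx, n):
    """Reference semantics: each iteration loads, adds, and stores
    before the next iteration starts."""
    mem = [0] * n
    for i in idx:
        v = mem[i]        # load
        mem[i] = v + 1    # store
    return mem


def run_loads_early(idx, n, window):
    """Hypothetical unsafe schedule: the loads of `window` consecutive
    iterations are issued before any of their stores commit, with no
    disambiguation between them."""
    mem = [0] * n
    for start in range(0, len(idx), window):
        group = idx[start:start + window]
        loaded = [mem[i] for i in group]      # all loads issue first
        for i, v in zip(group, loaded):       # stores commit afterwards
            mem[i] = v + 1
    return mem


if __name__ == "__main__":
    idx = [2, 2, 3, 2]    # address 2 repeats, so iterations depend on each other
    print(run_in_order(idx, 4))
    print(run_loads_early(idx, 4, window=2))
```

With `idx = [2, 2, 3, 2]`, the second iteration loads address 2 before the first iteration's store has committed, so one increment is lost and the two schedules disagree. With `window=1` the early-load model degenerates to the in-order schedule. Any disambiguation mechanism, whether an LSQ or the validation-based approach the abstract proposes, has to detect or repair this kind of stale load.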