- Keywords: hls4ml, machine learning, neural networks, tinyML, FPGA, ASIC, low-power, low-latency
- TL;DR: We present hls4ml, an open-source software-hardware co-design workflow to translate machine learning algorithms for implementation in FPGAs and ASICs to support science.
- Abstract: Accessible machine learning algorithms, software, and diagnostic tools for energy-efficient devices and systems are extremely valuable across a broad range of application domains. In scientific domains, real-time near-sensor processing can drastically improve experimental design and accelerate scientific discoveries. We have developed hls4ml, an open-source software-hardware co-design workflow to interpret and translate machine learning algorithms for implementation in FPGAs and ASICs specifically to support domain scientists. In this paper, we describe the essential features of the hls4ml workflow, including network optimization techniques such as pruning and quantization-aware training, which can be incorporated naturally into the device implementations. We expand on previous hls4ml work by extending capabilities and techniques towards low-power implementations and increased usability: new Python APIs, quantization-aware pruning, end-to-end FPGA workflows, long pipeline kernels for low power, and new device backends, including an ASIC workflow. Taken together, these and continued efforts in hls4ml will arm a new generation of domain scientists with accessible, efficient, and powerful tools for machine-learning-accelerated discovery.