High-Throughput Fixed-Point Object Detection on FPGAs

Xiaoyin Ma, Walid A. Najjar, Amit K. Roy-Chowdhury

2014 (modified: 22 Sept 2022), FCCM 2014

Abstract: Computer vision applications make extensive use of floating-point number representation, in both single and double precision. The major advantage of floating-point representation is the very large range of values that can be represented with a limited number of bits. Most CPU designs, and all GPU designs, have been extensively optimized for low-latency, high-throughput processing of floating-point operations. On an FPGA, the bit-width of operands is a major determinant of resource utilization and of the achievable clock frequency, and hence of throughput. By using a fixed-point representation with fewer bits, an application developer could implement more processing units, reach a higher clock frequency, and obtain a dramatically larger throughput. However, smaller bit-widths may lead to inaccurate or incorrect results. Object and human detection are fundamental problems in computer vision and a very active research area. In these applications, high throughput and economy of resources are highly desirable features, allowing the applications to be embedded in mobile or field-deployable equipment. The Histogram of Oriented Gradients (HOG) algorithm [1], developed for human detection and later extended to object detection, is one of the most successful and popular algorithms in its class. In this algorithm, object descriptors are extracted from a detection window using grids of overlapping blocks. Each block is divided into cells, in which histograms of intensity gradients are collected as HOG features. Vectors of histograms are normalized and passed to a Support Vector Machine (SVM) classifier to recognize a person or an object.
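To make the trade-off between bit-width and accuracy concrete, the following is a minimal software sketch and not the authors' FPGA design. It builds one HOG-style cell histogram, L2-normalizes it, and measures the error introduced by rounding the result to fixed point. The function names, the 9-bin unsigned-orientation layout, and the choice of fractional bit-widths are all assumptions made for illustration; the abstract does not specify them.

```python
import math

def to_fixed(x, frac_bits):
    """Round x to the nearest integer multiple of 2**-frac_bits (hypothetical format)."""
    return int(round(x * (1 << frac_bits)))

def from_fixed(q, frac_bits):
    """Convert a fixed-point integer back to a float for comparison."""
    return q / (1 << frac_bits)

def cell_histogram(gx, gy, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0 to 180 degrees).

    gx, gy: per-pixel horizontal and vertical gradients of one cell.
    The 9-bin layout is an assumption, not taken from the abstract.
    """
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for dx, dy in zip(gx, gy):
        magnitude = math.hypot(dx, dy)
        angle = math.degrees(math.atan2(dy, dx)) % 180.0
        hist[int(angle // bin_width) % n_bins] += magnitude
    return hist

def l2_normalize(v, eps=1e-6):
    """L2-normalize a histogram vector; eps avoids division by zero."""
    norm = math.sqrt(sum(x * x for x in v) + eps)
    return [x / norm for x in v]

def max_quantization_error(values, frac_bits):
    """Largest absolute error after a float -> fixed -> float round trip."""
    return max(
        abs(v - from_fixed(to_fixed(v, frac_bits), frac_bits)) for v in values
    )

if __name__ == "__main__":
    # Toy gradients for a four-pixel "cell" (illustrative values only).
    gx = [1.0, 0.0, -1.0, 2.0]
    gy = [0.0, 1.0, 1.0, 2.0]
    features = l2_normalize(cell_histogram(gx, gy))
    for bits in (4, 8, 12):
        print(bits, max_quantization_error(features, bits))
```

With round-to-nearest, the error per value is bounded by half of the least significant bit, that is, 2^-(frac_bits + 1). Whether a given bit-width keeps detection accuracy acceptable depends on the classifier and the data, which this sketch does not evaluate.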
