Evaluation and Acceleration of High-Throughput Fixed-Point Object Detection on FPGAs
Reliance on object or people detection is rapidly growing beyond surveillance to industrial and social applications. The histogram of oriented gradients (HOG), one of the most popular object detection algorithms, achieves high detection accuracy but delivers just under 1 frame/s on a high-end CPU. Field-programmable gate array (FPGA) accelerators for this algorithm are limited by its intensive floating-point computations. All current fixed-point HOG implementations either use large bit widths to maintain detection accuracy or perform poorly at reduced data precision. In this paper, we introduce the full-image evaluation methodology to explore the FPGA implementation of HOG using reduced bit width. This approach reduces the area required on the FPGA and raises the clock frequency; the resulting increase in parallelism raises the throughput per device.
We evaluate the detection accuracy of the fixed-point HOG by applying state-of-the-art computer vision pedestrian detection evaluation metrics and show that it performs as well as the original floating-point code from OpenCV. We then show that our single-FPGA implementation achieves 68.7× higher throughput than a high-end CPU, 5.1× higher than a high-end graphics processing unit (GPU), and 7.8× higher than the same implementation using floating point on the same FPGA. A power consumption comparison across platforms shows that our fixed-point FPGA implementation uses 130× less power than the CPU, and 31× less energy than the GPU, to process one image.
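The trade-off between bit width and precision mentioned above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this paper: the helper names (`to_fixed`, `quantized_magnitude`, `max_quantization_error`), the sample gradient values, and the 16-bit word length are all assumptions chosen for illustration. It quantizes a gradient magnitude, one of the intermediate quantities in HOG, to a signed fixed-point word and reports the worst-case numerical error for several fractional bit counts. Note that numerical error of this kind is only a proxy; the methodology described here judges reduced bit widths by detection accuracy on full images.

```python
import math

def to_fixed(value, frac_bits, total_bits):
    """Quantize a real value to a signed fixed-point integer, saturating on overflow."""
    scaled = int(round(value * (1 << frac_bits)))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))

def to_float(fixed, frac_bits):
    """Convert a fixed-point integer back to a real value."""
    return fixed / (1 << frac_bits)

def gradient_magnitude(gx, gy):
    """Reference floating-point gradient magnitude."""
    return math.hypot(gx, gy)

def quantized_magnitude(gx, gy, frac_bits, total_bits):
    """Gradient magnitude after rounding to the given fixed-point format."""
    m = gradient_magnitude(gx, gy)
    return to_float(to_fixed(m, frac_bits, total_bits), frac_bits)

def max_quantization_error(gradients, frac_bits, total_bits):
    """Largest absolute difference between float and fixed-point magnitudes."""
    return max(
        abs(gradient_magnitude(gx, gy)
            - quantized_magnitude(gx, gy, frac_bits, total_bits))
        for gx, gy in gradients
    )

# Hypothetical (gx, gy) pixel gradients, chosen only for illustration.
gradients = [(3, 4), (1, 1), (-7, 2), (0, 5), (10, -10)]

if __name__ == "__main__":
    for frac_bits in (0, 2, 4, 8):
        err = max_quantization_error(gradients, frac_bits, total_bits=16)
        print(f"frac_bits={frac_bits}: max abs error = {err:.6f}")
```

As long as no value saturates, the rounding error is bounded by half of the least significant bit, so each additional fractional bit halves the worst-case error while widening every datapath that carries the value.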