Abstract
This thesis augments and extends HPIPE, a state-of-the-art CNN inference accelerator for FPGAs. We first focus on the accelerator's infrastructure, adding a hardware unit that implements the Sigmoid function and automated unit tests that validate the accelerator's functionality. We then study how to leverage the AI-optimized Stratix 10 NX FPGAs to achieve up to a 7X speedup for convolution operations. Next, we extend HPIPE by integrating a hardware-friendly non-maximum suppression (NMS) unit to accelerate object detection, yielding the highest-performing single-shot detection-based (SSD-based) object detection accelerator for FPGAs. Finally, we build an automated CAD flow that partitions CNNs across multiple FPGAs communicating via 100 Gb Ethernet. Through a prototype system, we show that doubling the number of FPGAs yields a 2X performance improvement on three CNNs: MobileNet-V1, MobileNet-V2, and ResNet-50.





