In the age of hardware accelerators, computer architects and hardware engineers face increasing pressure to improve the cost-benefit balance of specialized computing units relative to more general-purpose architectures. The first part of this study presents the embedded Streaming Hardware Accelerator (eSAC) architecture, which can reduce the idle time of specialized logic. The remainder of the paper explores the integration of an eSAC into a Central Processing Unit (CPU) core embedded in a System-on-Chip (SoC) design, using the AXI-Stream protocol specification. Three architectures are evaluated: Tightly Coupled Streaming, Protocol Adapter FIFO, and Direct Memory Access (DMA) Streaming. Compared with the tightly coupled architecture, the DMA architecture shows an almost 3× decrease in frame latency in the experiments. This comes at the price of increased Field-Programmable Gate Array (FPGA) resource utilization: LUT (2.5×), LUTRAM (3×), FF (3.4×), and BRAM (1.2×). Four test scenarios were run for the DMA architecture, showcasing the best and worst practices for data organization; the results show that poor data organization can increase latency by more than 7×. The newly released MicroBlaze-V softcore processor was selected as the CPU. The designs presented operate successfully on a popular low-cost FPGA development board at 100 MHz. Block diagrams, FPGA resource utilization, and latency metrics are presented. Finally, possible improvements are discussed based on the evaluation results.
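As a rough illustration of why streaming latency depends on how often the receiver can accept data, the sketch below models the AXI-Stream handshake rule, under which a word is transferred only in a clock cycle where both TVALID and TREADY are high. It is not the eSAC design or any of the evaluated architectures; the function name `stream_transfer` and the `ready_pattern` parameter are hypothetical and exist only for this example.

```python
from typing import List, Sequence, Tuple


def stream_transfer(words: Sequence[int],
                    ready_pattern: Sequence[bool]) -> Tuple[List[int], int]:
    """Toy cycle-level model of an AXI-Stream-style handshake.

    The producer always asserts TVALID (an assumption of this sketch).
    The consumer's TREADY follows `ready_pattern`, repeated cyclically.
    Returns the received words and the number of clock cycles used.
    """
    if not any(ready_pattern):
        raise ValueError("ready_pattern must contain at least one True")

    received: List[int] = []
    cycles = 0
    idx = 0
    while idx < len(words):
        tvalid = True  # producer always has a word to offer
        tready = ready_pattern[cycles % len(ready_pattern)]
        if tvalid and tready:
            # Handshake completes: one word moves in this cycle.
            received.append(words[idx])
            idx += 1
        cycles += 1
    return received, cycles


if __name__ == "__main__":
    frame = [1, 2, 3, 4]
    # Consumer always ready: one word per cycle.
    print(stream_transfer(frame, [True]))
    # Consumer ready every other cycle: back-pressure stretches the frame.
    print(stream_transfer(frame, [True, False]))
```

In this toy model, a consumer that is ready only every other cycle needs 7 cycles to receive a 4-word frame instead of 4, which hints at how stalls on the stream interface translate into frame latency.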
Details
Central processing units--CPUs;
Random access memory;
Computer architecture;
Hardware;
Communication;
Microprocessors;
Signal processing;
Neural networks;
Network latency;
Design;
Architecture;
Embedded systems;
Transmitters;
Block diagrams;
First in-first out protocols;
Field programmable gate arrays;
Resource utilization;
System on chip;
Energy consumption;
Accelerators;
Interfaces
