Field-programmable gate arrays (FPGAs) are a powerful resource for accelerating critical parts of application code, but their potential has not yet been fully realized. A major reason for this is that current approaches to FPGA acceleration are quite inflexible and difficult to use. Although recent work on High-Level Synthesis (HLS) has reduced the need for programmers to acquire specialized knowledge of FPGA internals, this has often come at the cost of significant toolchain-induced turnaround times and even execution performance penalties.
Recent research has worked towards making FPGAs easier for non-specialists to use. One line of work masks long compilation times with interpreted execution of Verilog code. Another avoids the significant hardware reconfiguration delays that are intrinsic to many FPGA architectures by using overlays, which emulate a software-defined virtual FPGA architecture on top of the hardware-defined physical FPGA architecture. The overlay approach also overcomes many of the constraints imposed by proprietary FPGA toolchains and encrypted configuration formats.
In this paper, we present PyJIT2FPGA, an approach that combines just-in-time (JIT) compilation with such a “virtual FPGA on physical FPGA” overlay to automatically accelerate loops in Python, resulting in a practical system for utilizing FPGA resources without the need for specialized knowledge about the underlying accelerator platform. Our combination of JIT compilation and overlays makes it possible to create, load onto the overlay, and commence execution of newly accelerated kernels from Python in mere milliseconds, many orders of magnitude faster than previous solutions. Once these kernels have been JIT-compiled and loaded onto the overlay, they are cached for repeated execution. The run-time performance of such accelerated kernels running on the FPGA overlay is up to 1280x faster than CPython 3.9 and up to 4.2x faster than statically pre-compiled C code.
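The compile-once, cache, and re-execute pattern described above can be illustrated with a minimal Python sketch. This is not PyJIT2FPGA's actual interface: the names `accelerate`, `hypothetical_compile_and_load`, and `compile_count` are assumptions introduced purely for illustration, and the "compile" step is a stand-in that returns the original function rather than targeting an FPGA overlay.

```python
from typing import Callable, Dict, List

# Cache of already "compiled" kernels, keyed by the function's code object.
_kernel_cache: Dict[object, Callable[[List[int]], int]] = {}
compile_count = 0  # counts how often the (hypothetical) compile step runs


def hypothetical_compile_and_load(func):
    """Stand-in for JIT compilation plus loading onto the overlay.

    Assumption: here it simply returns the original function. A real
    system would return a handle that launches the kernel on the overlay.
    """
    global compile_count
    compile_count += 1
    return func


def accelerate(func):
    """Decorator: compile on the first call, reuse the cached kernel afterwards."""
    def wrapper(data):
        key = func.__code__
        kernel = _kernel_cache.get(key)
        if kernel is None:
            kernel = hypothetical_compile_and_load(func)
            _kernel_cache[key] = kernel
        return kernel(data)
    return wrapper


@accelerate
def sum_of_squares(values):
    total = 0
    for v in values:  # the kind of loop a JIT might choose to offload
        total += v * v
    return total


first = sum_of_squares([1, 2, 3])   # triggers the stand-in compile step
second = sum_of_squares([1, 2, 3])  # served from the cache
```

In this sketch the second call skips the compile step entirely, which mirrors the idea that the one-time cost of creating and loading a kernel is paid only on first use.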