The demand for low-cost, low-power edge devices capable of running Artificial Intelligence (AI) workloads has increased over the last few years. This has naturally drawn interest in pairing RISC-V, an open-standard, royalty-free ISA built from the ground up with customizability in mind, with specialized hardware that performs the tasks it is designed for with exceptional efficiency, and multiple RISC-V-based IPs have emerged as a result. However, few seem interested in developing compilers alongside their hardware, whether because of the large investment required, the steep learning curve, or other factors.
This thesis proposes an alternative: introducing a source-to-source compilation step immediately before compilation, which automatically inserts custom instructions directly into the source code as inline assembly, using a much more accessible API and ecosystem.
We discuss the details of automatically accelerating vector-vector dot products with a custom multiply-accumulate (MAC) instruction, as well as the static analysis this requires. Our approach finds acceleration opportunities in third-party benchmarks. When running our program on an FPGA programmed with a closed-source IP, we achieve a speedup of up to 7.1 times over the original, unoptimized program, matching the performance of manually optimized code.
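As a rough illustration of the kind of rewrite described above, the following C sketch shows a dot-product loop before and after a MAC call is substituted for the multiply-add. It is a minimal sketch under stated assumptions, not the transformation or instruction used in the thesis: the function names, the `USE_CUSTOM_MAC` macro, and the instruction encoding (custom-0 opcode `0x0b` with `funct3` and `funct7` set to 0) are all hypothetical. On any target without that macro defined, a portable fallback is used so the code still runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MAC: returns acc + a * b.
 * The RISC-V encoding below is an assumption made for illustration only;
 * it is not the encoding of the closed-source IP mentioned in the text. */
static inline int32_t mac(int32_t acc, int32_t a, int32_t b)
{
#if defined(__riscv) && defined(USE_CUSTOM_MAC)
    /* GNU as ".insn r" form: opcode, funct3, funct7, rd, rs1, rs2.
     * rd is both read and written, hence the "+r" constraint. */
    __asm__(".insn r 0x0b, 0, 0, %0, %1, %2"
            : "+r"(acc)
            : "r"(a), "r"(b));
    return acc;
#else
    return acc + a * b; /* portable fallback with the same semantics */
#endif
}

/* The loop as it might appear in the original source. */
int32_t dot_original(const int32_t *a, const int32_t *b, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* The same loop after a source-to-source pass replaces the
 * multiply-add statement with a call to the MAC wrapper. */
int32_t dot_rewritten(const int32_t *a, const int32_t *b, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum = mac(sum, a[i], b[i]);
    }
    return sum;
}

/* Sanity check: the rewrite must preserve the result. */
void check_equivalence(const int32_t *a, const int32_t *b, size_t n)
{
    assert(dot_original(a, b, n) == dot_rewritten(a, b, n));
}
```

Calling `check_equivalence` on a few small input vectors is a cheap way to confirm that the rewritten loop computes the same value as the original. A real pass would also need static analysis to establish that the loop body has exactly this accumulate pattern before rewriting it.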
Details
Machine learning;
Digital libraries;
Search engines;
Embedded systems;
Learning curves;
Science;
Artificial intelligence;
English language;
Power;
Optimization;
Neural networks;
Benchmarks;
Smart houses;
Software upgrading;
Literature reviews;
Algorithms;
Preprints;
Workloads;
Keywords;
Efficiency;
Web studies;
Computer engineering