Abstract - The coordinate rotation digital computer (CORDIC) algorithm is a popular method used in many fields of science and technology. Unfortunately, it is a time-consuming process for central processing units (CPUs) and graphics processing units (GPUs), and even for specialized digital signal processing (DSP) solutions. The CORDIC algorithm is an alternative to the Newton-Raphson numerical method and to the look-up table (LUT) method, which is resource-expensive when implemented in field programmable gate arrays (FPGAs). Various modifications of the CORDIC algorithm make it possible to speed up hardware operation in edge computing devices. With that context in mind, this article presents a fast and accurate square root floating point (SQRT FP) CORDIC function that can be implemented in FPGAs. The proposed algorithm offers low complexity, decent accuracy and speed, and is sufficient for DSP applications, such as digital filters, accelerators for neural networks, machine learning and computer vision applications, and intelligent robotic systems.
Keywords - computer vision, CORDIC algorithm, FPGA, numerical methods, reconfigurable computing systems
1. Introduction
Current hardware implementations of square root (SQRT) calculation continue to suffer from numerous drawbacks that limit their practical use. Software developers and researchers working on real-time DSP applications face challenges related to computational accuracy and speed [1], as well as to the optimization of the hardware resources required to run square root algorithms [2]. In the Newton-Raphson numerical method, which is commonly used for computing the square root, the precision level depends on the initial guess, and the method requires significant computational resources due to its reliance on iterative multiplication [3], [4].
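The dependence on the initial guess can be illustrated with a minimal sketch. It assumes the common division-free variant of Newton-Raphson, which iterates on the reciprocal square root y ≈ 1/sqrt(a) and then multiplies by a; this variant is chosen here for illustration and is not necessarily the formulation used in [3], [4]. The function name and parameters are hypothetical.

```python
import math


def newton_sqrt_via_rsqrt(a, y0, iterations):
    """Approximate sqrt(a) with Newton-Raphson on f(y) = 1/y**2 - a.

    Assumption: y0 is an initial guess for 1/sqrt(a) and must satisfy
    0 < y0 < sqrt(3/a), otherwise the iteration does not converge.
    Each step uses multiplications only (three here, if the halving
    is counted as a multiplication).
    """
    y = y0
    for _ in range(iterations):
        y = 0.5 * y * (3.0 - a * y * y)
    return a * y  # sqrt(a) = a * (1/sqrt(a))


if __name__ == "__main__":
    a = 2.0
    for guess in (0.7, 0.1):  # a good and a poor guess for 1/sqrt(2)
        approx = newton_sqrt_via_rsqrt(a, guess, iterations=5)
        print(guess, approx, abs(approx - math.sqrt(a)))
```

With the same iteration budget, the guess close to 1/sqrt(2) reaches double-precision accuracy, while the poor guess is still far from the result, which is why practical implementations typically spend extra resources on a seed value.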
Alternative multiplicative methods converge quadratically, and thus may speed up the computation process. These methods perform a number of iterations of a fused multiply-add (FMA) operation, with the latency of a single FMA ranging from 3 to 6 cycles [5]. For low-precision SQRT computation, look-up table or low-degree polynomial approximation methods can be applied [1]. However, their high demand for FPGA resources is an additional disadvantage. This problem has been partially solved by the iterative or digit-recurrence methods presented in [6], [7], which are characterized by linear convergence. To overcome the above-mentioned issues, the coordinate rotation digital computer (CORDIC) algorithm was proposed [8]-[10].
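For orientation, the textbook way of obtaining a square root with CORDIC is the hyperbolic vectoring mode, sketched below. This is the generic scheme, not the modified algorithm proposed in this article. The function name, the input range 0.5 <= a < 2 and the iteration count are assumptions made for the example, and Python floats stand in for the fixed-point shift-and-add datapath used in hardware.

```python
import math


def cordic_sqrt(a, n_iter=24):
    """Approximate sqrt(a) with hyperbolic CORDIC in vectoring mode.

    Assumption: 0.5 <= a < 2, which lies inside the convergence range
    of the basic algorithm; wider inputs need range reduction first.
    """
    if not 0.5 <= a < 2.0:
        raise ValueError("this sketch assumes 0.5 <= a < 2")
    # With x0 = a + 1/4 and y0 = a - 1/4, x0**2 - y0**2 == a.
    x, y = a + 0.25, a - 0.25
    gain = 1.0
    next_repeat = 4  # indices 4, 13, 40, ... are executed twice
    for i in range(1, n_iter + 1):
        repeats = 2 if i == next_repeat else 1
        for _ in range(repeats):
            t = 2.0 ** -i                    # a right shift by i bits in hardware
            d = -1.0 if y >= 0.0 else 1.0    # drive y towards zero
            x, y = x + d * y * t, y + d * x * t
            gain *= math.sqrt(1.0 - t * t)   # a precomputed constant in hardware
        if i == next_repeat:
            next_repeat = 3 * next_repeat + 1
    # x has converged to gain * sqrt(x0**2 - y0**2) = gain * sqrt(a).
    return x / gain


if __name__ == "__main__":
    for value in (0.5, 1.0, 1.5):
        print(value, cordic_sqrt(value), math.sqrt(value))
```

Each micro-rotation needs only shifts, additions and a sign test, and the gain is fixed once the iteration schedule is fixed, so the final correction can be a single multiplication by a constant.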
The main...





