(ProQuest: ... denotes non-US-ASCII text omitted.)
Leonid Moroz 1 and Shinobu Nagayama 2 and Taras Mykytiv 1 and Ihor Kirenko 3 and Taras Boretskyy 1
Academic Editor:Michael Hübner
1, Department of Security Information and Technology, Lviv Polytechnic National University, Lviv 79013, Ukraine
2, Department of Computer and Network Engineering, Hiroshima City University, Hiroshima 731-3194, Japan
3, Patient Care Solutions Department, Philips Research, High Tech Campus, 5656 AE Eindhoven, The Netherlands
Received 13 October 2013; Revised 10 March 2014; Accepted 20 March 2014; 24 April 2014
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
COordinate Rotation DIgital Computer (CORDIC) is an effective method for the rotation vector in circular and hyperbolic coordinate systems. CORDIC was described by Jack Volder (J. Volder) in 1959. Since then, a great deal of work on this issue was published [1]. The method is popular now, due to its ease of software and hardware implementation. Indeed, for classical CORDIC method only a small amount of memory and some basic operations such as loading from memory, addition, subtraction, and shift are needed. The disadvantage of classical CORDIC is that the method has linear convergence. Thus, in order to get the m correct result fractional bits, one must hold all m iterations.
The comprehensive review of the modern state of theory and hardware representation of CORDIC method is provided in [1-3]. To achieve high performance there are attempts to build hybrid algorithms, such as LUT + CORDIC or CORDIC + final multipliers to boost performance [1, 4, 5]. However, there is no hybrid algorithm, which combines these three components: LUT, CORDIC, and final multipliers. Moreover, practical realization of such algorithm is unknown. An interesting method to increase productivity and reduce the average computation time was proposed in [6, 7], known as scaling-free CORDIC method. However, this approach has a narrow range of input angles, complex structure, and thus makes it impossible to design CORDIC calculators with accuracy of more than 16 correct bits.
In this paper we propose hybrid architecture for high performance with a simple structure that allows one to get more than 16 correct bits of the result.
2. General Approach
The known methods to improve performance of conventional [1, 2, 8, 9] and scaling-free CORDIC methods [6, 7, 10, 11] are tabular calculation method and the residual method of multiplication described in [4, 10, 12], respectively. Table method is used to process the most significant bits of the input angle (argument), while the remaining multiplication is for the least significant bits.
The first method is to create a precalculated table of values ......for the most significant bits of the computing function argument. In this method [10] proposes an approach based on the scaling-free CORDIC, which combines the functionalities of memory units (LUT) and the corresponding iterative processing. The algorithms [4, 12] compute sine and cosine functions, where the first iterations are performed by the classical CORDIC, and then output multipliers are applied, allowing significant savings in hardware resources compared to the classical approach described in [13]. We propose combining both described approaches. The combination of the memory units, iterative scaling-free CORDIC algorithms, and multipliers provides a method without deformation module of the vector, which significantly saves hardware resources in FPGA implementation and reduces delays. The theoretical basis for the proposed method and its 16-bit microcontroller implementation are described in [11, 14]. However, no hardware implementation on FPGA has been reported until now. This paper proposes the algorithm of the hybrid FPGA calculator of trigonometric sine and cosine functions.
3. Proposed Algorithm
We give an improved and expanded hybrid CORDIC algorithm [11, 14] shown as follows.
Let m be the given number of bits of the CORDIC, which determines the absolute error Δ m = 2 - m for calculating sine and cosine functions. We assume that the input angle [straight phi] is in the range 0 - π / 4 (the range of values [straight phi] will be discussed below) and is represented as [figure omitted; refer to PDF] where a i are the coefficients of the binary representation of an [straight phi] angle.
To construct the sine-cosine generator, we use three elements--LUT, simple scaling-free CORDIC stages, and output multipliers. Accordingly, the input angle [straight phi] is divided into three groups, which are processed sequentially using LUT, scaling-free CORDIC, and output multipliers.
Therefore [figure omitted; refer to PDF]
We must set the numbers m 1 and m 2 to determine the size of each of these construction elements. By using m 1 and m 2 , the number of bits of each group can be written as i LUT = 1 , ... , m 1 ; i CORDIC = ( m 1 + 1 ) , ... , m 2 ; i Multiplier = ( m 2 + 1 ) , ... , m .
To do this, we first consider the possibility of using CORDIC method in this computational scheme. The scale factor for conventional CORDIC is 1 + 2 - 2 i and for the scaling-free CORDIC it is 1 + 2 - 4 i - 2 on every iteration i . An angle [straight epsilon] i = 2 - i is also updated by the values arctan [straight epsilon] i and arctan [ ( 2 [straight epsilon] i ) / ( 2 - [straight epsilon] i 2 ) ] accordingly [11, 14]. For a given value m , influence of the scale factor on the results of calculations for the conventional CORDICs is canceled when 1 + 2 - 2 i - 1 < Δ m , beginning from [figure omitted; refer to PDF] [1, 4], and for the scaling-free CORDIC it is canceled when 1 + 2 - 4 i - 2 - 1 < Δ m , beginning from [11] [figure omitted; refer to PDF] It is caused by the fact that the scale factor becomes equal to 1.
Similarly, influence of the angle for a conventional CORDIC also stops, when | arctan [straight epsilon] i - [straight epsilon] i | < Δ m , beginning from [figure omitted; refer to PDF] [5], and for the scaling-free CORDIC when | arctan [ ( 2 [straight epsilon] i ) / ( 2 - [straight epsilon] i 2 ) ] - [straight epsilon] i | < Δ m , beginning from [11] [figure omitted; refer to PDF] In these cases arctan [straight epsilon] i and arctan [ ( 2 [straight epsilon] i ) / ( 2 - [straight epsilon] i 2 ) ] are equal, [straight epsilon] i = 2 - i .
Hence, we determine that m 1 = ( i angle _ sf - 1 ) and m 2 = i module _ c . Thus, the number of bits for the first group (most significant bits) is defined as i LUT = 1 , ... , ( i angle _ sf - 1 ) , that for the second (middle) is i CORDIC = i angle _ sf , ... , ( i module _ c ) , and that for the third (least significant bits) is i Multiplier = ( i module _ c + 1 ) , ... , m .
Example 1.
Assume that m = 16 , i LUT = 1,2 , 3,4 ; i CORDIC = 5,6 , 7,8 ; and i Multiplier = 9,10,11,12,13,14,15,16 .
Examples of the distribution of iterations between LUT, scaling-free (SF) CORDIC, and multipliers for different number of digits shown in Table 1.
Table 1: Distribution of iterations for hybrid devices of different bits.
m | LUT | SF CORDIC | Multiplier |
16 | 1 , ... , 4 | 5 , ... , 8 | 9 , ... , 168 |
24 | 1 , ... , 7 | 8 , ... , 12 | 13 , ... , 24 |
32 | 1 , ... , 9 | 9 , ... , 16 | 17 , ... , 32 |
48 | 1 , ... , 15 | 16 , ... , 24 | 25 , ... , 48 |
Step 1 . Preprocessing (or range reduction). Calculation of a sine-cosine with any of angles [varphi] , which is in the range 0 - 2 π , can be reduced to the calculation of sine-cosine with angle [straight phi] , which is in the first octant of 0 - π / 4 . Such reduction of a range is allowed by the rule of 8-point symmetry of circle. Methods for transformation of the input angle [varphi] to the [straight phi] are well known, for example, [8, 10], so we will not discuss it here.
Step 2 . For m = 16 the angle [straight phi] is divided into the following three angles: [figure omitted; refer to PDF]
Step 3 . Value is selected from LUT: [figure omitted; refer to PDF]
and then the following iterations are made for i = 5,6 , 7 : [figure omitted; refer to PDF] For i = 8 [figure omitted; refer to PDF] In vector-matrix form, step 3 is as follows: [figure omitted; refer to PDF]
Step 4 . Performing the multiplication on the residual angle [straight phi] 3 [figure omitted; refer to PDF] or in the unfolded form [figure omitted; refer to PDF] Obviously from these equations, final multiplication can be realized on typical for CORDIC operations as shift-add.
Step 5 . Postprocessing. Computed sine-cosine with angle [straight phi] is necessary to be converted to sine-cosine with angle [varphi] by the rule of 8-point symmetry of circle (everything is reduced only to the appropriate definition of sign x 9 and y 9 ).
4. FPGA Implementation Results
To show the effectiveness of the proposed algorithm, we implemented steps 3 and 4 in the algorithm with the Altera Stratix III FPGA (EP3SL340F1517C2) using Quartus II version 9.0.
Figure 1 shows the architecture of the algorithm implemented on the FPGA. In this figure, LUT and CORDIC blocks realize step 3 in the algorithm, and the block labeled with "Shift Add" realizes step 4. To achieve high-throughput computation, the architecture is pipelined with 9 stages: 1 stage for the LUT block, 4 stages for the CORDIC block, and 4 stages for the Shift Add block.
Figure 1: Architecture of the hybrid scaling-free CORDIC algorithm.
[figure omitted; refer to PDF]
The LUT block is implemented by ROM with a 4-bit input and two 16-bit outputs. In the CORDIC block, each iteration of the algorithm from x 5 to x 8 is implemented as a pipeline stage by logic elements, and it has two 20-bit outputs. The Shift Add block is implemented by dedicated DSP blocks on the FPGA, which have 4 pipeline stages. The outputs of the Shift Add block have 28 bits, and they are rounded to 16 bits.
In the best-known FPGA implementation of the conventional CORDIC algorithm for high throughput, all iterations of the algorithm from x 1 to x 16 are unrolled, and each of them is implemented as a pipeline stage. Thus, such implementation method requires a long latency and many logic elements. On the other hand, our implementation of the hybrid CORDIC requires fewer logic elements and causes shorter latency due to fewer pipeline stages.
Table 2 shows FPGA implementation results of the conventional CORCIC algorithm and our hybrid CORDIC algorithm. As shown in this table, our hybrid algorithm is much more efficient than the conventional one in terms of hardware cost. Compared to the conventional CORDIC, the proposed algorithm has approximately the same bandwidth. However, there is a considerable gain in reduction of latency, as well as the number of logic elements. Moreover, the number of pipeline stages is also reduced. Therefore, the proposed method is easier to implement and uses less logic elements, while it has similar bandwidth. Although our hybrid algorithm requires not only logic elements but also DSP blocks, this is efficient for modern FPGAs. This is because most of modern FPGAs have DSP blocks and memory blocks, and a balanced use of them as well as logic elements is more efficient. Therefore, we can conclude that our hybrid algorithm is suitable for modern FPGA implementation.
Table 2: FPGA implementation results of conventional and hybrid algorithms.
| Conventional CORDIC | Hybrid CORDIC | Ratio (%) |
Pipeline stages | 16 | 9 | 56 |
Logic elements (ALUTs) | 984 | 372 | 38 |
DSP blocks | 0 | 8 | -- |
Latency (nsec.) | 50 | 29 | 58 |
Throughput (106 /sec) | 320 | 306 | 96 |
5. Concluding Remarks
This paper describes the theoretical bases and practical pipelined FPGA implementation of a new hybrid scaling-free CORDIC algorithm. Logical combination of three construction elements of modern FPGAs which are LUT, simple scaling-free CORDIC stages, and multipliers allowed a considerable improvement of calculation efficiency of sine and cosine functions without the loss of accuracy, compared to conventional CORDIC algorithms.
The proposed method allows the reduction of the FPGA hardware resources by more than 30%, while providing the same throughput and accuracy of calculations.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
[1] P. K. Meher, J. Valls, T.-B. Juang, K. Sridharan, K. Maharatna, "50 years of CORDIC: algorithms, architectures, and applications," IEEE Transactions on Circuits and Systems I: Regular Papers , vol. 56, no. 9, pp. 1893-1907, 2009.
[2] J.-M. Muller Elementary Functions: Algorithms and Implementation , Birkhauser, Boston, Mass, USA, 2006., 2nd.
[3] B. Lakshmi, A. S. Dhar, "CORDIC architectures: a survey," VLSI Design , vol. 2010, 2010.
[4] E. Antelo, J. Villalba, "Low latency pipelined circular CORDIC," in Proceedings of the 17th IEEE Symposium on Computer Arithmetic (ARITH-17 '05), pp. 280-287, usa, June 2005.
[5] S. Wang, V. Piuri, E. E. Swartzlander Jr., "Hybrid CORDIC algorithms," IEEE Transactions on Computers , vol. 46, no. 11, pp. 1202-1207, 1997.
[6] K. Maharatna, S. Banerjee, E. Grass, M. Krstic, A. Troya, "Modified virtually scaling-free adaptive CORDIC rotator algorithm and architecture," IEEE Transactions on Circuits and Systems for Video Technology , vol. 15, no. 11, pp. 1463-1473, 2005.
[7] F. J. Jaime, M. A. Sánchez, J. Hormigo, J. Villalba, E. L. Zapata, "Enhanced scaling-free CORDIC," IEEE Transactions on Circuits and Systems I: Regular Papers , vol. 57, no. 7, pp. 1654-1662, 2010.
[8] A. Madisetti, A. Y. Kwentus, A. N. Willson Jr., "100-MHz, 16-b, direct digital frequency synthesizer with a 100-dBc spurious-free dynamic range," IEEE Journal of Solid-State Circuits , vol. 34, no. 8, pp. 1034-1043, 1999.
[9] A. S. N. Mokhtar, M. B. I. Reaz, K. Chellappan, M. A. Mohd Ali, "Scaling free CORDIC algorithm implementation of sine and cosine function," in Proceedings of the World Congress on Engineering (WCE '13), vol. II, London, UK, July 2013.
[10] Y.-S. Juang, L.-T. Ko, J.-E. Chen, T.-Y. Sung, H.-C. Hsin, "Optimization and implementation of scaling-free CORDIC-based direct digital frequency synthesizer for body care area network systems," Computational and Mathematical Methods in Medicine , vol. 2012, 2012.
[11] L. Moroz Theory and hardware-software devices of iteration evaluation of functions [M.S. thesis] , Lviv Polytechnic National University, Lviv, Ukraine, 2013.
[12] F. de Dinechin, M. Istoan, G. Sergent, "Fixed-point trigonometric functions on FPGAs," in Proceedings of the 4th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, Royaume-Uni, Edimburgh, UK, March 2013.
[13] XILINX LogiCORE IP CORDIC v6.0 (December 2013)
[14] L. Moroz, T. Mykytiv, M. Herasym, "Improved scaling-free CORDIC algorithm," in East-West Design & Test Symposium, pp. 1-5, September 2013.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
Copyright © 2014 Leonid Moroz et al. Leonid Moroz et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
COordinate Rotation DIgital Computer (CORDIC) is an effective method that is used in digital signal processing applications for computing various trigonometric, hyperbolic, linear, and transcendental functions. This paper presents the theoretical basis and practical implementation of circular (sine-cosine) CORDIC-based generator. Synthesis results of this generator based on Altera Stratix III FPGA (EP3SL340F1517C2) using Quartus II version 9.0 show that the proposed hybrid FPGA architecture significantly reduces latency (42% reduction) with a small area overhead, compared to the conventional version. The proposed algorithm has been simulated for sine and cosine function evaluation, and it has been verified that the accuracy is comparable with conventional algorithm.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer