An Efficient Design of DCT Approximation Based on

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

In recent years, considerable research has been devoted to transform techniques such as the fast Fourier transform (FFT), discrete cosine transform (DCT), and discrete wavelet transform (DWT), which are extensively used in various digital signal processing (DSP) applications [1, 2]. The FFT is an essential transform in DSP with applications in signal filtering, frequency analysis, and compression. The DWT is a widely used time-frequency method for the analysis of nonstationary signals. The DCT has been widely exploited for real-life data compression: its energy compaction and decorrelation properties make it very close to the Karhunen–Loeve transform (KLT), so it is preferable to the other transforms for data compression applications. It is an essential conversion between the time and frequency domains in various applications of speech and image processing, communication systems, and signal processing [3], and it is therefore used to map an image from the spatial domain into the frequency domain. The DCT is extensively used in several image and video compression standards such as JPEG [4], MPEG-1 [5], MPEG-2 [6], H.261 [7], H.263 [8], and others [9, 10]. However, a direct implementation of the DCT algorithm is not efficient because of its floating-point calculations and complex loops. Floating-point algorithms are slow in software and require more silicon area in hardware [11], yet the DCT must be computed in a very short time. In this context, a large number of DCT approximations have been proposed in the last few years to decrease the complexity of this transform [12–14]. At the same time, the demand for higher-quality video has increased because of the enormous number of electronic devices that process digital video at ever higher resolutions.
Thus, power optimization and area minimization are the two principal research areas in very large-scale integration (VLSI) design for embedded and handheld devices that employ image processing algorithms. Up to now, complementary metal oxide semiconductor (CMOS) based VLSI technology has been extensively used to improve the quality of image processing systems. However, traditional transistors cannot get much smaller than their current size, which strongly affects the speed, performance, and power consumption of future designs. The challenges created by this trend could be partially met by innovative technologies proposed as alternatives to classic CMOS. Presently, the single electron transistor (SET), tunnel field-effect transistor (FET), carbon nanotube (CNT), and silicon nanowire transistor are being explored as alternatives to conventional VLSI technology [15, 16]. Among such alternatives, quantum-dot cellular automata (QCA) is one of the most promising solutions for designing ultra-low-power and very high-speed digital circuits [17, 18]. QCA technology offers a revolutionary approach to computing at the nanoscale, with a promising future because of its ability to achieve high performance in terms of device density, clock frequency, and power consumption. It is expected to achieve a very high device density of $10^{12}$ devices/cm², switching speeds of 10 ps, and a power dissipation of 100 W/cm² [19]. Consequently, an efficient design of circuits based on this new technology would lead to a reduction of computational complexity and power consumption. These benefits can make the proposed QCA method useful for image processing applications on portable communication devices, where low power consumption is demanded.
Recently, some efforts have been made towards the design of QCA logic circuits for image processing applications such as MAC operation [20], BinDCT [21], image steganography [22], morphological edge detection [23], thresholding [24], noise removal [25], and morphological erosion and dilation [26]. The above scenario motivates us to investigate a new low-power DCT architecture based on QCA technology.

In this paper, we first present an optimized full adder circuit built from a three-input XOR gate and a three-input majority gate, which is then used to design an eight-bit ripple carry adder (RCA). Furthermore, an efficient QCA D flip-flop (DFF) circuit is designed and used as the building block of a parallel-in parallel-out (PIPO) shift register. The designed RCA and PIPO shift register are combined to obtain the QCA DCT architecture. The power dissipation of the proposed DCT design is estimated, and the reliability of the proposed QCA circuit is also explored.

The remainder of this paper is organized as follows: Section 2 provides the background of DCT algorithm. Section 3 presents an overview of the QCA. Section 4 discusses the DCT power optimization by QCA technology. Section 5 shows the discussions and results of the proposed DCT architecture. Finally, conclusions are drawn in Section 6.

2. DCT Algorithm

The discrete cosine transform (DCT) plays a critical role in image and video compression owing to its near-optimal decorrelation efficiency [3]. The DCT is similar to the discrete Fourier transform (DFT) and is used to compress both color and grayscale images. The main advantage of image transformation using the DCT is the suppression of redundancy between neighbouring pixels. For compression at low bit rates, a DCT approximation with low computational complexity is preferred, and significant research has been devoted to reducing the computational complexity of the DCT [13, 27–34]. In [13], a low-power DCT architecture that requires only sixteen additions is proposed. A low-complexity orthogonal $8 \times 8$ transform matrix for fast image compression, which requires only fourteen additions and two shift operations, is proposed in [33]. A new matrix for the DCT, which requires only 12 additions and achieves low power consumption when implemented in hardware, is reported in [34]. In addition, several studies have been carried out to improve the performance of the DCT module and reduce its processing complexity [35, 36]. Power consumption remains a fundamental problem when designing embedded video applications, since embedded and handheld devices face energy constraints as a result of their size and weight. This stimulates designers to search for new solutions that provide low power consumption for video processing applications. QCA technology, which is motivated by its applications in low-power electronic design, has attracted considerable attention. In this paper, we use the digital architecture (Figure 1) proposed in [34], which can be implemented quite easily using adders and parallel-in parallel-out (PIPO) shift registers.
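
To make the idea of an addition-only DCT approximation concrete, the following minimal Python sketch builds the exact orthonormal 8-point DCT-II matrix and a sign-only version of it. The sign-only matrix is a generic illustration chosen here, not the transform matrix of [34] (which is not reproduced in this text); the names `dct_matrix`, `sign_approximation`, and the sample vector `x` are hypothetical.

```python
import math

N = 8  # transform length used throughout the paper


def dct_matrix(n=N):
    """Return the orthonormal n-point DCT-II matrix as a list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def sign_approximation(matrix):
    """Illustrative approximation: keep only the sign (+1 or -1) of each entry.

    No entry of the 8-point DCT-II matrix is zero, so every entry maps to +1 or -1.
    """
    return [[1 if v > 0 else -1 for v in row] for row in matrix]


def apply(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(c * v for c, v in zip(row, x)) for row in matrix]


C = dct_matrix()
T = sign_approximation(C)

x = [52, 55, 61, 66, 70, 61, 64, 73]   # hypothetical row of pixel values
exact = apply(C, x)                    # needs floating-point multiplications
approx = apply(T, x)                   # needs additions and subtractions only

print([round(v, 2) for v in exact])
print(approx)
```

Because every entry of the sign-only matrix is +1 or −1, each output coefficient is a signed sum of the inputs, which is why such approximations map naturally onto adders and registers. The approximate coefficients are scaled differently from the exact ones, so the two printed vectors are not directly comparable in magnitude.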

[figure omitted; refer to PDF]

3. QCA Fundamentals

The QCA approach, introduced in 1993 by Lent et al. [18], is a candidate to replace field-effect transistor (FET) based devices at the nanoscale. QCA cells are generally classified into several implementation types: metal-island, nanomagnetic, semiconductor, and molecular. In QCA technology, binary information is encoded in the polarization of quantum-dot cells and transmitted from cell to cell. This nanotechnology was conceived based on some of Landauer’s ideas regarding energy-efficient and robust digital devices [37]. A QCA circuit consists of an array of cells. Each cell contains four quantum dots at the corners of a square, and each dot can hold a single electron. Two electrons are injected into each cell and, because of Coulomb repulsion, occupy diametrically opposite dots [38]. Two polarizations (labelled −1 and 1) are therefore possible, representing binary “0” and binary “1”, respectively. Figures 2 and 3 show the propagation of logic “0” and logic “1”, respectively, from the input to the output of a QCA binary wire through Coulombic repulsion. In general, the Coulombic interaction between electrons in neighbouring cells is used to implement logic functions, which are controlled by the clocking mechanism [39].
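
The mapping from electron positions to a logic value can be sketched with the polarization expression commonly used in the QCA literature, where the dots are numbered around the cell so that dots 1 and 3 lie on one diagonal and dots 2 and 4 on the other. The function names and the dot numbering below are assumptions made for illustration.

```python
def polarization(rho1, rho2, rho3, rho4):
    """Cell polarization from the electron occupancy of the four dots.

    Dots 1 and 3 are assumed to share one diagonal, dots 2 and 4 the other.
    """
    total = rho1 + rho2 + rho3 + rho4
    return ((rho1 + rho3) - (rho2 + rho4)) / total


def to_bit(p):
    """Interpret polarization +1 as binary 1 and -1 as binary 0."""
    return 1 if p > 0 else 0


# Two electrons on the 1-3 diagonal: polarization +1, binary "1".
print(polarization(1, 0, 1, 0), to_bit(polarization(1, 0, 1, 0)))
# Two electrons on the 2-4 diagonal: polarization -1, binary "0".
print(polarization(0, 1, 0, 1), to_bit(polarization(0, 1, 0, 1)))
```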

[figure omitted; refer to PDF]

3.1. Logic Gates

The majority gate and the inverter, each composed of a few QCA cells, are the fundamental logic gates in QCA implementations. Several types of inverter and majority gates are shown in Figure 4. In the inverter gate, the output is the inverse of the input. The majority gate acts as an AND gate or an OR gate when one of its inputs is fixed permanently to 0 or 1, respectively. Its logical function can be expressed by the following equation: $\begin{matrix} (1) & M (A, B, C) = A B + B C + A C . \end{matrix}$
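
The behaviour described by equation (1) can be checked with a short Python sketch. It models only the Boolean function of the gate, not the cell-level physics; the helper names are illustrative.

```python
from itertools import product


def majority(a, b, c):
    """Three-input majority function of equation (1): AB + BC + AC."""
    return (a & b) | (b & c) | (a & c)


def and_gate(a, b):
    """AND obtained by fixing the third majority input to 0."""
    return majority(a, b, 0)


def or_gate(a, b):
    """OR obtained by fixing the third majority input to 1."""
    return majority(a, b, 1)


def inverter(a):
    """Logical inversion, as performed by the QCA inverter gate."""
    return 1 - a


for a, b in product((0, 1), repeat=2):
    print(a, b, "AND:", and_gate(a, b), "OR:", or_gate(a, b))
```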

[figures omitted; refer to PDF]

3.2. QCA Clocking

The clocking system is an important factor in the dynamics of QCA. Its principal functions are the synchronization of data flows and the implementation of adiabatic cell operation, which gives QCA circuits their high energy efficiency [40]. QCA clocking generally has four phases: switch, hold, release, and relax, as illustrated in Figure 5. During the switch phase, in which the actual computation occurs, the barriers are raised and the cell takes on a definite polarization under the influence of its adjacent cells. During the hold phase, the barriers are high and the polarization of the cell is retained. During the release phase, the barriers are lowered and the cell loses its polarization. During the relax phase, the cell is unpolarized [41].
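
As a rough illustration of how the four phases move data through a circuit, the sketch below assumes the usual arrangement of four clock zones, each lagging the previous one by one phase. The zone numbering and function names are assumptions, not taken from the paper.

```python
PHASES = ("switch", "hold", "release", "relax")


def zone_phase(zone, step):
    """Phase of a clock zone at a given time step.

    Zone k is assumed to lag zone k-1 by one phase (a quarter of a clock cycle).
    """
    return PHASES[(step - zone) % len(PHASES)]


def schedule(num_zones=4, num_steps=4):
    """Table of phases: one row per time step, one column per clock zone."""
    return [[zone_phase(z, t) for z in range(num_zones)]
            for t in range(num_steps)]


for t, row in enumerate(schedule()):
    print(f"step {t}: {row}")
```

At every step, the zone that is switching sits next to a zone that is holding, so a cell always settles under the influence of a stable neighbour. This is also why circuit delays are counted in clock phases or quarter cycles.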

[figure omitted; refer to PDF]

3.3. Crossovers in QCA

Two approaches are used to cross two wires in QCA: multilayer crossovers and coplanar crossovers. Multilayer QCA circuits consume considerably less area than coplanar circuits; however, they may be expensive and difficult to manufacture. In this paper, we use the coplanar crossover approach in designing our DCT architecture, since the multilayer technique incurs a high cost due to fabrication issues. The coplanar crossover requires two cell types (regular and rotated cells), as shown in Figure 6(a). It has already been applied in several studies [37, 42].

[figures omitted; refer to PDF]

4. QCA Implementation of the DCT

In this section, we present a new DCT architecture based on QCA technology to mitigate the computational complexity and power consumption issues. The architecture is composed of two stages (stage 1 and stage 2). The submodules used in designing it are eight-bit adders and PIPO shift registers that store the results generated by these adders. Reducing the cell count and area of these components therefore contributes directly to achieving low power.

4.1. Study of Stage 1

This stage is composed of eight 8-bit adders and eight 8-bit PIPO shift registers.

4.1.1. Eight-Bit Adder

The adder circuit plays an important role in arithmetic circuits. Recently, several attempts have been made to implement efficient adder circuits in QCA technology [43–50]. In particular, the XOR gate [51] can easily be used in the synthesis of adder designs. In this subsection, we propose a novel QCA full adder circuit based on a majority gate and a three-input XOR gate. The inputs are $A$, $B$, and $C_{in}$. The outputs are the carry-out ($C_{out}$) and the sum ($Sum$), which are given, respectively, by the following equations: $\begin{matrix} (2) & C_{out} = M (A, B, C_{in}), \\ (3) & Sum = {XOR}_{3} (A, B, C_{in}) . \end{matrix}$
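
Equations (2) and (3) can be verified exhaustively with a few lines of Python. This is a logic-level sketch only; the function names are illustrative and nothing here models the QCA layout.

```python
from itertools import product


def majority(a, b, c):
    """Three-input majority function."""
    return (a & b) | (b & c) | (a & c)


def xor3(a, b, c):
    """Three-input exclusive-OR."""
    return a ^ b ^ c


def full_adder(a, b, c_in):
    """One-bit full adder: returns (sum, carry_out) per equations (2) and (3)."""
    return xor3(a, b, c_in), majority(a, b, c_in)


# Print the truth table and check it against ordinary integer addition.
for a, b, c_in in product((0, 1), repeat=3):
    s, c_out = full_adder(a, b, c_in)
    assert 2 * c_out + s == a + b + c_in
    print(a, b, c_in, "->", "sum", s, "carry", c_out)
```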

The QCA layout of the proposed full adder is depicted in Figure 7. It consists of one majority gate and one three-input exclusive-OR gate. According to the QCADesigner software (version 2.0.3), the design consists of 45 cells and covers an area of 0.04 μm². The proposed design provides correct outputs after a delay of two clock phases, as shown by the simulation waveform in Figure 8. The eight-bit adder performs the arithmetic of the proposed DCT architecture. An eight-bit ripple carry adder is constructed by cascading eight copies of the proposed full adder circuit in series (Figure 9(a)). To perform a correct addition in parallel, additional cells may be inserted at the inputs and outputs in different clock zones for circuit synchronization. The layout of the eight-bit ripple carry adder (RCA) is shown in Figure 9(b). This design uses 526 cells and requires 9 clock phases to generate the final output.
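
The cascading of eight full adders into a ripple carry adder can be sketched at the logic level as follows. The code ignores clock zones and synchronization cells, and the function names are illustrative.

```python
def full_adder(a, b, c_in):
    """One-bit full adder: sum is a 3-input XOR, carry is a 3-input majority."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (b & c_in) | (a & c_in)
    return s, c_out


def ripple_carry_add(x, y, c_in=0, width=8):
    """Add two unsigned integers bit by bit, passing the carry along the chain.

    Returns (sum modulo 2**width, final carry-out).
    """
    result = 0
    carry = c_in
    for i in range(width):
        a = (x >> i) & 1          # bit i of the first operand
        b = (y >> i) & 1          # bit i of the second operand
        s, carry = full_adder(a, b, carry)
        result |= s << i
    return result, carry


print(ripple_carry_add(23, 42))    # fits in eight bits, carry-out is 0
print(ripple_carry_add(200, 100))  # overflows eight bits, carry-out is 1
```

The carry has to ripple through all eight stages before the most significant sum bit is valid, which is why the delay of the cascaded adder grows with the word length.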

[figure omitted; refer to PDF]

[figures omitted; refer to PDF]

4.1.2. QCA 8-Bit PIPO

In this subsection, the design of the proposed 8-bit PIPO shift register is explained. The basic building block of a PIPO shift register is the flip-flop, usually a D-type flip-flop. Figure 10 illustrates the proposed QCA flip-flop, which can be built using majority and inverter gates. The logic equation of the D flip-flop is $\begin{matrix} (4) & Q (t) = Clk \cdot D + \overline{Clk} \cdot Q (t - 1), \end{matrix}$ where the input “D” is copied to the output “Q” only when the clock input is active. The proposed design includes 42 cells with an area of 0.04 μm². It takes five clock periods for the inputs to reach the output, so the first meaningful output appears on the sixth clock. Figure 11 presents the simulation results of the QCA D flip-flop.
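
Equation (4) can be expressed with majority and inverter gates only, as in the Python sketch below. The particular arrangement of three majority gates is one straightforward realization assumed here for illustration; it is not claimed to match the gate arrangement of Figure 10.

```python
from itertools import product


def majority(a, b, c):
    """Three-input majority function."""
    return (a & b) | (b & c) | (a & c)


def inverter(a):
    """Logical inversion."""
    return 1 - a


def d_latch(clk, d, q_prev):
    """Next output Q(t) = Clk.D + not(Clk).Q(t-1), built from majority gates."""
    pass_new = majority(clk, d, 0)                 # Clk AND D
    keep_old = majority(inverter(clk), q_prev, 0)  # (NOT Clk) AND Q(t-1)
    return majority(pass_new, keep_old, 1)         # OR of the two terms


for clk, d, q_prev in product((0, 1), repeat=3):
    print("Clk", clk, "D", d, "Q(t-1)", q_prev, "-> Q(t)", d_latch(clk, d, q_prev))
```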

[figures omitted; refer to PDF]

[figure omitted; refer to PDF]

Figures 12 and 13 show, respectively, the schematic and the QCA layout of the proposed eight-bit PIPO shift register. It consists of eight QCA D flip-flops that share a common clock signal. The input data D0, D1, …, D7 are loaded into the register simultaneously, in parallel. The output data Q0, Q1, …, Q7 are available in parallel at the outputs of the D flip-flops. The proposed QCA layout is composed of 407 cells with an area of 0.52 μm². It has a critical path length of 35 clock zones.
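
A behavioural sketch of the eight-bit PIPO register, assuming each storage element follows equation (4) and all elements share one clock, could look like this. The class and method names are hypothetical, and the propagation delay through the clock zones is not modelled.

```python
def d_latch(clk, d, q_prev):
    """Storage element following Q(t) = Clk.D + not(Clk).Q(t-1)."""
    return (clk & d) | ((1 - clk) & q_prev)


class PIPORegister:
    """Parallel-in parallel-out register made of `width` storage elements."""

    def __init__(self, width=8):
        self.q = [0] * width  # outputs Q0..Q7, initially cleared

    def step(self, clk, d_bits):
        """Apply inputs D0..D7 with the shared clock; return outputs Q0..Q7."""
        if len(d_bits) != len(self.q):
            raise ValueError("input width does not match register width")
        self.q = [d_latch(clk, d, q) for d, q in zip(d_bits, self.q)]
        return list(self.q)


reg = PIPORegister()
data = [1, 0, 1, 1, 0, 0, 1, 0]
print(reg.step(1, data))      # clock active: all eight bits are loaded at once
print(reg.step(0, [0] * 8))   # clock inactive: the stored word is held
```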

[figure omitted; refer to PDF]

4.2. Study of Stage 2

This stage is composed of eight 8-bit adders and four 8-bit PIPO shift registers. The full adder and PIPO shift register proposed for the first stage are reused in this stage.
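
From the submodule counts given for the two stages and the cell counts reported above for the 8-bit adder (526 cells) and the 8-bit PIPO register (407 cells), a rough lower bound on the cell count of the whole architecture can be derived. The figure computed below is not reported in the paper; it ignores interconnect wiring and synchronization cells, so the actual layout is expected to be larger.

```python
ADDER_CELLS = 526   # cells in the proposed 8-bit ripple carry adder
PIPO_CELLS = 407    # cells in the proposed 8-bit PIPO shift register

# Submodule counts per stage, as stated in Sections 4.1 and 4.2.
STAGES = {
    "stage 1": {"adders": 8, "registers": 8},
    "stage 2": {"adders": 8, "registers": 4},
}


def stage_cells(counts):
    """Cells contributed by the adders and registers of one stage."""
    return counts["adders"] * ADDER_CELLS + counts["registers"] * PIPO_CELLS


per_stage = {name: stage_cells(counts) for name, counts in STAGES.items()}
total = sum(per_stage.values())

for name, cells in per_stage.items():
    print(name, cells)
print("submodules only, lower bound:", total)
```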

5. Results and Discussions

The proposed designs are implemented and simulated using the QCADesigner 2.0.3 tool [52], targeting semiconductor QCA technology. The parameters used for the simulation are as follows: cell width = 18 nm, cell height = 18 nm, cell-to-cell spacing = 2 nm, dot diameter = 5 nm, number of samples = 12,800, convergence tolerance = 0.001, radius of effect = 80 nm, relative permittivity = 12.9, clock high = $9.8 \times 10^{-22}$ J, clock low = $3.8 \times 10^{-23}$ J, clock amplitude factor = 2, layer separation = 11.5 nm, and maximum iterations per sample = 100. Adjacent wires are separated by a spacing two cells wide, and each clock zone contains at least two cells. In this design, the coplanar wire-crossing method has been used.

Tables 1–4 compare the proposed QCA submodules with previously reported designs in terms of circuit complexity.

Table 1

Comparison of the proposed adder with the previous works.

Circuit	Cell count	Area (μm²)	Delay (clock cycles)	Crossover type
Full adder [43]	135	0.14	1.25	Multilayer
Full adder [44]	93	0.087	1	Multilayer
Full adder [45]	73	0.080	0.75	Multilayer
Full adder [46]	220	0.36	3	Coplanar
Full adder [53]	206	0.28	2	Not required
Full adder [47]	102	0.097	2	Coplanar
Full adder [48]	59	0.043	1	Coplanar (clocking based)
Full adder [49]	49	0.04	1	Coplanar (clocking based)
Proposed full adder	45	0.04	0.5	Coplanar

Table 2

Comparison of the proposed 8-bit adder with the previous works.

Circuit	Cell count	Area (μm²)	Delay (clock cycles)	Crossover type
8-bit adder [54]	1782	1.49	10	Multilayer
8-bit adder [47]	789	0.948	10	Multilayer
8-bit adder [55]	517	0.59	10	Multilayer
8-bit adder [48]	572	0.492	11	Coplanar
Proposed 8-bit adder	526	0.89	3.5	Coplanar

Table 3

Comparison of the proposed D flip-flop with the previous works.

Circuit	Cell count	Area (μm²)	Delay (clock cycles)	Crossover type
DFF [56]	66	0.08	1.5	Coplanar
DFF [57]	49	0.05	1	Not required
DFF [58]	46	0.03	0.75	Not required
DFF [21]	46	0.05	1.25	Not required
Proposed DFF	42	0.04	1.5	Not required

Table 4

Comparison of the proposed 8-bit PIPO with the previous works.

Circuit	Cell count	Area (μm²)	Delay (clock zones)	Crossover type
PIPO [21]	562	0.74	N/A	Not required
Proposed PIPO	407	0.52	35	Not required

The proposed subcircuits of the QCA DCT approximation compare favorably with existing ones in terms of complexity and performance. As shown in Table 1, the designed full adder achieves improvements of 78%, 85%, and 75% in cell count, area, and delay, respectively, compared with the design in [53]. Compared with the design in [49], the proposed full adder improves cell count and delay by 8.16% and 50%, respectively. Table 2 shows that the proposed 8-bit adder reduces cell count by 33%, area by 5.3%, and delay by 65% compared with the circuit in [47]. As listed in Table 3, the designed QCA D flip-flop uses fewer cells than the circuits in [21, 56–58], while its area and delay remain comparable to theirs. Table 4 indicates that the designed eight-bit PIPO outperforms the existing design in [21] in terms of cell count and area by 27% and 29%, respectively. The proposed submodules can therefore contribute directly to a low-power DCT design.
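
The percentages quoted for the full adder and the PIPO register follow from the table entries as relative reductions, (reference − proposed) / reference. The short sketch below recomputes them; the helper name `reduction` is illustrative, and the comparison suggests that the quoted figures are truncated rather than rounded.

```python
def reduction(reference, proposed):
    """Relative reduction of `proposed` with respect to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference


# Full adder versus [53] (Table 1): cell count, area, delay.
fa_vs_53 = (reduction(206, 45), reduction(0.28, 0.04), reduction(2, 0.5))

# Full adder versus [49] (Table 1): cell count, delay.
fa_vs_49 = (reduction(49, 45), reduction(1, 0.5))

# 8-bit PIPO versus [21] (Table 4): cell count, area.
pipo_vs_21 = (reduction(562, 407), reduction(0.74, 0.52))

print([round(v, 2) for v in fa_vs_53])
print([round(v, 2) for v in fa_vs_49])
print([round(v, 2) for v in pipo_vs_21])
```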

Since there is no electrical current flow in QCA computations, the power consumption of the proposed design is much lower than that of the classical CMOS-based solution. We employed the QCAPro software [59] to calculate the power dissipation of the proposed DCT design. The power dissipation of the entire system is 0.091 mW. This value is considerably lower than the values reported in the literature for CMOS technology [34, 60, 61]. According to Table 5, the proposed architecture dissipates nearly 53% less power than the one presented in [34]. In addition, the proposed design can operate at a higher frequency (higher than 1 GHz) than the conventional solution. These results indicate that the proposed module could be a good candidate for numerous video and image applications. Consequently, this architecture can be useful for future high-definition video applications, as it can help meet the real-time constraints of the most recent high-resolution video formats.

Table 5

Comparison of the proposed DCT with the previous works.

Transform	Power (mW)
Transform in [60]	29.78
Transform in [61]	12.4
Transform in [34]	0.1954
Proposed transform	0.091
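
The entries of Table 5 can be turned into relative savings with a few lines of Python. The dictionary keys simply reuse the citation labels of the table, and the names `POWER_MW`, `saving_percent`, and `times_lower` are illustrative.

```python
# Power figures from Table 5, in milliwatts.
POWER_MW = {"[60]": 29.78, "[61]": 12.4, "[34]": 0.1954}
PROPOSED_MW = 0.091


def saving_percent(reference_mw, proposed_mw=PROPOSED_MW):
    """Power saving of the proposed design relative to a reference, in percent."""
    return 100.0 * (reference_mw - proposed_mw) / reference_mw


def times_lower(reference_mw, proposed_mw=PROPOSED_MW):
    """How many times lower the proposed power is than the reference."""
    return reference_mw / proposed_mw


for label, power in POWER_MW.items():
    print(label, f"{saving_percent(power):.2f}% saving,",
          f"{times_lower(power):.1f}x lower")
```

For the transform in [34], this reproduces the saving of roughly 53% quoted in the discussion above.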

Given the advances being made in QCA technology and the ever-increasing computational requirements of image processing, this work can open up new opportunities in this field.

The effect of temperature variations on the polarization of the output cell in the proposed DCT design has also been investigated. The output polarization was measured at different temperatures, and the result is depicted in Figure 14. According to this figure, the DCT circuit works efficiently between 1 K and 6 K. Above 6 K, the output polarization drops dramatically and the design starts to malfunction.

[figure omitted; refer to PDF]

6. Conclusion

Area minimization and low power are two indispensable requirements for portable multimedia devices, which use several image processing algorithms. QCA technology offers several advantages, such as very low power dissipation, high functional density, and improved computing speed (in the terahertz range), and it facilitates further miniaturisation at the nanoscale. In this paper, a novel design of a DCT approximation in QCA technology has been presented. The proposed design dissipates 0.091 mW of power. The operating frequency of this architecture can exceed 1 THz. This work provides high circuit performance, very low power consumption, and very small dimensions compared with traditional VLSI technology. The outcome of this work can open up new opportunities for low-power video designs. Future extensions, such as various applications based on this QCA DCT, could be investigated.

References

[1] A. Gupta, S. D. Joshi, P. Singh, "On the approximate discrete KLT of fractional Brownian motion and applications," Journal of the Franklin Institute, vol. 355 no. 17, pp. 8989-9016, DOI: 10.1016/j.jfranklin.2018.09.023, 2018.

[2] P. Singh, "Novel Fourier quadrature transforms and analytic signal representations for nonlinear and non-stationary time-series analysis," Royal Society Open Science, vol. 5 no. 11, DOI: 10.1098/rsos.181131, 2018.

[3] N. Ahmed, T. Natarajan, K. R. Rao, "Discrete cosine transform," IEEE Transactions on Computers, vol. 23 no. 1, pp. 90-93, DOI: 10.1109/t-c.1974.223784, 1974.

[4] W. B. Pennebaker, J. L. Mitchell, JPEG Still Image Data Compression Standard, 1992.

[5] N. Roma, L. Sousa, "Efficient hybrid DCT-domain algorithm for video spatial downscaling," EURASIP Journal on Advances in Signal Processing, vol. 2007 no. 1, DOI: 10.1155/2007/57291, 2007.

[6] International Organisation for Standardisation, ISO/IEC JTC1/SC29/WG11: Generic Coding of Moving Pictures and Associated Audio Information—Part 2: Video, 1994.

[7] International Telecommunication Union, ITU-T Recommendation H. 261 Version 1: Video Codec for Audiovisual Services at P X 64 kbits, 1990.

[8] International Telecommunication Union, ITU-T Recommendation H. 263 Version 1: Video Coding for Low Bit Rate Communication, 1995.

[9] International Telecommunication Union, ITU-T Recommendation H. 264 Version 1: Advanced Video Coding for Generic Audio-Visual Services, 2003.

[10] T. Wiegand, G. J. Sullivan, G. Bjontegaard, A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13 no. 7, pp. 560-576, DOI: 10.1109/tcsvt.2003.815165, 2003.

[11] A. Tumeo, M. Monchiero, G. Palermo, F. Ferrandi, D. Sciuto, "A pipelined fast 2D-DCT accelerator for FPGA-based SoCs," Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI ’07), pp. 331-336, DOI: 10.1109/isvlsi.2007.13.

[12] R. J. Cintra, F. M. Bayer, "A DCT approximation for image compression," IEEE Signal Processing Letters, vol. 18 no. 10, pp. 579-582, DOI: 10.1109/LSP.2011.2163394, 2011.

[13] N. Brahimi, S. Bouguezel, "An efficient fast integer DCT transform for images compression with 16 additions only," pp. 71-74, DOI: 10.1109/wosspa.2011.5931415.

[14] K. Lengwehasatit, A. Ortega, "Scalable variable complexity approximate forward DCT," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14 no. 11, pp. 1236-1248, DOI: 10.1109/TCSVT.2004.835151, 2004.

[15] K. Bernstein, R. K. Cavin, W. Porod, A. Seabaugh, J. Welser, "Device and architecture outlook for beyond CMOS switches," Proceedings of the IEEE, vol. 98 no. 12, pp. 2169-2184, DOI: 10.1109/jproc.2010.2066530, 2010.

[16] D. Rairigh, Limits of CMOS Technology Scaling and Technologies Beyond-CMOS, 2006.

[17] G. L. Snider, A. O. Orlov, I. Amlani, "Quantum-dot cellular automata: review and recent experiments (invited)," Journal of Applied Physics, vol. 85 no. 8, pp. 4283-4285, DOI: 10.1063/1.370344, 1999.

[18] C. S. Lent, P. D. Tougaw, W. Porod, G. H. Bernstein, "Quantum cellular automata," Nanotechnology, vol. 4 no. 1, pp. 49-57, DOI: 10.1088/0957-4484/4/1/004, 1993.

[19] K. Walus, A. Vetteth, G. Jullien, V. Dimitrov, "RAM design using quantum-dot cellular automata," Technical Proceedings of the 2003 Nanotechnology Conference and Trade Show, vol. 2, pp. 160-163.

[20] G. Ismail, T. Lamjed, O. Bouraoui, "Design of efficient quantum-dot cellular automata (QCA) multiply accumulate (MAC) unit with power dissipation analysis," IET Circuits, Devices & Systems, vol. 13 no. 4, pp. 534-543, DOI: 10.1049/iet-cds.2018.5196, 2019.

[21] L. Touil, I. Gassoumi, R. Laajimi, B. Ouni, "Efficient design of BinDCT in quantum-dot cellular automata (QCA) technology," IET Image Processing, vol. 12 no. 6, pp. 1020-1030, DOI: 10.1049/iet-ipr.2017.1116, 2018.

[22] D. Bikash, C. D. Jadav, D. Debashis, "Reversible logic-based image steganography using quantum dot cellular automata for secure nanocommunication," IET Circuits, Devices & Systems, vol. 11 no. 1, DOI: 10.1049/iet-cds.2015.0245, 2017.

[23] O. Liolis, V. S. Kalogeiton, D. P. Papadopoulos, G. C. Sirakoulis, V. Mardiris, A. Gasteratos, "Morphological edge detector implemented in quantum cellular automata," Proceedings of the 2013 IEEE International Conference on Imaging Systems and Techniques (IST), pp. 406-409, DOI: 10.1109/ist.2013.6729731.

[24] B. Sen, A. S. Anand, T. Adak, B. K. Sikdar, "Thresholding using quantum-dot cellular automata," pp. 356-360, DOI: 10.1109/innovations.2011.5893848.

[25] P. Z. Qadir, S. J. Ahmad, M. A. Peer, "Quantum-dot cellular automata: theory and application," Proceedings of the 2013 International Conference on Machine Intelligence Research and Advancement, pp. 540-544.

[26] V. Mardiris, V. Chatzis, Image Processing Algorithms Implementation Using Quantum Cellular Automata, 2014.

[27] M. N. Haggag, M. El-Sharkawy, G. Fahmy, "Efficient fast multiplication-free integer transformation for the 2-D DCT H.265 standard," Proceedings of the 2010 IEEE International Conference on Image Processing, pp. 3769-3772, DOI: 10.1109/icip.2010.5653484.

[28] F. M. Bayer, U. S. Potluri, A. Madanayake, R. J. Cintra, "Multiplierless approximate 4-point DCT VLSI architectures for transform block coding," Electronics Letters, vol. 49 no. 24, pp. 1532-1534, DOI: 10.1049/el.2013.1352, 2013.

[29] K. A. Wahid, M. Martuza, M. Das, C. McCrosky, "Efficient hardware implementation of 8 × 8 integer cosine transforms for multiple video codecs," Journal of Real-Time Image Processing, vol. 8 no. 4, pp. 403-410, DOI: 10.1007/s11554-011-0209-6, 2013.

[30] P. K. Meher, S. Y. Park, B. K. Mohanty, K. S. Lim, C. Yeo, "Efficient integer DCT architectures for HEVC," IEEE Transactions on Circuits and Systems for Video Technology, vol. 24 no. 1, pp. 168-178, DOI: 10.1109/TCSVT.2013.2276862, 2014.

[31] F. M. Bayer, R. J. Cintra, A. Edirisuriya, A. Madanayake, "A digital hardware fast algorithm and FPGA-based prototype for a novel 16-point approximate DCT for image compression applications," Measurement Science and Technology, vol. 23 no. 11, DOI: 10.1088/0957-0233/23/11/114010, 2012.

[32] D. Vaithiyanathan, R. Seshasayanan, "Low power DCT architecture for image compression," Proceedings of the International Conference on Advanced Computing and Communication Systems (ICACCS), DOI: 10.1109/ICACCS.2013.6938745.

[33] R. K. Senapati, U. C. Pati, K. K. Mahapatra, "A low complexity orthogonal 8 × 8 transform matrix for fast image compression," Proceedings of the Annual IEEE India Conference, DOI: 10.1109/indcon.2010.5712707.

[34] V. Dhandapani, S. Ramachandran, "Area and power efficient DCT architecture for image compression," EURASIP Journal on Advances in Signal Processing, vol. 2014 no. 1, DOI: 10.1186/1687-6180-2014-180, 2014.

[35] M. Jridi, A. Alfalou, P. K. Meher, "A generalized algorithm and reconfigurable architecture for efficient and scalable orthogonal approximation of DCT," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 62 no. 2, pp. 449-457, DOI: 10.1109/tcsi.2014.2360763, 2015.

[36] S. Bouguezel, M. O. Ahmad, M. N. S. Swamy, "Binary discrete cosine and Hartley transforms," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 60 no. 4, pp. 989-1002, DOI: 10.1109/tcsi.2012.2224751, 2013.

[37] C. S. Lent, G. L. Snider, "The development of quantum-dot cellular automata," Field-Coupled Nanocomputing: Paradigms, Progress, and Perspectives, 2014.

[38] P. D. Tougaw, C. S. Lent, "Logical devices implemented using quantum cellular automata," Journal of Applied Physics, vol. 75 no. 3, pp. 1818-1825, DOI: 10.1063/1.356375, 1994.

[39] C. S. Lent, B. Isaksen, "Clocked molecular quantum-dot cellular automata," IEEE Transactions on Electron Devices, vol. 50 no. 9, pp. 1890-1896, DOI: 10.1109/ted.2003.815857, 2003.

[40] K. Walus, G. A. Jullien, "Design tools for an emerging SoC technology: quantum-dot cellular automata," Proceedings of the IEEE, vol. 94 no. 6, pp. 1225-1244, DOI: 10.1109/jproc.2006.875791, 2006.

[41] C. S. Lent, M. Liu, Y. Lu, "Bennett clocking of quantum-dot cellular automata and the limits to binary logic scaling," Nanotechnology, vol. 17 no. 16, pp. 4240-4251, DOI: 10.1088/0957-4484/17/16/040, 2006.

[42] L. Lu, W. Liu, O. Neill, E. E. Swartzlander, "QCA systolic array design," IEEE Transactions on Computers, vol. 62 no. 3, pp. 548-560, DOI: 10.1109/tc.2011.234, 2013.

[43] H. Cho, E. E. Swartzlander, "Adder designs and analyses for quantum-dot cellular automata," IEEE Transactions On Nanotechnology, vol. 6 no. 3, pp. 374-383, DOI: 10.1109/tnano.2007.894839, 2007.

[44] R. Zhang, K. Walus, W. Wang, G. A. Jullien, "Performance comparison of quantum-dot cellular automata adders," Proceedings of the 2005 IEEE International Symposium on Circuits and Systems, pp. 2522-2526, DOI: 10.1109/iscas.2005.1465139.

[45] H. Cho, E. E. Swartzlander, "Adder and multiplier design in quantum-dot cellular automata," IEEE Transactions on Computers, vol. 58 no. 6, pp. 721-727, DOI: 10.1109/tc.2009.21, 2009.

[46] K. Kim, K. Wu, R. Karri, "The robust QCA adder designs using composable QCA building blocks," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26 no. 1, pp. 176-183, DOI: 10.1109/tcad.2006.883921, 2007.

[47] I. Hänninen, J. Takala, "Binary adders on quantum-dot cellular automata," Journal of Signal Processing Systems, vol. 58 no. 1, pp. 87-103, DOI: 10.1007/s11265-008-0284-5, 2010.

[48] D. Abedi, G. Jaberipur, M. Sangsefidi, "Coplanar full adder in quantum-dot cellular automata via clock-zone-based crossover," IEEE Transactions on Nanotechnology, vol. 14 no. 3, pp. 497-504, DOI: 10.1109/tnano.2015.2409117, 2015.

[49] T. N. Sasamal, A. K. Singh, A. Mohan, "An optimal design of full adder based on 5-input majority gate in coplanar quantum-dot cellular automata," Optik, vol. 127 no. 20, pp. 8576-8591, DOI: 10.1016/j.ijleo.2016.06.034, 2016.

[50] G. Singh, B. Raj, R. K. Sarin, "Design and performance analysis of a new efficient coplanar quantum-dot cellular automata adder," Indian Journal of Pure & Applied Physics, vol. 55, pp. 97-103, 2017.

[51] G. Singh, R. K. Sarin, B. Raj, "A novel robust exclusive-or function implementation in QCA nanotechnology with energy dissipation analysis," Journal of Computational Electronics, vol. 15 no. 2, pp. 455-465, DOI: 10.1007/s10825-016-0804-7, 2016.

[52] K. Walus, T. J. Dysart, G. A. Jullien, R. A. Budiman, "QCADesigner: a rapid design and simulation tool for quantum-dot cellular automata," IEEE Transactions On Nanotechnology, vol. 3 no. 1, pp. 26-31, DOI: 10.1109/tnano.2003.820815, 2004.

[53] N. Kandasamy, F. Ahmad, N. Telagam, "Shannon logic based novel QCA full adder design with energy dissipation analysis," International Journal of Theoretical Physics, vol. 57 no. 12, pp. 3702-3715, DOI: 10.1007/s10773-018-3883-3, 2018.

[54] V. Pudi, K. Sridharan, "Low complexity design of ripple carry and Brent-Kung adders in QCA," IEEE Transactions on Nanotechnology, vol. 11 no. 1, pp. 105-119, DOI: 10.1109/tnano.2011.2158006, 2012.

[55] M. Mohammadi, M. Mohammadi, S. Gorgin, "An efficient design of full adder in quantum-dot cellular automata (QCA) technology," Microelectronics Journal, vol. 50, pp. 35-43, DOI: 10.1016/j.mejo.2016.02.004, 2016.

[56] A. Vetteth, K. Walus, V. S. Dimitrov, G. A. Jullien, Quantum-Dot Cellular Automata of Flip-Flops, 2003.

[57] S. Hashemi, K. Navi, "New robust QCA D flip flop and memory structures," Microelectronics Journal, vol. 43 no. 12, pp. 929-940, DOI: 10.1016/j.mejo.2012.10.007, 2012.

[58] A. Rezaei, H. Saharkhiz, "Design of low power random number generators for quantum-dot cellular automata," International Journal of Nano Dimension, vol. 7 no. 4, pp. 308-320, 2016.

[59] S. Srivastava, "QCAPro - an error-power estimation tool for QCA circuit design," Proceedings of the International Symposium of Circuits and Systems, pp. 2377-2380.

[60] P. K. Meher, S. Y. Park, B. K. Mohanty, K. S. Lim, C. Yeo, "Efficient integer DCT architectures for HEVC," IEEE Transactions on Circuits and Systems for Video Technology, vol. 24 no. 1, DOI: 10.1109/tcsvt.2013.2276862, 2014.

[61] C.-Y. Li, Y.-H. Chen, T.-Y. Chang, J.-N. Chen, "A probabilistic estimation bias circuit for fixed-width Booth multiplier and its DCT applications," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 58 no. 4, pp. 215-219, DOI: 10.1109/tcsii.2011.2111610, 2011.

Copyright © 2019 Ismail Gassoumi et al. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. http://creativecommons.org/licenses/by/4.0/

Abstract

Power optimization is one of the most important design objectives in modern digital image processing applications. The DCT is one of the most essential techniques in image and video compression systems, and consequently researchers have carried out extensive work on optimizing its power consumption. Quantum-dot cellular automata (QCA), in turn, offer a novel opportunity to design highly parallel architectures and algorithms that improve the performance of image and video processing systems. QCA also has considerable advantages over CMOS technology, such as extremely low power dissipation, high operating frequency, and small size. Therefore, in this study, the authors propose a multiplier-less DCT architecture in QCA technology. The proposed design provides high circuit performance, very low power consumption, and very small dimensions, outperforming existing conventional structures. The QCADesigner tool was used for QCA circuit design and functional verification of all designs in this work. QCAPro, a widely used power estimation tool, was applied to estimate the power dissipation of the proposed circuit. The proposed design achieves a 53% improvement in power over the conventional solution. The outcome of this work can open up new opportunities for low-power image processing systems.

Details

Title

An Efficient Design of DCT Approximation Based on Quantum Dot Cellular Automata (QCA) Technology

Author

Ismail Gassoumi¹; Lamjed Touil²; Bouraoui Ouni³; Abdellatif Mtibaa¹

¹ Laboratory of Electronics and Microelectronics, University of Monastir, Monastir, Tunisia
² Laboratory of Electronics and Microelectronics, University of Monastir, Monastir, Tunisia; Higher Institute of Technological Studies of Sousse, Monastir, Tunisia
³ Networked Objects Control & Communication Systems Lab, University of Sousse, Sousse, Tunisia

Editor

Amir Sabbagh Molahosseini

Publication year

2019

Publication date

2019

Publisher

John Wiley & Sons, Inc.

ISSN

20900147

e-ISSN

20900155

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1155/2019/9029526

ProQuest document ID

2305726337
