The Embedded Block Coding with Optimal Truncation (EBCOT) Tier-1 process is a critical component of the Joint Photographic Experts Group 2000 (JPEG2000) framework, significantly influencing throughput in hardware encoders. This paper presents an optimized hardware architecture for a high-performance EBCOT Tier-1 encoder, based on parallel code-block processing. The design features simultaneous bit-plane coding via three parallel channels, along with concurrent arithmetic coding enabled by six context-generating units. To overcome the throughput bottleneck of conventional designs, we introduce a novel multiplexing strategy for the Multiple Quantization coder. Our encoder achieves a throughput of 2782 Mb/s, a 7.3 times improvement over existing implementations, making it well-suited for high-speed JPEG2000 applications. The proposed architecture, when applied to the JPEG2000 encoding system, significantly enhances the circuit performance of the overall encoder on both Field Programmable Gate Array and Application Specific Integrated Circuit platforms. This provides a novel approach to architecture optimization. When processing 512
Introduction
Because of their high resolution, images often need to be compressed before transmission to save bandwidth. The Joint Photographic Experts Group 2000 (JPEG2000) standard represents a significant advancement in still image compression and is widely adopted in aerospace, military, and medical applications due to its superior compression efficiency [1]. In recent years, increasing attention has been directed towards its hardware implementation, emphasizing the importance of optimizing the standard for practical use. Central to JPEG2000 is the Embedded Block Coding with Optimal Truncation (EBCOT) algorithm [2], which is divided into two tiers: Tier-1, responsible for generating compressed streams through a complex coding process, and Tier-2, which organizes these streams for optimal compression. The Tier-1 process, accounting for over 70% of the total encoding time, is the primary throughput bottleneck in JPEG2000 hardware encoders [3]. Thus, optimizing the hardware architecture of the EBCOT Tier-1 encoder is critical for improving the speed and efficiency of JPEG2000 systems, ensuring their relevance in fields requiring high-quality image compression.
EBCOT Tier-1 encoding involves numerous serial computational tasks, such as Bit Plane Coding (BPC), which requires context modeling for each bit plane within the code block, and Multiple Quantization (MQ) arithmetic coding, which sequentially processes the generated contexts. Current optimization efforts for EBCOT Tier-1 hardware design focus on three key areas: BPC encoding, MQ encoding, and overall architecture. For BPC, a pass-parallel architecture has been proposed [4], enabling simultaneous encoding of all bits in a single scan. MQ encoder optimizations have included increasing operating frequency by simplifying interval updates and renormalization logic [5], and improving the standard Probability Estimation Table (PET) by merging operations and simplifying index updates [6]. Parallel processing techniques have also been explored, such as using eight units to handle dual contexts [7], leveraging independent adjacent contexts for parallel processing [8], and concurrently encoding contexts with shared probability estimates [9, 10]. Regarding the overall architecture, a bit-plane parallel encoder structure has been proposed [11], significantly improving throughput. While previous research has led to advancements in the hardware design of the BPC and MQ modules individually, most studies have overlooked the critical aspect of throughput matching between these modules, as well as their integrated allocation within the overall EBCOT Tier-1 architecture. This has resulted in high individual module throughput but relatively low overall computational efficiency and throughput for the EBCOT Tier-1 process. Our research fills this gap and delivers promising circuit performance outcomes.
The structure of this paper is as follows: Sect. 1 provides a brief introduction to JPEG2000 and the necessity of optimizing EBCOT. Section 2 offers a detailed analysis of EBCOT, focusing on the Tier-1 coding process, including its BPC and MQ encoding algorithms. Section 3 presents the proposed optimized EBCOT Tier-1 architecture and its underlying principles. Section 4 provides a performance analysis of the designed hardware and compares it with existing architectures. Finally, Sect. 5 summarizes the key findings and contributions.
EBCOT algorithm
Unlike its predecessor, JPEG, the JPEG2000 standard employs the Discrete Wavelet Transform (DWT) instead of the Discrete Cosine Transform (DCT), allowing images to be decomposed into multiple sub-band components at varying resolutions. The proposed EBCOT Tier-1 encoder hardware architecture is deployed in the high-speed JPEG2000 hardware encoder capable of dividing images into several Tile slices of identical size and performing independent coding and decoding operations on each Tile slice. The JPEG2000 hardware encoder employs Tile slices of size 128 × 128, a 5-stage 9/7 wavelet transform, and encodes each sub-band of every wavelet transform as a separate code block. These blocks are distributed across different sub-bands, with their size constrained by the sub-band dimensions. The EBCOT coder is divided into the Tier-1 coder and the Tier-2 coder. During the Tier-1 coding phase, the bit-plane coder creates context models for each bit of the wavelet coefficients, generating sequences of Context (CX) and Decision (D) pairs. The MQ coder then performs arithmetic encoding on these CXD pairs, producing compressed streams across multiple encoding passes. Finally, the Tier-2 coder organizes these streams based on the target code rate.
Within each code block, BPC processes quantized binary wavelet coefficients into a series of bit planes, scanning each bit through three distinct passes, named Significance Propagation Pass (SP), Magnitude Refinement Pass (MRP) and Cleanup Pass (CP). Following pass assignment, CXD pairs are generated through context modeling, using coding primitives like Magnitude Refinement Coding (MRC), Sign Coding (SC), and Run-Length Coding (RLC). The MQ encoder, a binary adaptive arithmetic encoder, derives its name from its origins in Q coding. It retrieves the probability estimation value, Qe, from the PET based on the index and determines the encoding method using the More Probable Symbol (MPS) value and decision value, D. Subsequently, the encoder updates the probability intervals to generate the compressed code stream. Upon receiving a CXD pair, the MQ coding interval is divided into two subintervals: the upper interval corresponds to MPS coding, while the lower interval is for Less Probable Symbol (LPS) coding.
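The interval subdivision described above can be illustrated with a minimal Python sketch. It assumes 16-bit interval arithmetic with the usual renormalization threshold of 0x8000 and includes the conditional exchange of the two subintervals; byte output, carry handling, and PET index updates are deliberately omitted. The function name `mq_code_symbol` and its return tuple are illustrative choices, not part of the proposed hardware.

```python
def mq_code_symbol(a, c, qe, is_mps):
    """Update interval (a) and code (c) registers for one CXD decision.

    Simplified sketch: no byte-out, no carry handling, no PET state update.
    Returns (a, c, shifts), where shifts is the renormalization shift count.
    """
    a -= qe  # size of the upper subinterval (nominally assigned to the MPS)
    if is_mps:
        if a & 0x8000:
            # No renormalization needed: move the base past the lower (LPS) part.
            return a, c + qe, 0
        if a < qe:
            a = qe      # conditional exchange: MPS takes the lower subinterval
        else:
            c += qe     # MPS keeps the upper subinterval
    else:
        if a < qe:
            c += qe     # conditional exchange: LPS takes the upper subinterval
        else:
            a = qe      # LPS keeps the lower subinterval of size Qe
    shifts = 0
    while not (a & 0x8000):  # renormalize until a >= 0x8000
        a <<= 1
        c <<= 1
        shifts += 1
    return a, c, shifts
```

With a = 0xC000 and Qe = 0x1000, an MPS leaves the interval large enough to skip renormalization, whereas an LPS shrinks it to Qe and forces three shifts. This asymmetry is why MPS decisions are the cheaper case to process in hardware.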
According to the rules of BPC coding, each bit plane must be scanned sequentially by the SP, MRP, and CP coding passes. However, performing three separate scans for each bit plane consumes significant encoding time and results in low efficiency. Since SP, MRP, and CP have distinct functions, operate under different conditions, and are not directly dependent on each other, parallel processing can be leveraged to improve efficiency. Reference [4] proposes a parallel scanning approach for these three coding passes. The core idea of this method is to determine the significance state of each bit after it passes through a specific coding pass, so that all bits within a bit plane can be assigned to their coding passes in a single scan. By adopting this parallel scanning-based BPC encoder, up to 10 CXD pairs can be generated per cycle. In contrast, the MQ encoder can process only 1–2 CXD pairs per cycle, far fewer than the BPC encoder produces. If a standalone MQ structure were still used, this mismatch in throughput would lead to inefficiency and a reduction in overall performance.
Proposed EBCOT Tier-1 architecture
Code block parallel coding strategy
Currently, most researchers adopt architectures where the 2-Dimensional (2-D) DWT coefficients from all decomposition levels are divided into uniformly sized code blocks for parallel processing by multiple EBCOT modules. For example, [18] employs code blocks of size 32 × 32 for parallel processing. Similarly, [16] proposes a bit-parallel EBCOT processing architecture, while [19] adopts a channel-parallel EBCOT processing approach. In contrast, the architecture proposed in this paper not only converts wavelet coefficients from different decomposition levels into uniformly sized code blocks but also ensures that corresponding code blocks from the same decomposition level are processed in parallel within the same time frame. This approach maximizes the data integrity within each code block and the correlation between adjacent data in the same bit-plane, leading to superior performance. Furthermore, it significantly enhances the efficiency of the three parallel EBCOT modules in our architecture.
Figure 1 illustrates how EBCOT coding divides code blocks, and code block sizes correspond to wavelet coefficients of resolution levels 1 to 5: 64 × 64, 32 × 32, 16 × 16, 8 × 8, and 4 × 4, respectively. Notably, wavelet coefficients of level 5 contain only one code block, specifically the LL subband code block. Conversely, the lower four levels of wavelet coefficients have three code blocks each, i.e., LH, HL, and HH subband code blocks, where LHi denotes the LH component subband of the i-th resolution level.
Fig. 1 [Images not available. See PDF.]
EBCOT Tier-1 encoder code block division
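The code block sizes listed above follow directly from halving the tile side at each decomposition level. The short Python sketch below reproduces them for a 128 × 128 tile and five levels; the function name `code_block_sizes` is a hypothetical helper introduced only for illustration.

```python
def code_block_sizes(tile_size=128, levels=5):
    """Return {level: side length} of the sub-band code blocks for one tile."""
    return {level: tile_size >> level for level in range(1, levels + 1)}


sizes = code_block_sizes()

# Three detail sub-bands (HL, LH, HH) per level plus the final LL sub-band
# together cover every coefficient of the tile exactly once.
total_coefficients = sum(3 * side * side for side in sizes.values())
total_coefficients += sizes[5] * sizes[5]
```

The coefficient count equals 128 × 128, which is a quick sanity check that the division into LL5 and the HL, LH, and HH blocks of each level used in Table 1 leaves no coefficient uncovered.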
An essential feature of EBCOT coding is that each code block gets encoded independently of the others. Based on this characteristic, the proposed EBCOT Tier-1 encoder uses a code-block parallel EBCOT Tier-1 coding strategy that employs three code-block encoders to encode various code blocks in a Tile slice simultaneously, thereby enhancing encoder parallelism. Table 1 illustrates the parallel coding strategy where blk_coder0, blk_coder1, and blk_coder2 depict three code-block encoders with identical internal hardware architecture. The table displays code blocks encoded sequentially by each code-block encoder at different time intervals for one Tile slice. To ensure that every EBCOT Tier-1 module undergoes an approximately equal amount of computation, each block encoder encodes all blocks within the same sub-band component, starting from high-resolution level blocks and proceeding to low-resolution level blocks. The first code-block encoder initially encodes wavelet coefficients of the highest LL sub-band and then proceeds to encode the code blocks of the HL sub-band. Each individual BPC module simultaneously scans a column of four wavelet coefficients within a single clock cycle. Unlike other studies in this field, the proposed EBCOT Tier-1 parallel architecture in our work goes beyond mere BPC parallelism. It also implements parallel processing in MQ by selecting different MQ coding engines for different channels. This aspect will be elaborated on in the subsequent sections of this paper.
Table 1. Subbands processed by different coders in each coding period
Coding period | blk_coder0 | blk_coder1 | blk_coder2 |
|---|---|---|---|
T1 | LL5 | – | – |
T2 | HL5 | LH5 | HH5 |
T3 | HL4 | LH4 | HH4 |
T4 | HL3 | LH3 | HH3 |
T5 | HL2 | LH2 | HH2 |
T6 | HL1 | LH1 | HH1 |
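The assignment in Table 1 can be generated programmatically, which is convenient when checking a testbench against the intended schedule. The sketch below is an illustration only; `build_schedule` is a hypothetical name, and `None` stands for an idle code-block encoder.

```python
def build_schedule(levels=5):
    """Map each coding period to the sub-bands handled by the three coders."""
    # Period T1: only blk_coder0 is busy, coding the LL sub-band.
    schedule = {"T1": (f"LL{levels}", None, None)}
    # Remaining periods walk the levels in the order of Table 1 (5 down to 1);
    # the tuple order is (blk_coder0, blk_coder1, blk_coder2).
    for period, level in enumerate(range(levels, 0, -1), start=2):
        schedule[f"T{period}"] = (f"HL{level}", f"LH{level}", f"HH{level}")
    return schedule


schedule = build_schedule()
```

Because the HL, LH, and HH blocks of one level have the same size, the three encoders receive the same number of coefficients in every period after T1.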
Pass-parallel MQ encoding engine
Fig. 2 [Images not available. See PDF.]
Pass parallel MQ coding engine
This paper adopts the three-pass parallel scanning BPC encoding method proposed in [4], which improves encoding efficiency and reduces processing time by assigning each bit in the bit plane to its coding pass in a single scan. To realize the parallel scanning of the three coding passes, this design uses the “vertically causal context” mode of the JPEG2000 standard, in which the first row of the next stripe is treated as insignificant while the current stripe is coded. This avoids tracking long significance-propagation dependencies and has only a small impact on the quality of the compressed image. However, in EBCOT Tier-1 coding, there is a significant mismatch between the throughput rates of the BPC and MQ encoders due to their different modes of operation. Specifically, in the CP pass, when the run-length coding primitive is applied and the magnitude bits of all four coded bits are 1, run-length coding generates 3 CXD pairs, the first bit (with magnitude bit 1) produces 1 CXD pair through sign coding, and the next 3 bits generate 6 CXD pairs through zero coding and sign coding, for a total of 10 CXD pairs. In contrast, the MQ encoder can process only 1–2 CXD pairs per cycle, which is far below the BPC encoder’s rate. To address this, the independent MQ encoders are integrated into the pass-parallel MQ coding engine, as shown in Fig. 2, enabling parallel encoding of CXD pairs from three coding passes and significantly reducing MQ encoding time.
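The worst-case figure of 10 CXD pairs per column can be checked with a small counting sketch. It assumes the column qualifies for run-length coding in the cleanup pass (all four coefficients and their neighbours are still insignificant) and counts one CXD pair per coded decision; the function name `cleanup_column_cxd_count` is introduced here purely for illustration.

```python
def cleanup_column_cxd_count(bits):
    """Count CXD pairs for a 4-bit stripe column coded in run-length mode.

    bits: the four magnitude bits of the column in the current bit plane.
    """
    if len(bits) != 4:
        raise ValueError("a stripe column holds exactly four bits")
    if not any(bits):
        return 1                  # one run-length symbol: the run is unbroken
    first = bits.index(1)
    count = 3                     # run-length symbol plus 2-bit position
    count += 1                    # sign of the first significant bit
    for bit in bits[first + 1:]:
        count += 1                # zero coding of each remaining bit
        if bit:
            count += 1            # sign coding when the bit becomes significant
    return count
```

An all-ones column yields 3 + 1 + 3 × 2 = 10 pairs, while an all-zero column yields a single pair, which shows how strongly the BPC output rate varies from cycle to cycle and why buffering in front of the MQ encoders is needed.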
The pass-parallel MQ coding engine consists of the following components: three First-In-First-Out (FIFO) buffers that hold the CXD pairs from the three coding passes, and three CXD pair read control units that feed the buffered CXD pairs into the MQ encoders while monitoring the FIFO status to manage the start and stop of each MQ encoder. A single-context MQ encoder processes one CXD pair per clock cycle, while a multi-context MQ encoder scans two consecutive CXD pairs within each clock cycle to determine their encoding mode. If both CXD pairs are encoded as MPS, the encoder performs parallel encoding of the two pairs in the next cycle. Otherwise, the encoder processes only the first CXD pair in the subsequent cycle. Compared to a fully parallel encoding strategy, this MPS-MPS-based encoding approach effectively reduces computational complexity, thereby minimizing hardware resource consumption and shortening the design’s critical path. Regarding the structure of the MQ encoders, since the average number of CXD pairs processed per cycle differs across the three passes, a multi-context MQ encoder is employed for the SP and CP passes, while a single-context MQ encoder is used for the MRP pass [12]. The throughput of the applied multi-context MQ coder is one LPS or two MPSs per cycle.
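The effect of the MPS-MPS strategy on cycle count can be estimated with a simple behavioural model. The sketch below ignores pipeline latency and FIFO stalls and only counts how many cycles are needed to consume a stream of CXD pairs; the function name `mq_cycles` and the boolean-list input are assumptions made for this illustration.

```python
def mq_cycles(is_mps_flags, dual=True):
    """Count clock cycles needed to consume a stream of CXD pairs.

    is_mps_flags: one boolean per CXD pair, True if it is coded as an MPS.
    dual=True models the multi-context encoder (two MPSs per cycle);
    dual=False models the single-context encoder (one pair per cycle).
    """
    cycles, i, n = 0, 0, len(is_mps_flags)
    while i < n:
        if dual and i + 1 < n and is_mps_flags[i] and is_mps_flags[i + 1]:
            i += 2      # MPS-MPS: both pairs are coded together
        else:
            i += 1      # otherwise only the first pair is coded
        cycles += 1
    return cycles
```

For the stream MPS, MPS, MPS, LPS, MPS, MPS the dual mode needs 4 cycles instead of 6. The gain therefore depends on how often consecutive MPS decisions occur, which is consistent with assigning the multi-context encoder only to the passes where this pays off.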
BPC-MQ serial-parallel design
Fig. 3 [Images not available. See PDF.]
Block diagram of Tier-1 with BPC-MQ serial-parallel design
In this architecture, a single BPC encoder performs context modeling for each bit plane sequentially, while multiple MQ encoding engines are multiplexed to encode the generated CXD pairs. The BPC and MQ encoders operate concurrently. When the BPC encoder completes the scan of a bit plane, it sends a signal to the MQ engine scheduling module, requesting an available MQ engine. The scheduling module checks the signal to determine if the corresponding MQ engine for the next bit plane is free. If available, a signal is returned, and the BPC and MQ encoding for the next bit plane begins. If not, the system waits until the MQ engine completes encoding the remaining CXD pairs before proceeding. The signal is used to assign the appropriate MQ engine to the BPC encoder (Fig. 3).
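The handshake between the BPC encoder and the MQ engine scheduling module can be approximated by a small timing model. The sketch below assumes that MQ engines are assigned to bit planes in round-robin order and that BPC scanning and MQ coding of a bit plane start in the same cycle; both are simplifying assumptions, and the function name `tier1_cycles` and the cycle counts in the usage note are synthetic values, not measurements from this work.

```python
def tier1_cycles(bpc_cycles, mq_cycles, num_engines):
    """Estimate total cycles for one code block.

    bpc_cycles[i]: cycles the BPC encoder needs to scan bit plane i.
    mq_cycles[i]:  cycles an MQ engine needs to code that plane's CXD pairs.
    Engines are assigned to bit planes in round-robin order (an assumption).
    """
    if num_engines < 1:
        raise ValueError("at least one MQ engine is required")
    bpc_free = 0                          # cycle at which the BPC is idle again
    engine_free = [0] * num_engines       # cycle at which each engine is idle
    for i, (bpc, mq) in enumerate(zip(bpc_cycles, mq_cycles)):
        engine = i % num_engines
        # The next bit plane starts only when both the BPC and its engine are free.
        start = max(bpc_free, engine_free[engine])
        bpc_free = start + bpc
        engine_free[engine] = start + max(bpc, mq)
    return max(engine_free)
```

With four bit planes that each take 10 BPC cycles and 30 MQ cycles, one engine needs 120 cycles, two engines need 70, and three engines need 60. Beyond that point the single BPC encoder becomes the limit, so additional engines bring no further gain in this model.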
In practical applications, the number of MQ encoders in each EBCOT Tier-1 encoding module must be optimally configured. Amdahl’s Law highlights the potential speedup of a system utilizing parallel computation when the total computational load is fixed. Amdahl’s Law can be expressed by Formula (1).
$$S = \frac{1}{(1 - p) + \frac{p}{N}} \qquad (1)$$
In this formula, S represents the speedup achieved through parallel computation, p denotes the fraction of total computation time that is parallelizable, and N is the number of parallel processing units, which in this design corresponds to the number of MQ encoders in each EBCOT Tier-1 module. As N increases, the system’s speedup does not scale indefinitely but approaches the limit 1/(1 − p).
Based on the above analysis, this paper first evaluates the computation time of the EBCOT Tier-1 encoder with varying numbers of MQ encoders. Then, using logic synthesis, the hardware resource consumption for different configurations is determined to identify the optimal number of MQ encoders. The ModelSim simulation tool was used to measure the clock cycles required for EBCOT Tier-1 encoding of five standard 512 × 512 8-bit test images (Barbara, Crowd, Aerial, Goldhill, and Peppers) under different MQ encoder configurations. Table 2 presents the average results across the five images. The number of cycles consumed for each individual image with 6 MQ encoders is shown in Table 3. Notably, due to the parallel code-block encoding method, each of the three EBCOT Tier-1 modules must be configured with the same number of MQ encoders, making the total number of MQ encoders a multiple of three.
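Amdahl’s Law in its standard form, S = 1 / ((1 − p) + p / N), can be evaluated numerically to see how quickly the speedup saturates. In the sketch below, p is the parallelizable fraction and n the number of parallel units; the symbol p and the value 0.5 in the usage note are chosen only for illustration and are not taken from the measurements in this paper.

```python
def amdahl_speedup(p, n):
    """Speedup predicted by Amdahl's Law for parallel fraction p and n units."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


# Illustrative value only: half of the work is parallelizable.
speedups = {n: amdahl_speedup(0.5, n) for n in (1, 2, 3, 1000)}
```

For p = 0.5 the speedup rises from 1.0 to about 1.33 with two units and 1.5 with three, and it stays just below the limit 1 / (1 − p) = 2 no matter how many units are added.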
Table 2. EBCOT Tier-1 encoding time and hardware resource consumption for different numbers of MQ engines (512 × 512 8-bit standard test images)
Number of MQ encoders | Clock cycles | Acceleration rate | LUT | Register | |
|---|---|---|---|---|---|
3 | 269,454 | – | 28,938 | 22,521 | 68 |
6 | 179,171 | 1.50 | 38,950 | 28,905 | 113 |
9 | 172,084 | 1.57 | 50,804 | 35,918 | 158 |
Table 3. Results of encoding time consumption tests for the proposed EBCOT Tier-1 with 6 MQ encoders
Image | Barbara | Crowd | Aerial | Goldhill | Peppers | Average |
|---|---|---|---|---|---|---|
Clock cycles | 190,739 | 167,194 | 204,107 | 157,282 | 176,533 | 179,171 |
Subsequently, the ISE tool was used to perform logic synthesis and implementation of the EBCOT Tier-1 encoder with different numbers of MQ encoders. The target platform was the Xilinx XC7K325T-2FFG900 chip. From the results in Table 2, it can be observed that configuring 6 MQ encoders achieves a speedup of 1.50 compared to 3 encoders, showing a significant performance gain. However, with 9 MQ encoders, the speedup only increases to 1.57, offering marginal improvement at the cost of significantly higher hardware resource usage. Configuring 6 MQ encoders ensures that the BPC encoder operates without stalling in most cases, while the additional 3 MQ encoders in the 9-encoder configuration remain idle most of the time, leading to inefficient resource utilization. Therefore, in this JPEG2000 encoder design, 6 MQ encoders are selected, with each EBCOT Tier-1 module configured with 6 MQ encoders, balancing hardware resource consumption and throughput.
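The trade-off discussed above can be recomputed from the figures in Table 2. The sketch below takes the clock cycles and the LUT and register counts for each configuration (the 6-encoder LUT and register values match those reported for the proposed design in Table 7) and expresses speedup and LUT growth relative to the 3-encoder baseline; the names `CONFIGS` and `relative_to_baseline` are illustrative.

```python
# MQ encoders: (clock cycles, LUTs, registers), values from Table 2.
CONFIGS = {
    3: (269_454, 28_938, 22_521),
    6: (179_171, 38_950, 28_905),
    9: (172_084, 50_804, 35_918),
}


def relative_to_baseline(configs, baseline=3):
    """Return {n: (speedup, LUT ratio)} relative to the baseline configuration."""
    base_cycles, base_luts, _ = configs[baseline]
    return {
        n: (base_cycles / cycles, luts / base_luts)
        for n, (cycles, luts, _) in configs.items()
    }


ratios = relative_to_baseline(CONFIGS)
```

Going from 3 to 6 encoders buys a speedup of about 1.50 for roughly 1.35 times the LUTs, whereas going to 9 encoders raises the speedup only to about 1.57 while the LUT count grows to roughly 1.76 times the baseline.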
Architecture performance analysis
EBCOT module performance and comparison
Table 4. EBCOT Tier-1 encoder performance evaluation results
Device | Frequency | LUT | Register | Slice | |
|---|---|---|---|---|---|
XC7K325T | 172.86 MHz | 38,950 | 28,905 | 17,018 | 113 |
Table 4 gives the performance evaluation results of the EBCOT Tier-1 encoder. The proposed architecture adopts a 3-parallel design for the BPC encoder, optimized for code blocks. The circuit performance and resource consumption, compared with existing architectures in the field, are presented in Table 5. As shown, the proposed architecture achieves the highest operating frequency on the Virtex2 device, although its resource consumption is higher than that of the BPC architecture in [17].
The proposed architecture employs a 6-MQ-engine parallel design for the MQ encoder. The circuit performance and resource consumption, compared with existing architectures in the field, are summarized in Table 6. As shown, the proposed architecture achieves the highest operating frequency and throughput rate among all existing designs. Although the area and power consumption are 66.3% and 35.4% higher than those of the architectures in [7] and [21], respectively, the throughput rate is at least 3.28 times higher than that of other architectures. This trade-off between circuit resources and performance demonstrates clear advantages.
Table 5. Comparison of BPC encoder performance on Xilinx Virtex2 FPGA
Parameters | Wintner [22] | Sarawadekar [23] | Gavvala [18] | Ghodhbani [17] | Proposed |
|---|---|---|---|---|---|
4420 | 2149 | 16,769 | 741 | 3258 | |
7071 | 2488 | 18,370 | 1306 | 3052 | |
1560 | 105 | 8419 | 68 | 1025 | |
Frequency | 50 MHz | 67 MHz | 91 MHz | 149 MHz | 203 MHz |
Table 6. Comparison of MQ encoder performance
Parameters | Dyer [19] | Liu [5] | Belyaev [20] | Liu [7] | Proposed |
|---|---|---|---|---|---|
Device | Stratix | Virtex5 | Virtex4 | Virtex4 | Virtex4 |
Area | 2265 | 829 | 750 | 6974 | 11,598 |
Frequency | 36 MHz | 79 MHz | 105 MHz | 96 MHz | 173 MHz |
Throughput rate | 60 Msymbols/s | 79 Msymbols/s | 105 Msymbols/s | 96 Msymbols/s | 348 Msymbols/s |
Power | – | – | 640.82 mW | 694.56 mW | 941.04 mW |
Table 7. Optimized EBCOT Tier-1 encoder design vs. standard serial encoder performance
Parameters | Serial architecture | Yan [13] | Proposed architecture |
|---|---|---|---|
LUT | 8482 | – | 38,950 |
Register | 4480 | – | 28,905 |
– | 18,787 | – | |
Clock cycles | 1,697,402 | 462,792 | 179,171 |
Table 7 compares the encoding time and hardware resource consumption of the optimized EBCOT Tier-1 encoder with the standard serial EBCOT Tier-1 encoder on the Xilinx XC7K325T platform. The optimized design uses 4.6 times as many Look-Up Tables (LUTs) and 6.5 times as many registers as the standard serial encoder but reduces the number of clock cycles to 10.6% of the serial encoder’s. This optimized EBCOT Tier-1 encoder significantly reduces the computation time while maintaining high hardware efficiency, effectively overcoming the throughput bottleneck of JPEG2000. Yan et al. [13] proposed an architecture that differs from traditional serial designs. It features a dual-parallel context modeling unit capable of processing four symbols simultaneously, along with a pipelined binary arithmetic encoder based on dual-context windows. However, the number of clock cycles required by their architecture is 2.58 times that of the architecture proposed in this paper.
Table 8. Hardware performance comparison with the latest EBCOT Tier-1 encoder design
Parameters | Bascones [3] | Wu [14] | Mert [21] | Proposed |
|---|---|---|---|---|
Device | Virtex7 | Kintex7 | Virtex2 | Kintex7 |
Frequency | 255 MHz | 124 MHz | 118 MHz | 173 MHz |
Slice | 2708 | 17,943 | 4118 | 17,018 |
Throughput rate | 380 Mb/s | 2735 Mb/s | – | 2782 Mb/s |
This result effectively overcomes the throughput bottleneck of the EBCOT Tier-1 encoder. Table 8 compares the hardware performance of the proposed EBCOT Tier-1 architecture with the latest designs. The theoretical maximum throughput of the current design reaches 2782 Mb/s (with 11-bit precision for storing quantized wavelet coefficients), which is 7.3 times that of the design in [3]. In exchange, the proposed design consumes 17,018 slices, about 6.3 times as many as the design in [3]. Compared to [14], the throughput is nearly identical, but the design proposed in this paper saves approximately 1000 slices. Techniques such as code-block parallel encoding, BPC encoding pass optimization, parallel MQ coding engine design, and MQ engine reuse consume more logic resources, but significantly enhance the encoder’s parallelism and reduce EBCOT Tier-1 encoding time. Given sufficient hardware resources, the proposed architecture fully leverages Field Programmable Gate Array (FPGA) parallelism, overcomes the throughput bottleneck of JPEG2000 encoders, and is suitable for high-speed JPEG2000 hardware implementation.
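The text does not spell out how the 2782 Mb/s figure is obtained. One reading that reproduces it, offered here as an assumption rather than as the authors’ stated method, is to divide the number of coefficient bits of a 512 × 512 image stored at 11-bit precision by the time needed for the average 179,171 clock cycles at 172.86 MHz. The function name `tier1_throughput_mbps` is hypothetical.

```python
def tier1_throughput_mbps(width, height, bits_per_coeff, cycles, freq_mhz):
    """Throughput in Mb/s for coding one image in the given number of cycles."""
    total_bits = width * height * bits_per_coeff
    seconds = cycles / (freq_mhz * 1e6)
    return total_bits / seconds / 1e6


# Values taken from Tables 2 and 4; the formula itself is an assumption.
throughput = tier1_throughput_mbps(512, 512, 11, 179_171, 172.86)
```

Under this reading the result is approximately 2782 Mb/s, matching the reported value, and the ratio to the 380 Mb/s of [3] is about 7.3.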
JPEG2000 project performance and comparison
Fig. 4 [Images not available. See PDF.]
Image compression test platform
Fig. 5 [Images not available. See PDF.]
Image compression test process
Figure 4 shows the image compression test platform, which consists of two parts: the KC705 development board and the host computer software. The host software comprises a jpc file encapsulation script written in Python and image decoding and analysis software built around Jasper. Following the test process shown in Fig. 5, several test images are transferred to the development board through the Universal Asynchronous Receiver/Transmitter (UART) interface and stored in the Double-Data-Rate 3 (DDR3) memory. The test images are then encoded in hardware to generate a JPEG2000 compressed stream. The compressed stream is uploaded to the host computer through the UART interface, where the jpc file encapsulation script generates the compressed image in jpc format. Finally, the jpc images are decompressed using the image decoding and analysis software, and the quality of the recovered images is evaluated.
To test the image quality of the proposed architecture on large images, the satellite image dataset of the Maxar Open Data Program was selected; the image size is 2048 × 2048. The compression results under different compression ratios are shown in Fig. 6. As the data in Table 9 show, the JPEG2000 hardware encoder with the optimized EBCOT module achieves essentially the same compressed image quality as software encoding: the Peak Signal-to-Noise Ratio (PSNR) of the hardware encoding is at most 1 dB lower than that of the software encoding. To evaluate processing speed, images of different sizes were tested. For general-size images (512 × 512), the processing frame rate exceeds 160 Frames Per Second (FPS), and for large images (2048 × 2048), it exceeds 10 FPS.
Fig. 6 [Images not available. See PDF.]
Comparison of software and hardware compression effects of satellite images with different compression ratios, where a 3:1, b 6:1, c 12:1, and d 24:1
Table 9. JPEG2000 hardware and software image compression effect comparison
Compression ratio | PSNR of Jasper (dB) | PSNR of proposed hardware (dB) |
|---|---|---|
3:1 | 42.82 | 42.04 |
6:1 | 39.73 | 38.73 |
12:1 | 36.99 | 36.01 |
24:1 | 32.50 | 32.32 |
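The PSNR values in Table 9 follow the usual definition, PSNR = 10 log10(peak² / MSE). A minimal pure-Python sketch for 8-bit images is given below; it operates on flat sequences of pixel values, and the function name `psnr` and its `peak` parameter are illustrative rather than part of the test software used in this work.

```python
import math


def psnr(original, decoded, peak=255):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(decoded) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf           # identical images
    return 10 * math.log10(peak ** 2 / mse)
```

A mean squared error of 1 on 8-bit data corresponds to about 48.13 dB, so the values between 32 and 43 dB in Table 9 indicate per-pixel errors of a few grey levels.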
Table 10. Image compression hardware performance comparison on FPGA
Parameters | Guo [16] | Proposed |
|---|---|---|
Device | 6VSX475T | XC7K325T |
LUT | 214,936 | 79,473 |
Register | 98,847 | 51,691 |
Slice | – | 25,473 |
BRAM | 733 | 141 |
Frequency | 147 MHz | 120 MHz |
Throughput rate | 120 MSymbols/s | 68 MSymbols/s |
Table 11. Image compression hardware performance comparison on 0.18 µm CMOS Technology ASIC
Parameters | ADV212 [15] | Modrzyk [24] | Liu [25] | Yamauchi [26] | Proposed |
|---|---|---|---|---|---|
Gate count | – | 179 K | 180 K | – | 71 K |
Core area | – | 19 mm² | 20 mm² | 13 mm² | 14 mm² |
Throughput rate | 50 Msymbols/s | 180 Msymbols/s | 66 Msymbols/s | 70 Msymbols/s | 57 Msymbols/s |
Frequency | 150 MHz | 100 MHz | 100 MHz | 100 MHz | 100 MHz |
Power | – | 2 W | 450 mW | 900 mW | 563 mW |
The proposed JPEG2000 encoder is compared with existing architectures by other researchers, with the FPGA comparison results summarized in Table 10. Regarding throughput for standard test images, the design in [16] achieves a throughput of 120 MSymbols/s, approximately 1.76 times that of the proposed design. However, the [16] design uses 2.7 times more LUT resources and significantly more memory resources compared to our design, which limits its applicability under constrained hardware resources.
To simulate resource consumption on an Application Specific Integrated Circuit (ASIC) platform, the proposed architecture was synthesized using the Taiwan Semiconductor Manufacturing Company (TSMC) 0.18 µm technology, with the results presented in Table 11. As shown, the proposed architecture achieves one of the highest computational efficiencies. While its throughput is lower and its power consumption higher than those of the architecture in [25] (57 vs. 66 Msymbols/s and 563 vs. 450 mW), our design reduces area consumption by 30%, owing to the efficient integration of the BPC-MQ serial-parallel architecture. Compared to the architecture in [26], our design reduces power consumption by 37%. These results highlight the balanced overall performance of the proposed architecture, making it suitable for low-power design.
Conclusion
In this paper, a high-performance EBCOT Tier-1 encoder hardware architecture is proposed. The designed EBCOT Tier-1 encoder consists of three code-block encoders, which are capable of encoding code blocks of different subbands in parallel. The innovation lies in the design of a pass-parallel MQ encoding engine that uses two types of MQ encoders with different hardware architectures to arithmetically code, in parallel, the contexts generated by the three coding passes. In addition, the throughput gap between the MQ encoder and the BPC encoder is reduced by multiplexing the MQ encoding engines. The designed hardware architecture of the EBCOT Tier-1 encoder achieves a data throughput rate of 2782 Mb/s, which is suitable for the design of high-speed JPEG2000 hardware encoders. Image compression tests on high-definition satellite images show that the compression quality of the hardware platform is close to that of the software platform, indicating that the design can be applied to the real-time compression of high-definition images.
Data availability
No datasets were generated or analyzed during the current study.
References
1. Australia S, Information technology—JPEG 2000 image coding system—Part 1: Core coding system (2001)
2. Taubman, D. High performance scalable image compression with EBCOT. IEEE Trans. Image Process.; 2000; 9,
3. Bascones, D; Gonzalez, C; Mozos, D. An FPGA accelerator for real-time lossy compression of hyperspectral. Remote Sens.; 2000; 12,
4. Ghodhbani, R; Saidani, T; Horrigue, L et al. An efficient pass-parallel architecture for embedded block coder in JPEG2000. J. Real-Time Image Proc.; 2019; 16,
5. Wensong, L., En, Z., Ye, L., et al.: Design of JPEG2000 arithmetic coder using optimized renormalization procedure. In: Proceedings of the 2011 International Conference on Multimedia and Signal Processing, pp. 41–45. IEEE, Guilin (2011)
6. Bao, N., Jiang, Z., Qi, Z., et al.: High-throughput MQ encoder for pass-parallel EBCOT in JPEG2000. In: Proceedings of the 2015 28th IEEE International System-on-Chip Conference, pp. 410–414. IEEE, Beijing (2015)
7. Liu, K; Zhou, Y; Li, YS et al. A high performance MQ encoder architecture in JPEG2000. Integr. Vlsi J.; 2010; 43,
8. Cao, H., Zhang, Y., Jiang, H.: A high-throughput MQ coder architecture based on dependence extraction method. In: Proceedings of the 2014 IEEE International Conference on Image Processing, pp. 1203–1207. IEEE, Paris (2014)
9. Mei, K; Zheng, N; Huang, C et al. VLSI design of a high-speed and area-efficient JPEG2000 encoder. IEEE Trans. Circuits Syst. Video Technol.; 2007; 17,
10. Zhou, P., Bao-Jun, Z.: High-throughout hardware architecture of MQ arithmetic coder. In: Proceedings of the IEEE 10th International Conference on Signal Processing, pp. 430–433. IEEE, Beijing (2010)
11. Li, L.T., Shi, J.Y., Di, Z.X.: High parallel VLSI architecture design of BPC in JPEG2000. In: Proceedings of the 13th IEEE International Conference on ASIC. IEEE, Chongqing (2019)
12. Jing, P; Zhang, W; Yan, L; Liu, Y. VLSI design of a high-performance multicontext MQ arithmetic coder. IEEE Trans. Very Large Scale Integr. (VLSI) Syst.; 2023; 31,
13. Yan, X.L., Qin, X., Yang, Y., et al.: A high performance architecture of EBCOT encoder in JPEG 2000. In: Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 492–495. IEEE, Kobe (2005)
14. Wu, ZL; Zhang, W et al. A high-performance dual-context MQ encoder architecture based on extended lookup table. IEEE Trans. Very Large Scale Integr. (VLSI) Syst.; 2023; 31,
15. Wilcox, E.P., Campola, M.J., Nadendla, S., et al.: An improved SEL test of the ADV212 video codec. In: Proceedings of the 2017 IEEE Radiation Effects Data Workshop, pp. 324–327. IEEE, New Orleans (2017)
16. Guo, J., Li, Y.S., Liu, K., et al.: Efficient VLSI architecture of JPEG2000 encoder. In: Proceedings of the 6th International Congress on Image and Signal Processing, pp. 192–197. IEEE, Hangzhou (2013)
17. Ghodhbani, R., Saidani, T., Horrigue, L., Atri, M.: Analysis and implementation of parallel causal bit plane coding in JPEG2000 standard. In: World Congress on Computer Applications and Information Systems (WCCAIS). Hammamet, Tunisia 2014, pp. 1–6 (2014). https://doi.org/10.1109/WCCAIS.2014.6916602
18. Gavvala, R., Gopal, M.M., Chandra, S.S., Rao, S.S.: Pass-parallel VLSI architecture of BPC for embedded block coder in JPEG2000. In: 2012 Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics, Hyderabad, India, pp. 111–117 (2012). https://doi.org/10.1109/PrimeAsia.2012.6458637
19. Dyer, M., Taubman, D., Nooshabadi, S., Gupta, K.: Concurrency techniques for arithmetic coding in JPEG2000. In: IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 53, no. 6, pp. 1203–1213 (2006)
20. Belyaev, E; Liu, K; Gabbouj, M; Li, Y. An efficient adaptive binary range coder and its VLSI architecture. IEEE Trans. Circuits Syst. Video Technol.; 2015; 25,
21. Mert, YM; Yilmaz, O; Erdem, H et al. Lossy coding improvement of EBCOT design for onboard JPEG2000 image compression. IEEE; 2013; [DOI: https://dx.doi.org/10.1109/RAST.2013.6581296]
22. Wintner, R. Bits on the big screen. IEEE Spectr.; 2006; 43,
23. Sarawadekar, K., Banerjee, S.: A high-performance architecture of JPEG 2000 and its FPGA implementation. In: 17th European Signal Processing conference (EUSIPCO 2009), September 2009
24. Modrzyk, D., Staworko, M.: A high-performance architecture of JPEG2000 encoder. In: 19th European Signal Processing Conference. Barcelona, Spain 2011, pp. 569–573 (2011)
25. Liu, L., Chen, N., Meng, H., Zhang, L., Wang, Z., Chen, H.: A VLSI architecture of JPEG2000 encoder. In: IEEE J. of Solid-State Circuits, vol. 39, no. 11, Tsinghua Univ., China, November (2004)
26. Yamauchi, H., Okada, S., Taketa, K., Matsuda, Y., Mori, T., Watanabe, T., Matsuo, Y., Matsushita, Y.: Mater, and Devices Dev. Center, Sanyo Electr. Co. Ltd., 1440 1080 pixel, 30 frames per second motion-JPEG 2000 codec for HD-movie transmission. In: IEEE J. of Solid-State Circuits, vol. 40, no. 1, Gifu, Japan (2005)
Copyright Springer Nature B.V. Jan 2025