Rapid advances in technology have driven enormous growth in the complexity of Integrated Circuit (IC) chips. On-chip communication plays a major role in connecting and managing the functional blocks of a system on chip. The Advanced eXtensible Interface (AXI) and the AXI Coherency Extensions (ACE) are two protocols introduced in later versions of AMBA. AXI4 is a high-speed bus with five channels for write and read operations, each using a handshaking mechanism to control transfers. ACE adds three channels to the existing AXI4 channels to support cache coherence. This work focuses on the design and analysis of the AXI4 and ACE protocols using Verilog, with a test bench environment built in System Verilog. The AXI and ACE interconnects were functionally verified, and the simulation waveforms, generated with the Cadence Xcelium EDA (Electronic Design Automation) tool, behaved as expected without any change to the features of the DUT (Design Under Test). The design was verified for transactions of both 16 bytes and 256 bytes: 4 transfers of 4 bytes each for a 16-byte transaction, and 16 transfers of 16 bytes each for a 256-byte transaction. The work thus shows that the ACE protocol, in addition to the features of the AXI protocol, supports snooping to overcome the cache coherence problem. The environment completely encloses the DUT while monitoring the protocols' performance. The key benefit of a System Verilog test bench is that it is reusable, so verification engineers spend less time verifying the design. The features of these protocols help improve the bandwidth and latency of data transfers and transactions between peripherals.
Introduction
In contemporary electronics, integrated circuits (ICs) have become indispensable in semiconductor design, with their complexity often surpassing manual verification capabilities, thereby increasing the risk of human error [1]. Consequently, a standardized computer-based verification methodology has emerged as a cornerstone in semiconductor design. Effective functional verification methodologies are crucial for the success of semiconductor products.
Verification of any design necessitates the connection of a test bench to the Device Under Test (DUT) for delivering input stimuli to the DUT and monitoring its output signals. However, in traditional verification methods, this connection lacks encapsulation and results in maintenance complexity, especially with numerous modules in the design that need instantiation. Any changes to the design, such as adding or removing signals, exacerbate the complexity of updating instances of the DUT. For designs involving standard protocols where the DUT acts as a slave and the test bench as a master, bidirectional signal driving is required, necessitating separate master and slave interface designs, thus increasing latency [2].
Fortunately, the standard System Verilog verification environment introduces the concept of sequencing, such as mailboxes, for storing signal values and retrieving them as needed [3]. Moreover, in standard verification methodologies like System Verilog [4], although all signals are generated at once, it's possible to selectively drive only the necessary signals to the DUT, thereby reducing complexity.
To address interfacing challenges, the AMBA architecture offers various protocols for on-chip peripherals [5], with AXI being a notable bus architecture designed for both high and low bandwidth peripherals. However, AXI lacks support for snooping, which is crucial for cache coherence. Consequently, the ACE protocol was developed, incorporating snooping signals to address cache coherency issues.
This paper aims to design AMBA AXI and ACE protocols using Verilog, compare their features, and verify them by constructing a standard verification environment using System Verilog.
Literature Review
Dukare et al. [6] implemented an ACE protocol design to handle different transactions. Their work focused on developing a verification environment with the Synopsys VCS EDA tool to generate the stimuli needed to assess the slave design's functionality. Various test cases were employed to evaluate the design's performance. However, a notable limitation emerged: only 76% code coverage was attained. This indicates that the test cases exercised only 76% of the design code, leaving the remainder unverified. Enhancements are therefore needed to raise code coverage, potentially by increasing the number and variety of stimuli directed at the DUT.
Dwivedi et al. [7] introduced the design and verification of the AMBA-based Advanced Peripheral Bus (APB) protocol using a System Verilog test bench environment. The verification approach comprehensively encompasses all internal transactions of the APB protocol, rendering it applicable to microcontroller systems with varying speeds of peripherals. The simulation of the Verification Environment was conducted using Questa and Precision Pro Mentor graphical tools, with Verilog HDL utilized for the design.
Karthik et al. [8] detailed an assertion-based System Verilog verification environment tailored for the AMBA AXI bus protocol featuring a master–slave architecture. Their approach involved crafting assertions in tandem with all conditions present in the RTL design. This strategic integration of assertion statements facilitated the detection of bugs in specific lines of code without impeding the execution process.
Prasad et al. [9] designed and verified the AMBA bus architecture with high functional coverage for both the AXI and APB buses. Using the Synopsys VCMX and VERDI simulator, they carried out functional verification of several read and write transaction modes, including the fixed, wrapping, and incremental modes of AXI3 and the APB bridge.
Giridhar and Choudhury [10] devised and validated an AXI protocol tailored for a single master overseeing four slaves, employing a constraint-random-based verification methodology. Their strategy capitalized on Finite State Machines (FSM) to streamline communication between the master and slave units, adhering closely to AXI protocol guidelines. Rigorous simulation and verification procedures substantiated the effective realization of the intended communication protocols within the AXI framework.
Deepu and Dhanabal [11] conducted an investigation into the design and verification of crucial features within the AXI3 protocol. They successfully constructed verification IP components tailored for the AXI3.0 protocol. Various test cases were employed to validate key features of AXI protocol channels, including single write and read cycles from the same address location, as well as multiple cycles involving successive address locations. Through these rigorous tests, they confirmed the integrity and functionality of the AXI protocol channels' essential attributes.
Bedre and Kumar [12] focused on devising a faster chip communication architecture by implementing various arbitration schemes. They introduced a modified dynamic bus arbitrator tailored for a system-on-chip (SoC) design architecture, incorporating fuzzy logic principles. Their approach involved categorizing a subset of various masters based on their priority levels. They employed a dynamic lottery scheme at the initial stage and then applied fuzzy logic at the subsequent stage within the system, enabling access to the data bus through a novel two-stage method. This innovative approach aimed to enhance chip communication efficiency and throughput in SoC designs.
Kaur and Sulochana [13] delved into various snooping-based protocols, including those with 3, 4, and 5 states, which are designed and implemented in Verilog as cache coherence protocols. Specifically, they explored the MSI, MESI, and MOESI protocols. Each of these protocols was meticulously crafted and verified using an assertion-based System Verilog test bench environment.
The ACE protocol, as its name suggests, is primarily designed to support snooping-based concepts aimed at addressing the cache coherence problem. Drawing inspiration from the MOESI protocol, the ACE protocol extends its capabilities by incorporating three additional channels, bringing the total to eight channels. Among these, three channels are dedicated to supporting snooping-based concepts, while the remaining five channels function similarly to those in the AXI protocol.
Because literature on the ACE protocol is scarce, a novel ACE protocol implementation was developed in this work. The implementation features eight channels for address, write, read, and snooping purposes, alongside a three-state cache coherence model, and serves as the slave. It is implemented entirely in Verilog.
Additionally, a System Verilog Environment is created to serve as the master, responsible for driving stimuli to the slave and monitoring signals from it. This environment incorporates test bench components essential for verifying the design's functionality and adherence to protocol specifications. Through this approach, the ACE protocol's effectiveness in addressing cache coherence challenges can be thoroughly evaluated and validated.
Features of AXI and ACE Protocols
For high-performance, high-speed microcontroller systems, the AXI protocol stands out as a point-to-point interface. Unlike shared bus architectures, AXI avoids bus sharing, thereby enabling higher bandwidth and lower latency. As a result, AXI is the most widely used interface within the AMBA framework [14].
The primary function of the AXI protocol is to provide a communication framework among the distinct blocks within a chip. AXI uses five independent channels: write address, write data, write response, read address, and read data. Each channel has its own set of signals and uses a VALID/READY handshake, in which the source asserts VALID when information is available and the destination asserts READY when it can accept it; a transfer takes place only when both are asserted. This ensures that information passes from the source to the recipient in an orderly sequence of phases, and it facilitates the transfer of data across multiple sources.
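As a rough illustration of the handshake, the following Python sketch models one channel cycle by cycle, using the VALID and READY signal names from the AXI specification: a payload is transferred only in cycles where both signals are high. The function name `handshake_transfers` and the sample waveforms are assumptions made for illustration and are not taken from the design described in this work.

```python
def handshake_transfers(valid, ready, data):
    """Return the payloads transferred on one channel.

    A transfer happens only in a cycle where VALID (from the source)
    and READY (from the destination) are both high.
    """
    transferred = []
    for v, r, d in zip(valid, ready, data):
        if v and r:
            transferred.append(d)
    return transferred


# Hypothetical per-cycle waveforms for a single channel.
valid = [0, 1, 1, 1, 0, 1]
ready = [1, 0, 1, 1, 1, 1]
data = [None, 0xA0, 0xA0, 0xB1, None, 0xC2]

# In cycle 1 the source holds 0xA0 because READY is low; it is accepted in cycle 2.
print([hex(d) for d in handshake_transfers(valid, ready, data)])
```

In this sketch the source keeps its payload stable while it waits for READY, which is why 0xA0 appears in two consecutive cycles but is counted only once.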
Notably, the AXI protocol supports burst-based transactions. AXI4 enables write and read transfers with variable latencies. A burst transaction can comprise up to 256 transfers, with each transfer carrying from 1 to 128 bytes. Consequently, the AXI protocol can nominally transfer up to 32 kilobytes (256 × 128 bytes) of data in a single transaction [15]. This capability underscores the protocol's suitability for handling large volumes of data in high-performance microcontroller systems.
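The arithmetic behind these figures can be checked with a short Python sketch. The helper `burst_bytes` is a hypothetical name introduced here; it multiplies the number of transfers by the bytes per transfer and rejects values outside the ranges quoted above, assuming that transfer sizes are powers of two.

```python
VALID_TRANSFER_SIZES = (1, 2, 4, 8, 16, 32, 64, 128)  # bytes per transfer


def burst_bytes(num_transfers, bytes_per_transfer):
    """Total number of bytes moved by one burst transaction."""
    if not 1 <= num_transfers <= 256:
        raise ValueError("a burst carries 1 to 256 transfers")
    if bytes_per_transfer not in VALID_TRANSFER_SIZES:
        raise ValueError("transfer size must be a power of two from 1 to 128 bytes")
    return num_transfers * bytes_per_transfer


print(burst_bytes(256, 128))  # nominal maximum: 32768 bytes (32 KB)
print(burst_bytes(4, 4))      # 16-byte transaction verified in this work
print(burst_bytes(16, 16))    # 256-byte transaction verified in this work
```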
The ACE protocol addresses the cache coherence problem by integrating snooping-based concepts into the existing AXI protocol [6]. As a result, the ACE bus architecture maintains backward compatibility with the AXI protocol while offering additional channels dedicated to supporting snooping-based functionalities. Cache coherence has long been a significant challenge in hardware systems, and snooping-based concepts have emerged as a solution to this problem.
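The relationship between the two channel sets can be written down as a small Python data structure. This is only a sketch: the channel abbreviations (AW, W, B, AR, R for AXI and AC, CR, CD for the ACE snoop channels) follow the naming in the AMBA specification, while the dictionary and function names are assumptions made for illustration.

```python
# Five AXI channels shared by both protocols.
AXI_CHANNELS = {
    "AW": "write address",
    "W": "write data",
    "B": "write response",
    "AR": "read address",
    "R": "read data",
}

# Three additional channels that ACE dedicates to snooping.
ACE_SNOOP_CHANNELS = {
    "AC": "snoop address",
    "CR": "snoop response",
    "CD": "snoop data",
}

# ACE is the AXI channel set plus the snoop channels.
ACE_CHANNELS = {**AXI_CHANNELS, **ACE_SNOOP_CHANNELS}


def is_superset_of_axi(channels):
    """True if every AXI channel is also present in the given channel set."""
    return all(name in channels for name in AXI_CHANNELS)


print(len(ACE_CHANNELS), is_superset_of_axi(ACE_CHANNELS))
```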
Various protocols have been developed to tackle cache coherence issues, with ACE being one of the AMBA bus architectures designed specifically to support snooping-based concepts for cache coherence management. Figure 1 illustrates the five-state cache coherence model of the ACE protocol, which incorporates additional channels to enhance cache coherence functionality beyond what the AXI protocol offers. These additional channels enable more efficient management of cache coherence, contributing to improved system performance and reliability.
Fig. 1 Five state cache coherence model of ACE protocol
Proposed Work
The proposed work was carried out using Verilog and System Verilog concepts for both the master and slave designs, featuring a configuration with a single master and a single slave. The project is divided into two main phases: the Design phase and the Verification phase.
In the Design phase, the focus is on developing the slave component, which represents the target device or module within the system. This slave component is responsible for receiving commands or data from the master and executing the required operations.
On the other hand, the Verification phase involves creating a test bench environment, which serves as the master in the system. The test bench generates stimuli and sends commands to the slave design, simulating real-world interactions and scenarios. It also monitors the responses from the slave and verifies whether the behavior of the slave matches the expected outcomes.
By splitting the proposed work into these two distinct phases, the development and testing processes are streamlined, allowing for more efficient debugging and validation of the design. Additionally, this approach facilitates a systematic and structured methodology for ensuring the correctness and functionality of the overall system.
Design Phase
In the design phase, the slave designs for both the AXI4 and ACE protocols were developed separately using Verilog. This involved creating a Finite State Machine (FSM) for each channel and implementing a cache coherence state model specifically for the ACE protocol.
For the ACE slave design, only three states of the five-state cache coherence model were implemented. The memory size in the design was set to 256 bytes to accommodate the requirements of the system.
Figure 2 illustrates the channels present in both the AXI and ACE protocol designs, showcasing the distinct features and additional channels introduced by the ACE protocol to support snooping-based concepts for cache coherence management. This visual representation aids in understanding the communication framework and data flow within the designed protocols.
Fig. 2 Channels of AXI and ACE protocols
Verification Phase
In the verification phase, the verification environment for the AXI4 and ACE slave designs was implemented using System Verilog. This choice was made because System Verilog is a standard verification language widely used in the industry.
Various test bench components were created as part of the verification environment to ensure thorough testing and validation of the slave designs. These components included:
Transaction: Representing individual transactions or operations performed by the master on the slave device.
Interface: Defining the communication interface between the master and slave, including the signals and protocols used.
Configuration: Setting up parameters and configurations for the verification environment, such as clock frequency and data width.
Generator: Generating stimuli and test scenarios to exercise the slave design under different conditions.
BFM (Bus Functional Model) or Driver: Driving stimuli and commands to the slave design, simulating the behavior of the master.
Environment: Providing the infrastructure and utilities needed to manage and coordinate the verification process, including transaction tracking and error reporting.
Test Bench: Serving as the main test environment where the verification tests are executed and results are analyzed.
By incorporating these components into the verification environment, comprehensive testing can be performed to ensure the correctness and functionality of the slave designs for both the AXI4 and ACE protocols.
Figure 3 depicts the master and slave interface environment, which comprises various test bench components such as transaction, driver, generator, etc. Each component serves a distinct purpose, collectively facilitating the generation and transmission of stimuli from the master to the slave.
Fig. 3 System Verilog environment of AXI and ACE protocols
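The flow of stimuli through the test bench components can be sketched in Python, with a thread-safe queue standing in for a System Verilog mailbox. This is a simplified model under stated assumptions: the class names mirror the component names listed above, but their fields, the random write-only stimulus, and the `bytearray` used as a 256-byte slave memory are all hypothetical and do not reproduce the actual environment.

```python
import queue
import random
from dataclasses import dataclass

MEMORY_SIZE = 256  # slave memory size used in this work


@dataclass
class Transaction:
    """One write operation from the master to the slave (simplified)."""
    addr: int
    data: int


class Generator:
    """Creates random write transactions and posts them to the mailbox."""

    def __init__(self, mailbox, count, seed=1):
        self.mailbox = mailbox
        self.count = count
        self.rng = random.Random(seed)

    def run(self):
        for _ in range(self.count):
            txn = Transaction(self.rng.randrange(MEMORY_SIZE), self.rng.randrange(256))
            self.mailbox.put(txn)


class Driver:
    """Takes transactions from the mailbox and applies them to the slave model."""

    def __init__(self, mailbox, memory):
        self.mailbox = mailbox
        self.memory = memory
        self.driven = []

    def run(self):
        while not self.mailbox.empty():
            txn = self.mailbox.get()
            self.memory[txn.addr] = txn.data
            self.driven.append(txn)


mailbox = queue.Queue()            # stands in for a System Verilog mailbox
memory = bytearray(MEMORY_SIZE)    # stands in for the slave memory
generator = Generator(mailbox, count=8)
driver = Driver(mailbox, memory)
generator.run()
driver.run()
print(len(driver.driven), "transactions driven")
```

Decoupling the generator from the driver through the mailbox is what lets the same stimulus source be reused with a different driver or a different DUT.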
The AXI4 protocol adheres to the AMBA version 4 specification standard, making it compatible with burst-based communication modes. In the proposed methodology, the AXI4 protocol is designed, and a verification environment for the AXI4 protocol is developed. This environment enables comprehensive testing and validation of the AXI4 slave design, ensuring its compliance with protocol specifications and functionality requirements.
Figure 4 shows the FSM of the write channels, and Fig. 5 shows the FSM of the read channels; both include all handshaking and response mechanisms of the protocol.
Fig. 4 FSM of AXI4 write channels
Fig. 5 FSM of AXI4 read channels
Figure 6 shows the three implemented states of the five-state cache coherence model and the transitions between them, which depend on the assertion and deassertion of handshaking signals. In the five-state cache coherence model of the ACE protocol, the unique_clean and unique_dirty states are used when there is only one cache memory.
Fig. 6 A 3-state FSM design for ACE protocol
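A three-state coherence FSM of this kind can be sketched in Python as a transition table. The text names the unique_clean and unique_dirty states; the sketch assumes the third state is an invalid state, and the event names (`fill`, `write`, `write_back`, `invalidate`) are hypothetical labels rather than the handshaking signals of the actual design.

```python
from enum import Enum


class CacheState(Enum):
    INVALID = "invalid"              # assumed third state
    UNIQUE_CLEAN = "unique_clean"    # line matches main memory
    UNIQUE_DIRTY = "unique_dirty"    # line modified relative to main memory


# (current state, event) -> next state; event names are hypothetical.
TRANSITIONS = {
    (CacheState.INVALID, "fill"): CacheState.UNIQUE_CLEAN,
    (CacheState.UNIQUE_CLEAN, "write"): CacheState.UNIQUE_DIRTY,
    (CacheState.UNIQUE_DIRTY, "write_back"): CacheState.UNIQUE_CLEAN,
    (CacheState.UNIQUE_CLEAN, "invalidate"): CacheState.INVALID,
    (CacheState.UNIQUE_DIRTY, "invalidate"): CacheState.INVALID,
}


def next_state(state, event):
    """Return the next state; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run_events(events, state=CacheState.INVALID):
    """Apply a sequence of events starting from the given state."""
    for event in events:
        state = next_state(state, event)
    return state


print(run_events(["fill", "write"]).value)
```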
Results and Discussion
Fixed Burst Transfer
In fixed burst type transfers, the address remains constant throughout the process. This method finds application in scenarios where consistent memory location referencing is required, such as in FIFOs and queues.
Figure 7 shows how data is organized within the queue for both the AXI and ACE protocols. The address references the same fixed memory location throughout the burst. Consequently, each new data item added to the queue is written through that location, replacing the data previously presented there, and takes the next slot in the queue. Data is retrieved in the same manner, by repeatedly accessing the same address, so that it can be accessed and managed efficiently within the system.
Fig. 7 Data storage in queue in fixed burst type
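A fixed burst can be sketched in Python as a list of identical addresses feeding a FIFO. The function name, the address 0x40, and the data values are assumptions chosen for illustration; a `deque` stands in for the queue behind the fixed address.

```python
from collections import deque


def fixed_burst_addresses(start, num_transfers):
    """Every transfer of a fixed burst targets the same address."""
    return [start] * num_transfers


fifo = deque()  # queue located behind the fixed address
write_data = [0x11, 0x22, 0x33, 0x44]
addresses = fixed_burst_addresses(0x40, len(write_data))

# Each write goes to the same address and is appended to the queue.
for _addr, value in zip(addresses, write_data):
    fifo.append(value)

# Reading the same address repeatedly returns the data in arrival order.
read_back = [fifo.popleft() for _ in addresses]
print([hex(a) for a in addresses], [hex(v) for v in read_back])
```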
Incremental Burst Transfer
In incremental burst type transfers, the address for each transaction is derived by incrementing the address of the previous transfer. This approach is commonly employed to access memory in a sequential manner.
Figure 8 illustrates the process of writing data into the slave memory location using incremental burst mode transactions. In this mode, the address for each transfer is generated by incrementing the address of the preceding transfer.
Fig. 8 Data storage in Incremental burst transfer in AXI protocol
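The address sequence of an incremental burst can be sketched in Python as follows. The helper names are hypothetical, the start address is assumed to be aligned to the transfer size, and a `bytearray` stands in for the 256-byte slave memory; the example uses the 4-transfer, 4-byte configuration verified in this work.

```python
MEMORY_SIZE = 256  # slave memory size used in this work


def incr_burst_addresses(start, num_transfers, bytes_per_transfer):
    """Each address is the previous one plus the transfer size (aligned start assumed)."""
    return [start + i * bytes_per_transfer for i in range(num_transfers)]


def write_incr_burst(memory, start, beats):
    """Write equally sized data beats to consecutive locations."""
    size = len(beats[0])
    for addr, beat in zip(incr_burst_addresses(start, len(beats), size), beats):
        memory[addr:addr + size] = beat


memory = bytearray(MEMORY_SIZE)
beats = [bytes([n] * 4) for n in range(1, 5)]  # 4 transfers of 4 bytes each
write_incr_burst(memory, 0, beats)
print(incr_burst_addresses(0, 4, 4), memory[:16].hex())
```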
Figure 9 showcases the data storage in the cache memory during an incremental burst type transfer, which is applicable to both the ACE and AXI protocols. In this scenario, each transfer's address is incremented from the previous address, indicating sequential data storage.
Fig. 9 Data storage in cache memory in incremental burst type of ACE protocol
For the ACE protocol, the start address for the cache memory is determined by the ACADDR signals, which utilize address translation memory mapping techniques. Once all memory locations are filled, subsequent data transfers follow a round-robin fashion to select the next address. When the ACSNOOP signal is not asserted in the ACE protocol, it indicates that the required data is not present in the cache memory due to repeated overwrites. As a result, the read data is fetched directly from the main memory of the slave.
In contrast, in the AXI protocol, which lacks cache coherence support, data is retrieved directly from the main memory without first checking the cache memory. This difference highlights the cache coherence advantage offered by the ACE protocol over AXI, resulting in more efficient data retrieval and reduced latency.
Wrapping Burst Transfer
Wrapping burst type transfers resemble incremental burst type transfers with one key distinction: when the upper address boundary is reached, the address wraps back to the lower boundary of the burst. The wrapping boundary is determined by the total size of the burst, that is, the number of bytes per transfer multiplied by the number of transfers.
Figure 10 depicts the memory storage arrangement in wrapping burst type transfers. Here, the address for each data transfer increments by one from the previous address. During read operations, data from this burst type can be retrieved by specifying the same increment range.
Fig. 10 Data storage in Wrapping burst transfer of AXI protocol
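The wrap-around can be made concrete with a Python sketch of the address calculation. It follows the rule in the AXI specification that the lower wrap boundary is the start address rounded down to a multiple of the total burst size (bytes per transfer times number of transfers); the function name and the example values are assumptions for illustration.

```python
def wrap_burst_addresses(start, num_transfers, bytes_per_transfer):
    """Address of every transfer in a wrapping burst."""
    if num_transfers not in (2, 4, 8, 16):
        raise ValueError("wrapping bursts use 2, 4, 8, or 16 transfers")
    if start % bytes_per_transfer:
        raise ValueError("start address must be aligned to the transfer size")
    total = num_transfers * bytes_per_transfer
    lower = (start // total) * total   # lower wrap boundary
    upper = lower + total              # first address past the wrap region
    addresses = []
    addr = start
    for _ in range(num_transfers):
        addresses.append(addr)
        addr += bytes_per_transfer
        if addr >= upper:
            addr = lower               # wrap back to the lower boundary
    return addresses


print(wrap_burst_addresses(8, 4, 4))
```

Starting at address 8 with 4 transfers of 4 bytes, the burst covers the 16-byte region from 0 to 15, so the sequence runs 8, 12 and then wraps to 0, 4.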
Figure 11 illustrates the data storage configuration in cache memory for wrapping burst type transfers. In this scenario, the address for each transfer is incremented by one from the previous address. The data stored in the main memory under the ACE protocol is the same as under AXI. The start address for the cache memory is provided by the ACADDR signals, which use an address translation memory mapping technique: the lower-order 5 bits of the main memory address serve as the snooping address, that is, the address for the cache memory. Once data has been stored in all locations, the address for the next data transfer is selected in a round-robin fashion. When the ACSNOOP signal is asserted, it indicates that the data to be read is present in the cache memory: even if the cache memory has been overwritten several times, the main memory has been overwritten a similar number of times, because the address wraps around to a lower address once the wrap boundary is reached. The read data is therefore returned from the cache memory. In AXI, by contrast, there is no cache coherence support, so the data is read directly from the main memory without first checking the cache memory.
Fig. 11 Data storage in cache memory in wrapping burst type of ACE protocol
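The difference between the incremental and wrapping cases can be illustrated with a simplified Python model. It is a sketch under several assumptions that go beyond the text: the cache is modelled as 32 entries indexed by the lower-order 5 bits of the main memory address, each entry remembers the full address it holds, every write updates both memories, and a read is served from the cache only when the stored address matches (standing in for the snoop signal being asserted). The class and method names are hypothetical.

```python
CACHE_INDEX_MASK = 0x1F  # lower-order 5 bits of the main memory address


class SlaveModel:
    """Simplified slave with a main memory and a small snooped cache."""

    def __init__(self):
        self.main = {}    # address -> data
        self.cache = {}   # cache index -> (address, data)

    def write(self, addr, data):
        self.main[addr] = data
        self.cache[addr & CACHE_INDEX_MASK] = (addr, data)

    def read(self, addr):
        """Return (data, source); source is 'cache' on a snoop hit, else 'main'."""
        entry = self.cache.get(addr & CACHE_INDEX_MASK)
        if entry is not None and entry[0] == addr:
            return entry[1], "cache"
        return self.main[addr], "main"


# Incremental burst over 64 locations: the 32-entry cache is overwritten.
incr = SlaveModel()
for addr in range(64):
    incr.write(addr, addr + 100)
print(incr.read(0))    # address 0 was replaced in the cache by address 32

# Wrapping burst inside a 16-byte region: addresses 8..15, then 0..7.
wrap = SlaveModel()
for addr in list(range(8, 16)) + list(range(0, 8)):
    wrap.write(addr, addr + 100)
print(wrap.read(0))    # still present in the cache
```

In this model the incremental burst evicts early addresses from the cache, so they are read from main memory, whereas the wrapping burst stays inside a region smaller than the cache and is always served from it.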
Conclusion
In this work, the interconnect environment was simulated, and the output responses for the various burst mode transactions of the AXI4 protocol were observed. All five channels were verified for each burst mode to ensure protocol compliance and functionality. As the number of on-chip peripherals increases, cache coherence problems arise, and snooping-based concepts are needed to address them. The ACE protocol, one of the AMBA bus architectures, supports such concepts. Both the AXI and ACE protocols were therefore designed and verified using the Cadence Xcelium EDA tool. Of the eight channels of the ACE protocol, three are allocated to snooping.
To accommodate the single cache memory in the design, a 3-state cache coherence model was implemented. The protocol supports various types of bursts, including fixed burst, incremental burst, and wrapping burst, each capable of transferring the maximum number of bytes supported by the ACE protocol. During verification, the design was tested for both 4 transfers and 4 bytes per transfer, as well as 16 transfers and 16 bytes per transfer, facilitating transaction sizes ranging from 16 to 256 bytes in a single transaction.
Furthermore, cache coherence concepts were rigorously verified. When the snoop signal is asserted, data is read directly from the cache memory, assuming that the cache has not been modified with respect to main memory. Conversely, if the snoop signal is not asserted, data is read from the main memory, assuming that the cache has been modified with respect to main memory. By conducting thorough verification, the correctness and effectiveness of the ACE protocol in managing cache coherence and facilitating efficient data transfer were ensured, contributing to the overall performance and reliability of the on-chip communication system.
Both the AXI and OCP designs use the three-state model designed using FSM, and the outputs are observed for fixed burst, incremental burst, and wrapping burst in the case of AXI and incremental burst, wrapping burst, exclusive OR burst, streaming burst, and 2-dimensional block burst in the case of the OC protocol. The design and the test bench are simulated, and the simulation waveforms are analysed using the Cadence Xcelium EDA tool. Both protocols are implemented on the Basys 3 FPGA board, and certain parameters like area and time are analysed. It is found that the OC protocol has less metastability with a setup-hold window in the range of 0–0.3 ns, whereas AXI has a setup-hold window in the range of 4–4.5 ns. As OCP supports five types of burst transfers and AXI supports three types of burst transfers, which take fewer internal signals, the device utilisation of OCP is increased by 44.4% compared with AXI.
Author Contributions
The authors contributed equally.
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Data Availability
The data generated and/or analyzed during the current study are available from the corresponding author on reasonable request.
Declarations
Conflict of Interest
The authors have no relevant financial or non-financial interests to disclose.
References
1. Furhad, H; Haque, MA; Kim, C-H; Kim, J-M. An analysis of reducing communication delay in network-on-chip interconnect architecture. Wireless Personal Communications; 2013; 73, pp. 1403-1419. [DOI: https://dx.doi.org/10.1007/s11277-013-1257-y]
2. Dharane, P., & Shiurkar, U. D. (2022). Throughput as Well as Latency Improvement Method of Processor or Chip Accelerator. Wireless Personal Communications, 1–16.
3. Sutherland, S. (2004). Modeling FIFO Communication Channels Using SystemVerilog Interfaces. SNUG Boston.
4. Keaveney, M., McMahon, A., O'Keeffe, N., Keane, K., & O'Reilly, J. (2008). The development of advanced verification environments using system verilog.
5. Patil, RP; Sangamkar, PV. A review of system-on-chip bus protocols. International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering; 2015; 4,
6. Dukare, P., Gokhale, A., & Ingale, V. (2022). Development of AMBA ACE protocol. In 7th International Conference on Computing in Engineering & Technology (ICCET 2022) (Vol. 2022, pp. 222-225). IET.
7. Dwivedi, P., Mishra, N., & Singh-Rajput, A. (2021). Assertion & Functional Coverage Driven Verification of AMBA Advance Peripheral Bus Protocol Using System Verilog. In International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT).
8. Karthik, N; Babu, MG; Rela, MMP. Assertion Based Verification of AMBA-AHB Using System Verilog. International Journal & Magazine of Engineering, Technology, Management and Research; 2015; 2,
9. Prasad, G; Paradhasaradhi, D; Reddy, GMS; Rao, K; Prabhakar, V. Design and verification of AXI APB bridge using system verilog. JARDCS; 2018; 10,
10. Giridhar, P., & Choudhury, P. (2019). Design and Verification of AMBA AHB. In 1st International Conference on Advanced Technologies in Intelligent Control, Environment, Computing & Communication Engineering (ICATIECE).
11. Deepu, M. P., & Dhanabal, R. (2017). Validation of transactions in AXI protocol using system Verilog. In 2017 International conference on Microelectronic Devices, Circuits and Systems (ICMDCS).
12. Bedre, A. L., & Kumar, V. N. (2017). A Hybrid arbiter to accelerate performance of high speed soc. In International conference on Microelectronic Devices, Circuits and Systems (ICMDCS).
13. Kaur, D. P., & Sulochana, V. (2018). Design and implementation of cache coherence protocol for high-speed multiprocessor system. In 2nd IEEE International Conference on Power Electronics, Intelligent Control and Energy Systems (ICPEICES).
14. Sharma, S., & Sakthivel, S. (2018). Design and verification of AMBA AXI3 protocol. In VLSI Design: Circuits, Systems and Applications: Select Proceedings of ICNETS2, Volume V.
15. Pradeep, S; Laxmi, C. Design and verification environment for AMBA AXI protocol for SoC integration. International Journal of Research in Engineering and Technology; 2014; 3, pp. 338-343. [DOI: https://dx.doi.org/10.15623/ijret.2014.0315066]
16. Das, S., Mohanty, R., Dasgupta, P., & Chakrabarti, P. P. (2006). Synthesis of system verilog assertions. Paper presented at the Proceedings of the Design Automation & Test in Europe Conference.
17. Shrivastav, A., Tomar, G., & Singh, A. K. (2011). Performance comparison of AMBA bus-based system-on-chip communication protocol. In International Conference on Communication Systems and Network Technologies.
18. ARM. AMBA AXI and ACE Protocol Specification: AXI3, AXI4, and AXI4-Lite, ACE and ACE-Lite. ARM IHI 0022D.
19. Gavaskar, K., Sivaranjani, P., Elango, S., & Nirmal Raja, G. (2022). Low-Power SRAM Cell and Array Structure in Aerospace Applications: Single-Event Upset Impact Analysis. Wireless personal communications, 1–19
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.