Abstract

In this study, the authors propose a solution for implementing a digital signature system using the post-quantum digital signature scheme Falcon on the FPGA hardware platform XC7A35T-CPG236. The system is combined with a software component developed in C# using the .NET Framework 4.7, capable of running on the Windows operating system. Falcon is selected for its compact signature size and high computational performance, optimizing the signing and verification processes on hardware. The system is designed to leverage the hardware acceleration capabilities of an FPGA to enhance performance compared to software-only signature methods. The integration of the Falcon scheme into both hardware (FPGA) and software components ensures high flexibility for real-world deployment. The obtained results show that key generation execution time is approximately 1516.2 ms, while the signing and verification times are around 2–5 ms. Hardware resource usage has significantly improved, and the power consumption is 0.097 W (57%). However, the software component has not yet undergone thorough security testing. Given the achieved results in terms of execution time and energy consumption, the proposed device and software may be suitable for certain applications that do not require stringent performance and energy constraints. Furthermore, this research demonstrates the feasibility of deploying post-quantum cryptographic solutions on embedded hardware platforms, contributing to the development of secure communication systems in the future.

1. Introduction

According to [1], with the advent of quantum computers, many cryptographic systems face potential threats of being compromised. Notably, AES-128, SHA-256 [2], 2048-bit RSA [3], and 256-bit ECDSA [4] have been reported to be vulnerable to quantum attacks. In response to these risks, the cryptographic research community has focused on developing post-quantum digital signature systems and schemes that remain secure against both classical and quantum computer attacks and are also compatible with existing communication infrastructures [5]. This situation calls for a thorough evaluation of cryptographic algorithms to determine which of them can still ensure short-term security in the face of emerging quantum technologies. Among the leading candidates in the Post-Quantum Cryptography (PQC) competition initiated by NIST, Falcon, a digital signature scheme based on the NTRU lattice, has drawn significant attention due to its strong security, high performance, and compact key and signature sizes [6,7,8]. Falcon is built upon the GPV (Gentry–Peikert–Vaikuntanathan) framework for lattice-based digital signatures. It employs NTRU lattices and uses fast trapdoor sampling through Fourier transforms [6,7,8]. The security of the Falcon scheme relies on the hardness of the Short Integer Solution (SIS) problem over NTRU lattices, which is believed to remain difficult even in the presence of quantum computers [7].

In the fourth round of the NIST PQC standardization process, the NTRU lattice problem [9,10] continues to be recognized as a strong candidate [5]. Publication [11] provides evaluations of the hardware design and performance of post-quantum algorithms on hardware platforms. It also suggests methods indicating the suitability of implementing NTRU lattice-based schemes on FPGA and ASIC hardware platforms. This highlights that most of the post-quantum cryptographic algorithms selected for Round 4 of the NIST competition still lack extensive research on implementation and execution on hardware platforms like FPGA, ARM, and SoC. However, publication [11] also presents results and evaluations of energy efficiency and execution speed for PQC algorithms (selected in NIST Round 2) on various FPGA platforms. While the results demonstrate effectiveness, they also reveal the complexity and high resource consumption associated with FPGA implementations [11]. Due to the complexity of implementing these post-quantum signature schemes on FPGA hardware, studies [12,13] have proposed focusing on implementing only the fundamental components of post-quantum cryptographic schemes—such as NTT, INTT, Keccak, and BaseMul—which are commonly used in the design of signature schemes and cryptographic systems based on NTRU lattices [12]. This modular approach enables flexibility, better adaptability for different applications, and improved execution speed. Publication [7] outlines the implementation results, performance, and limitations of the Falcon signature scheme on various hardware platforms. Furthermore, it proposes solutions to improve these limitations [7]. Publication [8] highlights the distinctive hardware implementation characteristics of the Falcon signature scheme compared to other lattice-based post-quantum signature schemes built on the NTRU problem. 
Despite Falcon’s strong theoretical and practical advantages, deploying it on embedded devices—where hardware resources are constrained—remains challenging. Based on performance studies of Falcon on FPGA hardware, this research proposes a complete digital signature system based on Falcon, implemented on the FPGA Arty-7 platform and integrated with a control software component running on a PC. The design focuses on optimizing three core components: key generation, signing, and verification. It effectively leverages the parallelism and customizability of FPGA to enhance processing speed and reduce resource consumption. Based on the hardware design analysis of post-quantum cryptographic solutions, the authors implement several fundamental components of the Falcon scheme on the FPGA chip. The post-quantum Falcon signature application then calls these basic hardware functions during key generation, signing, and verification. Finally, the authors evaluate the performance of the proposed solution. The research results are presented in detail in this paper.

2. Related Work

A complete digital signature system based on Falcon is implemented on the Arty-7 FPGA hardware platform, together with control software running on the Windows operating system. The research content of the authors, presented in detail below, focuses on the following: (1) Mathematical foundations in post-quantum digital signature schemes; (2) Selection of software and hardware components for the post-quantum Falcon digital signature device.

2.1. Mathematical Foundations in Post-Quantum Digital Signature Schemes

To provide a clearer understanding of the Falcon system, Figure 1 illustrates the sequential workflow processes within the implemented design. Furthermore, the theoretical design details and mathematical soundness of Falcon are discussed in [14]. Specifically, the core components of the Falcon signature scheme include the following: (1) The GPV framework, (2) The NTRU lattice, and (3) Fast Fourier Sampling.

In this scheme, the public key is a polynomial $h \in \mathbb{Z}_q[x]$ of degree n − 1, defined as $h = f^{-1} \cdot g \bmod q$. The private key consists of four polynomials f, g, F, G (satisfying the relation $fG - gF = q$). The signature s is a short vector in $\mathbb{Z}^n$ (such that $s A^{T} = H(m) \bmod q$). Within the GPV framework, producing and validating a signature involves the following steps:

Step 1: Compute $c_0$ such that $c_0 A^{T} = H(m) \bmod q$.

Step 2: Use the private basis B to compute a vector $v \in \Lambda^{\perp}$ (the orthogonal lattice), such that v is close to $c_0$.

Step 3: Verify the shortness of $s = c_0 - v$. If $c_0$ and v are sufficiently close, then s is a short vector, confirming the validity of the signature.
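The reason the difference computed in Step 3 still satisfies the signature equation can be written out explicitly. The short derivation below is a sketch that assumes, as is standard in the GPV framework, that every vector v of the orthogonal lattice satisfies $v A^{T} = 0 \bmod q$:

```latex
\begin{align*}
s A^{T} &= (c_0 - v)\, A^{T} \\
        &= c_0 A^{T} - v A^{T} \\
        &= H(m) - 0 \\
        &= H(m) \pmod{q}
\end{align*}
```

Under that assumption, s is a valid preimage of H(m), and it is short exactly when v is close to $c_0$.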

According to [7], the authors implemented the Falcon algorithm on the Zynq UltraScale+ FPGA platform (ZCU104 [15]). This work improved upon prior research, which previously only achieved hardware implementation for the signature verification module [7]. However, [7] still presents certain limitations: slower execution compared to pure VHDL implementation, high hardware cost due to reliance on high-end UltraScale+ series FPGAs, and lack of optimization for lower-end platforms. In [8], it was noted that Falcon differs from other PQC-DSA schemes such as CRYSTALS-Dilithium and SPHINCS+ in that it does not have a single dominant computational task. Instead, Falcon relies on double-precision floating-point arithmetic, which significantly increases hardware complexity if fully implemented in hardware. To address these limitations, this study proposes a novel multi-level optimization approach, rather than optimizing based on large functional blocks. Consequently, a Falcon-based digital signature device is developed, utilizing a hardware/software co-design to enhance system security. This approach reduces hardware design complexity and resource usage, thereby lowering device cost while maintaining performance and security. Additionally, pipelined architecture is adopted to improve execution speed and reduce latency. Select modules with moderate resource requirements are offloaded to hardware to accelerate digital signature operations. The final device supports full-feature key generation, signature generation, and signature verification, and is built upon the mathematical components outlined in Section 2.1. Detailed design and implementation aspects are further discussed in the following sections.

As published in [16], the National Institute of Standards and Technology (NIST) initiated a standardization process to select cryptographic algorithms resilient to attacks from quantum computers, commonly referred to as post-quantum cryptography. Among the selected candidates, the Falcon scheme (Fast Fourier lattice-based compact signatures over NTRU) was chosen as a digital signature scheme to replace traditional schemes such as RSA, DSA, and ECDSA in the post-quantum context [5]. NIST has since published its first lattice-based post-quantum standards: FIPS 203 [17], which specifies the module-lattice-based key-encapsulation mechanism ML-KEM, and FIPS 204 [18], which specifies the module-lattice-based digital signature algorithm ML-DSA. Neither of these is built on NTRU lattices; Falcon itself is being standardized separately (as FN-DSA). The Falcon scheme is architected upon the GPV framework (Gentry–Peikert–Vaikuntanathan), a foundational structure for lattice-based digital signatures [19]. The security of Falcon relies on the hardness of the Short Integer Solution (SIS) problem over the NTRU lattice [20]. Enhancements in Falcon’s design to ensure suitability in practical applications are also addressed in [20].

Based on a thorough analysis of the Falcon architecture, this research proposes a hardware/software co-design solution to optimize performance and minimize resource overhead. Instead of attempting to implement all complex operations, particularly those involving floating-point arithmetic and recursive functions, on a resource-constrained FPGA platform, we partitioned the algorithm’s functionalities. The core and computationally intensive components of Falcon, including NTT (Number Theoretic Transform) and floating-point FFT, are efficiently handled by the software component on the host PC. Meanwhile, essential but less complex functions with high usage frequency, such as random number generation, modulo addition, and signature verification, are accelerated using custom hardware on the FPGA chip.

2.2. Selection of Software and Hardware Components for the Post-Quantum Falcon Digital Signature Device

The selection of fundamental components from the Falcon digital signature scheme for hardware implementation on an FPGA chip, while delegating the remaining functionalities to software, significantly impacts the performance and execution speed of the system. Choosing which components to implement in hardware must be achieved in a way that minimally affects—or ideally enhances—the scheme’s security, which is essential during practical deployment. This hybrid partitioning allows the Falcon post-quantum digital signature scheme to maintain flexibility and adaptability in real-world applications. The authors proceed to present the core components selected for FPGA integration, along with an analysis of their influence on the performance and real-time execution of the Falcon scheme on hardware.

Figure 2 illustrates the architecture and main functional modules of both the hardware and software parts of the Falcon signature system based on the NTRU lattice problem, including the following:

For the Falcon software: (1) key_gen function for generating the public and private keys. (2) expand_key function to transform the private key into the falcon_tree structure. (3) sign_tree function to generate the signature from the expanded private key, the message to be signed, and an input nonce.

For the Falcon hardware device: (1) A random number generator module to generate seeds for key generation and nonce seeds for signing. (2) A modular adder (mod q) for public key generation. (3) The verify_raw function, implemented in hardware, is responsible for digital signature verification.

Figure 2

Block diagram and circuit design of the mq_add block.

[Figure omitted. See PDF]

Figure 3 outlines the operational flow of the proposed signing model involving hardware acceleration. The Falcon digital signature service is embedded into an FPGA platform, which improves execution speed, optimizes resource utilization, and enhances system security. The design approach starts with High-Level Synthesis (HLS) using open-source C implementations provided by NIST during Round 3 of the Post-Quantum Cryptography Standardization process. Modules synthesized include key_gen, sign_dyn (comprising expand_key and sign_tree), and verify.

Through the HLS (High-Level Synthesis) process of the Falcon digital signature scheme, the authors observed that fully implementing the key generation (key_gen) and signing (sign_dyn) modules in hardware is highly impractical and presents significant challenges. Moreover, technical difficulties arise due to the use of floating-point arithmetic and recursive functions within the Falcon scheme, which are inherently complex to implement efficiently on FPGA platforms. Addressing floating-point computations and recursion [6] further complicates hardware platform selection and implementation. Therefore, the authors decided to fully harden only the signature verification module (verify_raw) in hardware, while partitioning the key generation (key_gen) and signing (sign_dyn) processes, offloading certain computational steps to hardware. This hybrid approach ensures that the system maintains its security integrity while balancing resource constraints and implementation feasibility.

In the Falcon digital signature service system, the components of the functions used for generating the public–private key pairs and signing messages in the Falcon signature scheme are implemented in software by the authors. Specifically, for the signing operation, the process involves key expansion (expand_key) based on the input message combined with a nonce value (generated by the hardware), followed by message signing using the expanded falcon_tree (sign_tree) to produce the final signature. Since communication with the signing device is performed through a virtual COM port, the software interface includes a configuration section (such as port selection, baud rate, etc.) to match the hardware setup. The interface also provides operational buttons for each step of the signing scheme and displays the runtime for each function.

Even though we chose to implement the key generation (key_gen) module in software to optimize hardware resource usage, it is important to understand its core components. The key generation process in Falcon is one of the most complex and costly parts of the algorithm, primarily based on solving the NTRU equation.

This algorithm consists of three main steps:

Private key generation: Using the Falcon pseudo-random number generator, two polynomials f and g with small coefficients are created. These polynomials must satisfy specific conditions to ensure modular invertibility.

Public key computation: The public key h is calculated using the generated polynomials: $h = g \cdot f^{-1} \bmod q$.

Falcon Sampler Tree generation: This is the most critical and complex step, which uses the Fast Fourier Transform (FFT) on floating-point polynomials. This sampler tree is later used in the signing process to find a short vector.

This sampler generation process is particularly difficult to implement in hardware because it requires double-precision floating-point arithmetic and uses recursive functions. For this reason, we decided to handle the entire key generation process in a software environment to avoid dedicating valuable FPGA resources to operations that are only performed once.
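The first two key generation steps can be illustrated with a toy Python sketch. It is not the authors' software and not the Falcon reference code: the ring degree n = 8, the coefficient range [−2, 2], Python's `random` module (standing in for the Falcon pseudo-random generator), and all function names are assumptions made for illustration. The sketch uses a naive quadratic-time transform instead of a fast NTT, and it omits solving the NTRU equation and building the sampler tree.

```python
import random

Q = 12289  # Falcon modulus q


def find_psi(n, q=Q):
    """Return a primitive 2n-th root of unity mod q (n must be a power of two)."""
    for x in range(2, q):
        psi = pow(x, (q - 1) // (2 * n), q)
        if pow(psi, n, q) == q - 1:  # psi^n = -1, so the order is exactly 2n
            return psi
    raise ValueError("no suitable root found")


def ntt(a, psi, q=Q):
    """Naive O(n^2) evaluation of a(x) at the roots psi^(2i+1) of x^n + 1."""
    n = len(a)
    return [sum(a[j] * pow(psi, (2 * i + 1) * j, q) for j in range(n)) % q
            for i in range(n)]


def intt(a_hat, psi, q=Q):
    """Inverse of ntt(): interpolate the coefficients from the n evaluations."""
    n = len(a_hat)
    psi_inv, n_inv = pow(psi, -1, q), pow(n, -1, q)
    return [n_inv * sum(a_hat[i] * pow(psi_inv, (2 * i + 1) * j, q)
                        for i in range(n)) % q
            for j in range(n)]


def negacyclic_mul(a, b, q=Q):
    """Schoolbook product of a and b modulo (x^n + 1, q)."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] = (out[i + j] + ai * bj) % q
            else:  # x^n wraps around to -1
                out[i + j - n] = (out[i + j - n] - ai * bj) % q
    return out


def toy_public_key(n=8, seed=0, q=Q):
    """Sample small f, g (f invertible) and return (f, g, h) with h = g / f mod q."""
    rng = random.Random(seed)  # stand-in only; not cryptographically secure
    psi = find_psi(n, q)
    while True:
        f = [rng.randint(-2, 2) for _ in range(n)]
        g = [rng.randint(-2, 2) for _ in range(n)]
        f_hat = ntt(f, psi, q)
        if all(f_hat):  # f is invertible iff no evaluation is zero
            break
    g_hat = ntt(g, psi, q)
    h_hat = [gi * pow(fi, -1, q) % q for gi, fi in zip(g_hat, f_hat)]
    return f, g, intt(h_hat, psi, q)
```

A quick consistency check is that `negacyclic_mul(f, h)` reproduces g reduced modulo q, which is the defining property of the public key.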

Through theoretical surveys, analyses, and solution selection for both hardware and software design, the authors have developed a USB signing device (hardware), in which the main computation tasks of the Falcon scheme are performed using the AMD XC7A35T-CPG236 chip, a member of the Xilinx 7 Series FPGA family [21]. The hardware-implemented components of the Falcon scheme include the following: an 8-bit random number generator (trng_8bit) for generating public–private key pairs and nonce values used in signing; a modulo-q integer adder (mq_add) for computing the public key h; and a signature verification module (verify_raw) for validating the digital signature.

(+) True Random Number Generator (trng_8bit):

To ensure the cryptographic security required by the Falcon post-quantum digital signature algorithm, particularly during sensitive processes demanding high randomness such as key generation and nonce creation, the system has been upgraded to employ a True Random Number Generator (TRNG). This TRNG is constructed using a hybrid physical/digital approach to generate a high-quality entropy source, comprehensively addressing potential security concerns related to randomness.

The authors’ TRNG design is based on the principle of sampling asynchronous Ring Oscillators (ROs). The raw random bit (random_bit) is produced by a base TRNG module (defined in entity work TRNG). This module exploits unpredictable physical variations (such as thermal noise and voltage fluctuations) of the FPGA chip to generate high-frequency oscillations. The oscillation frequency of the RO ($f_{RO}$) is determined by the total propagation delay ($\tau_i$) across its constituent gates:

(1) $f_{RO} \propto \dfrac{1}{\sum_{i=1}^{n} \tau_i}$

The randomness inherent in $f_{RO}$ is sampled synchronously by the system clock (clk), generating the raw random_bit sequence. The operation of the trng_8bit module can be described as a synchronous system with a state $X_n$ at time n.

(2) $X_n = [\, r_7 \;\; r_6 \;\; \cdots \;\; r_0 \;\; c \,]^{T}$

where $r_i$ are the bits of random_data (7 downto 0), and c is the value of the bit counter (bit_counter). At each rising edge of the clk, the state is updated according to the general formula:

(3) $X_{n+1} = A \cdot X_n + B \cdot \mathrm{random\_bit}$

The valid signal and the counter c are controlled by synchronous logic based on the byte completion condition:

(4) $\mathrm{valid}_{n+1} = \delta(c_n, 7)$

(5) $c_{n+1} = \begin{cases} 0, & \text{if } c_n = 7 \\ c_n + 1, & \text{otherwise} \end{cases}$

The TRNG method based on RO provides a robust source of physical entropy. Nevertheless, the authors acknowledge that this output is raw entropy bits that require further processing through an Entropy Conditioner compliant with the NIST standard to eliminate statistical bias and ensure absolute cryptographic security [22]. However, the primary focus of this research is to present a comprehensive solution for deploying the novel Falcon post-quantum signature algorithm, encompassing both hardware and software. Therefore, in-depth technical control over the design’s security (such as continuous health checks and side-channel attack hardening) will be reserved for subsequent, specialized studies.

(+) Signature Verification (verify_raw):

Next is the signature verification function (verify_raw), which was translated and synthesized from C code using the Vitis HLS tool, converting it into a lower-level VHDL representation. In this study, the conversion process of the verify_raw function from C to VHDL is described, and the algorithm is detailed in Algorithm 1 based on publication [20].

Algorithm 1: (signature authentication) verify_raw(c0, s2, h)

1. Input: c0, s2, h, each of size $n = 2^{\mathrm{logn}}$.

2. Output: Returns “1”: valid signature; Returns “0”: invalid signature.

3. Require:

Step 1: Normalize s2 to the domain [0, q − 1]:

$s_2[u] = \left( s_2[u] + q \cdot \mathbb{1}_{s_2[u] < 0} \right) \bmod q$

Step 2: Calculate s1 using the formula:

$s_1 = \left( \mathrm{INTT}\big(\mathrm{NTT}(s_2) \cdot h\big) - c_0 \right) \bmod \phi \bmod q$

Step 3: Normalize s1 to the domain [−q/2, q/2]:

$s_1[u] = s_1[u] - q \cdot \mathbb{1}_{s_1[u] > q/2}$

Step 4: The signature validity check must satisfy the function:

$\mathrm{is\_short}(s_1, s_2)$

4. Result:

$\mathrm{verify\_raw}(c_0, s_2, h) = \begin{cases} 1, & \text{if } (s_1, s_2) \text{ is short enough} \\ 0, & \text{otherwise} \end{cases}$
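The four steps of Algorithm 1 can be traced in a small software model. The Python sketch below is an illustration only, not the HLS-generated core: it replaces the NTT/INTT pair with a schoolbook negacyclic product (which gives the same result modulo $x^n + 1$ and q), it takes h in coefficient form, and the squared-norm bound `bound_sq` is a caller-supplied parameter whose name and use are assumptions rather than the acceptance bound of the real scheme.

```python
Q = 12289  # Falcon modulus q


def negacyclic_mul(a, b, q=Q):
    """Schoolbook product of a and b modulo (x^n + 1, q); stands in for NTT/INTT."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] = (out[i + j] + ai * bj) % q
            else:  # x^n wraps around to -1
                out[i + j - n] = (out[i + j - n] - ai * bj) % q
    return out


def verify_raw(c0, s2, h, bound_sq, q=Q):
    """Return 1 if (s1, s2) is short enough, else 0 (toy model of Algorithm 1)."""
    # Step 1: normalize s2 into [0, q - 1]
    s2_mod = [x % q for x in s2]
    # Step 2: s1 = s2 * h - c0 (mod x^n + 1, mod q)
    prod = negacyclic_mul(s2_mod, h, q)
    s1 = [(p - c) % q for p, c in zip(prod, c0)]
    # Step 3: re-center s1 around zero
    s1 = [x - q if x > q // 2 else x for x in s1]
    # Step 4: squared Euclidean norm of (s1, s2) against the bound
    norm_sq = sum(x * x for x in s1) + sum(x * x for x in s2)
    return 1 if norm_sq <= bound_sq else 0
```

For example, with a toy degree of 4, choosing small vectors s1 and s2, setting c0 = s2·h − s1 (mod q), and calling `verify_raw(c0, s2, h, bound_sq)` accepts when the bound exceeds the squared norm and rejects once c0 is tampered with.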

(+) Modulo-q Addition (mq_add):

Finally, the modulo-q integer addition function, used for computing the public key h in the Falcon signature scheme, is implemented. Based on a deep understanding of the Falcon architecture, the hardware design of the mq_add block is inspired by reference [14]. The operation is mathematically described by Equation (6)

(6) $d = (x + y) \bmod q = \begin{cases} x + y, & \text{if } x + y < q \\ x + y - q, & \text{if } x + y \ge q \end{cases}$

The function computes the value of x + y − q (where q = 12,289 and x, y are 32-bit unsigned integers). If the most significant bit (MSB) of the result is 1 (indicating a negative number in signed representation), q is added back to bring the result within the valid range [0, q − 1]. If the result is non-negative, it is kept unchanged.
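The MSB-based correction can be mimicked in software by masking to 32 bits. The following Python sketch is a behavioural model under the assumption that both inputs already lie in [0, q − 1]; the function name mirrors the hardware block, but the code is illustrative and is not the authors' VHDL.

```python
Q = 12289            # Falcon modulus q
MASK32 = 0xFFFFFFFF  # emulate 32-bit unsigned wrap-around


def mq_add(x, y, q=Q):
    """Add x and y modulo q, assuming 0 <= x, y < q."""
    d = (x + y - q) & MASK32   # wraps when x + y < q, which sets the MSB
    if d >> 31:                # MSB set: the subtraction went negative
        d = (d + q) & MASK32   # add q back to return to [0, q - 1]
    return d
```

For instance, `mq_add(12288, 12288)` gives 12287 and `mq_add(12288, 1)` gives 0, matching ordinary reduction modulo q.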

3. Design and Development of the Falcon Digital Signature Device Based on FPGA Platform

The implementation of the post-quantum Falcon 1.0 digital signature software is specifically designed for the Falcon hardware signing device. The software is developed using Microsoft Visual Studio and runs on the Windows operating system (x86 or x64 architecture). It communicates with the hardware signing device via a virtual COM port, using libraries based on the .NET Framework 4.7. The software was tested and evaluated for performance on a Dell Precision 7520 running Windows 10 Home.

In Figure 4, the authors present the functional blocks designed for the Falcon post-quantum signing device, including the following: (1) UART Block, (2) Verify Block, (3) mq_Add Block, (4) Random_SEED Block, (5) Switch Block, (6) UART Controller Block. The functions of these blocks are detailed as follows:

In the hardware design of the Falcon digital signature device, UART interface block (1) is responsible for UART communication between the hardware design and the CP2102 chip (USB to UART). Blocks (2), (3), and (4) correspond to the core functions of signature verification, modulo-q integer addition, and pseudo-random number generation, respectively. This arrangement enables stable communication between the Falcon device and the host computer during operation, ensuring reliable performance for digital signature services.

Furthermore, the design incorporates block (5), which consists of physical switches on the device. These switches are used to configure the operational mode of the device, allowing users to identify whether it is currently set to key generation, digital signing, or signature verification. Lastly, block (6) is integrated into the Falcon signature device design to serve as the central processing unit, managing data flow between the blocks and controlling the overall operation of the device.

Figure 4

Hardware circuit architecture diagram.

[Figure omitted. See PDF]

During the design and implementation phase of the Falcon signing device, the authors developed the pseudo-random number generator (trng_8bit block (4)) and the modulo-q adder (mq_add block (3)) using VHDL code with the Vivado tool. The XC7A35T-CPG236 FPGA chip was selected to implement these two modules. The integration process included simulation and verification using testbenches for both modules to ensure proper functionality.

3.1. Design and Implementation of the Trng_8bit Block

Figure 5 outlines the synchronous operation of the trng_8bit module within the new TRNG architecture. The module’s behavior is controlled by the signals rst (reset) and clk (clock), following the logical flow below: (I) Initialization (Reset): When the signal rst = ‘1’ (true), the module immediately reinitializes all registers. (II) Synchronous Sampling: When rst = ‘0’ (false), the logic waits for the rising edge of clk (clk = ‘1’). At this edge, the module performs the following functions: Reading the raw random_bit signal provided by the base TRNG module; assigning random_bit to the current position in the random_data (8-bit register), as indicated by bit_counter. (III) Counter and Valid Control: After storing the bit, bit_counter is incremented by 1. The logic then checks for byte completion: If bit_counter reaches 7 (meaning 8 bits have been collected), the valid signal is asserted (‘1’) to indicate that a random_byte is ready. Simultaneously, bit_counter is reset to 0 to begin collecting the next byte. Finally, random_byte is assigned to the value of random_data.
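The reset, sampling, and counter/valid flow described above can be modelled cycle by cycle. The Python sketch below is a behavioural illustration, not the authors' VHDL: the class and helper names are hypothetical, the raw bits are supplied by the caller instead of a ring oscillator, and the output byte is assumed to include the eighth bit in the same cycle in which valid is asserted.

```python
class Trng8Bit:
    """Behavioural model of the 8-bit collector: one call to clock() per rising edge."""

    def __init__(self):
        self.reset()

    def reset(self):
        # (I) Initialization: clear all registers (rst = '1')
        self.random_data = 0   # 8-bit accumulation register
        self.bit_counter = 0   # 3-bit position counter
        self.random_byte = 0   # last completed byte
        self.valid = False

    def clock(self, random_bit):
        # (II) Synchronous sampling: store the raw bit at position bit_counter
        mask = 1 << self.bit_counter
        self.random_data = (self.random_data & ~mask) | ((random_bit & 1) << self.bit_counter)
        # (III) Counter and valid control
        if self.bit_counter == 7:
            self.random_byte = self.random_data
            self.valid = True
            self.bit_counter = 0
        else:
            self.valid = False
            self.bit_counter += 1
        return self.valid


def collect_bytes(bits):
    """Feed a sequence of raw bits and return every completed byte."""
    trng, out = Trng8Bit(), []
    for b in bits:
        if trng.clock(b):
            out.append(trng.random_byte)
    return out
```

Feeding the eight bits 1, 0, 1, 0, 1, 0, 1, 0 fills positions 0 through 7 in order and yields the single byte 0x55, while seven bits alone yield no byte.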

The hardware implementation of trng_8bit (detailed in Figure 6) employs a standard synchronous architecture consisting of the following:

A 3-bit counter register;

An 8-bit accumulation register;

Bit Merge Logic (RTL_BMERGE) to insert the random_bit into the correct position.

Figure 6

Block diagram and circuit design of the random_seed block.

[Figure omitted. See PDF]

This design leverages a true random source (random_bit) to generate non-repetitive random byte sequences, ensuring the cryptographic unpredictability required for the Falcon scheme.

3.2. Design and Implementation of a Modular Addition Unit for Two Integers (mq_Add Block)

The operation of adding two 32-bit integers modulo q is illustrated in Figure 7. This represents the algorithm for modular addition of two integers, incorporating conditional addition: the value q is added back to the result if the initial summation yields a negative value. The hardware schematic of the modular addition unit (mq_add block) is shown with its input and output pins in Figure 8. In this figure, Figure 8a presents the overall block structure of the mq_add module in Vivado 2023.2, while Figure 8b depicts the detailed RTL (Register Transfer Level) circuit of the mq_add block, also developed with Vivado.

Following this process, the authors proceeded to test the system by running simulation trials using a testbench. The results are presented in Figure 9 and Figure 10, from which it can be observed that for the trng_8bit block—although it generates pseudo-random values—the 255 values it produces are sufficient to meet the design requirements. Meanwhile, for the mq_add block, the summation results (Figure 10) are correct when compared with software-based testing.

Table 1 summarizes the resource utilization statistics for the two modules (mq_add and random_seed) on the XC7A35T-CPG236 chip. The LFSR design implemented by the research team is lightweight in terms of resource consumption, but it generates low-quality random numbers and does not yet meet the cryptographic security requirements or the NIST standards outlined in references [23,24]. However, this is only a prototype version intended to demonstrate the system’s functionality. The team plans to upgrade the linear pseudo-random number generator in future versions. From Table 1, the research team demonstrates that the random_seed and mq_add modules are compact and suitable for integration into various design architectures.

3.3. Design and Implementation of the Verify_Raw Block

In the process of designing and implementing the Falcon digital signature system, the research team first converted the C code of the post-quantum Falcon signature scheme into VHDL code (to encapsulate the components of the Falcon scheme into an SoC implemented on FPGA hardware). During this process, the team utilized the High-Level Synthesis (HLS) tool. Below is the result of the HLS of the verify_raw design (Algorithm 1), targeting the XC7A35T-CPG236 chip:

Table 2 provides the details of hardware IP core resource utilization on the FPGA XC7A35T-CPG236 for the verify_raw function. The results show that the synthesized IP core for the verification function is resource-efficient, saving BRAM, FF, and LUT, while also achieving extremely low latency. With a clock cycle count of 70 cycles at 100 MHz, the response time of the module is approximately 0.7 µs, indicating a fast verification response.

The verification block is designed based on the block diagram shown in Figure 11. The verify_raw function verifies a signature by performing a series of arithmetic operations on the values s2, c0, and h. First, it normalizes the elements of the s2 array into the modulo-q range. Then, it computes $s_1 = s_2 h - c_0$ in the NTT domain, followed by re-normalization. Finally, the function checks the length of the resulting vector $(s_1, s_2)$ to determine whether the signature is valid. The verify_raw function is generated from the original C code using HLS into a hardware IP core, as illustrated in Figure 12 and Figure 13. In this context, Figure 12 shows the verify_raw block with its input and output ports, while Figure 13 presents the RTL schematic design of the verify_raw block in detail.

Table 3 summarizes the achieved results regarding the resource utilization of the signature verification module (verify_raw block) on the XC7A35T-CPG236 chip. According to the results in Table 3, the verification module uses a moderate amount of hardware resources. This is reflected in its resource usage: 14.3% for LUTs, 3.9% for Registers, and 16.7% for BRAM. The resource consumption of the Verify module indicates that the design has a medium level of complexity, making it well-suited for implementation on the XC7A35T-CPG236 chip. The use of 1 DSP and 14 submodules (instances) suggests that the module performs a variety of arithmetic and logic checks, but it is not overly complex and does not place a heavy burden on the FPGA-integrated system.

3.4. Design and Implementation of the UART Peripheral Interface Block

In addition to the modules serving the digital signature functionality of the Falcon device, the authors’ design also includes a UART peripheral interface block (1). This UART block is used to enable the Falcon signature device to communicate with a computer via a virtual COM port. As shown in Figure 14, the operation of the UART block is ensured through several components, including the following:

(1). Baudrate Generator—This block generates the baud_tick signal to synchronize the UART transmission and reception process based on the system clock and the configured baud rate. The baud_tick signal is issued after a specific number of clock cycles (calculated as CLOCK_FREQ/BAUD_RATE), ensuring correct transmission speed.

(2). UART Receiver—This is the UART data receiving unit, which receives data from the transmission line (rx), samples and processes the data through multiple states: IDLE, START_BIT, DATA_BITS, and STOP_BIT. Once 8 bits are received, it raises the rx_done = 1 signal and outputs the data.

(3). UART Transmitter—This is the UART data transmitting unit, which also operates through the following states: IDLE, START_BIT, DATA_BITS, and STOP_BIT. When it receives the tx_start signal, it begins transmission. Each data bit is sent out on every baud_tick cycle, and after transmitting all 8 bits, it asserts the tx_done = 1 signal.

Figure 14

UART module.

[Figure omitted. See PDF]
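The baud-rate divider and the START_BIT / DATA_BITS / STOP_BIT framing can be sketched in a few lines of Python. This is an illustrative model only: the 100 MHz clock and 115,200 baud figures in the usage note are hypothetical example values, and least-significant-bit-first ordering is assumed because it is the usual UART convention, not because the source states it.

```python
def baud_divisor(clock_freq_hz, baud_rate):
    """Number of system clock cycles between baud_tick pulses (CLOCK_FREQ / BAUD_RATE)."""
    return clock_freq_hz // baud_rate


def frame_byte(value):
    """Transmitter view: start bit (0), 8 data bits LSB first, stop bit (1)."""
    data_bits = [(value >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]


def decode_frame(bits):
    """Receiver view: check the start and stop bits, then rebuild the byte."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))
```

With the example values, `baud_divisor(100_000_000, 115_200)` gives 868 clock cycles per bit, and `decode_frame(frame_byte(b))` returns b for every byte value.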

4. Results and Discussion

This section presents the results obtained when operating the Falcon post-quantum signature scheme on the hardware device and on the accompanying signing software, both of which were designed and built following the approach proposed in this study.

After completing the testing phase using simulation testbenches, the authors finalized the design and deployed it onto the hardware device. Figure 15 shows the detailed RTL synthesized circuit of the modules in the program based on the Falcon digital signature scheme, which will be loaded onto the hardware device using the XC7A35T-CPG236 chip. Figure 16 is a photograph of the device in operation, running the program modules that have been successfully loaded onto the hardware.

After generating the bitstream file and programming it onto the hardware device (Figure 16), the authors conducted experimental runs comparing the hybrid digital signature system (software combined with hardware acceleration) with the purely software-based digital signature system. These runs yielded execution times for the hardware-accelerated device and for the software-based signature solution, both implementing the Falcon digital signature scheme. The comparative results for key generation time, signing time, and verification time, alongside existing published benchmarks, are presented in Table 4.

The results presented in Table 4 show that, for the same Falcon-1024 parameter set (degree 1024), the key generation and signing times in this study are slower than those reported in [20]. However, the signature verification time achieved in this research is faster than that reported in [20]. The slower key generation and signing can be attributed to the use of UART peripheral communication in the design of the Falcon signature device. Since UART has inherent limitations in data transmission speed, the overall execution speed of the digital signature service system is also restricted. This effect is illustrated in Table 5, where the authors compare their results with those published in [7,20]. In many practical applications, key generation is handled externally, outside of the signature system. The authors therefore consider the impact of key generation time on overall system performance in real-world deployment to be minimal. With signing and verification times in the range of 2–5 ms, the proposed hardware and software system appears suitable for practical applications that can tolerate signing and verification latencies of a few milliseconds.
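To make the UART limitation concrete, the following Python sketch estimates how long a payload takes to cross a serial link. It is a back-of-the-envelope model under stated assumptions: 8N1 framing (10 line bits per byte), a hypothetical 115,200 baud rate, and, purely as an example payload, the 1793-byte Falcon-1024 public key size from the Falcon specification. Which data actually cross the UART link in this design, and at what baud rate, is not stated here.

```python
def uart_transfer_ms(n_bytes: int, baud_rate: int, bits_per_frame: int = 10) -> float:
    """Time in milliseconds to move n_bytes over a UART link.

    bits_per_frame = 10 models 8N1 framing: 1 start bit, 8 data bits, 1 stop bit.
    """
    return n_bytes * bits_per_frame / baud_rate * 1000.0


# Example payload: Falcon-1024 public key size in bytes (per the Falcon specification).
FALCON1024_PUBKEY_BYTES = 1793

# Hypothetical baud rates, chosen only to show how transfer time scales.
for baud in (115_200, 921_600, 3_000_000):
    ms = uart_transfer_ms(FALCON1024_PUBKEY_BYTES, baud)
    print(f"{baud:>9} baud: {ms:8.3f} ms")
```

Because transfer time scales inversely with the baud rate, any operation that exchanges kilobytes of data with the host over a slow serial link can be dominated by communication rather than computation.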

Based on the results achieved in this study, the design, development, and implementation of the Falcon digital signature scheme (Falcon-1024, degree 1024) demonstrate promising performance in terms of signing and verification. Notably, the verification step shows improved execution speed compared to the NIST reference software. However, key generation remains relatively slow, primarily due to the use of the UART communication protocol and the lack of algorithmic optimization. This reflects certain limitations in hardware resource utilization and algorithm efficiency. In future research, the authors aim to enhance system performance by restructuring the algorithm and strengthening hardware-software co-design. Overall, the current implementation is suitable for applications that generate keys infrequently but require stable and efficient signing and verification performance.

Figure 17 shows the interface of the software designed to accompany the Falcon post-quantum digital signature device. Its main functions are: connecting to the signing device via UART communication (Configure Device), generating pseudo-random numbers (Random), inputting messages (Browse), digital signing (Sign), signature verification (Verify), and displaying results and execution speed. The figure also gives a detailed overview of the complete operation on an input message, from pseudo-random number generation and key generation through digital signing to signature verification. During the testing phase, the research team conducted multiple experimental runs. The execution results of both the software and the hardware device are summarized in Table 4 and Table 5.

Figure 18 presents the resource utilization results of the algorithms implemented on the XC7A35T-CPG236 chip within the post-quantum digital signature device Falcon. The resource usage achieved in this study shows an improvement compared to the findings in [7]. As shown in Figure 18, the design demonstrates high efficiency in hardware resource utilization, with extremely low usage across key components such as LUTs, Flip-Flops (FFs), and BRAMs, all remaining below 15%. This leaves substantial headroom for system expansion without facing resource constraints, especially given that I/O usage is currently only around 13.21%.

Notably, the design uses virtually no LUTRAM, indicating that the processing architecture does not rely heavily on temporary logic-based memory. This helps reduce latency and improve overall performance. It may also help mitigate the risk of hardware-based side-channel attacks in which adversaries attempt to exploit buffer memory vulnerabilities. However, DSP (Digital Signal Processing block) usage reaches 72.22%, a relatively high value. This suggests that certain components of the Falcon signature scheme integrated into the hardware require a significant amount of complex arithmetic computation.

Table 6 provides a comparative analysis of our proposed hardware/software co-design solution with other recent, state-of-the-art implementations of Falcon, specifically highlighting the differences in design philosophy, hardware architecture, and performance.

The comparative analysis reveals that our work, while achieving slower key generation and signing times than [8,25], demonstrates a practical and distinct approach to deploying Falcon. The studies in [8,25] present advanced, highly optimized hardware accelerators that target specific low-level operations or use a custom processor design, which yields superior performance. For instance, [8] achieved a 3.58× reduction in clock cycles for FALCON-1024 signing compared to a previous accelerator, while the customized RISC-V processor in [25] demonstrated a 9× performance improvement for Falcon verification compared to a baseline. Their designs benefit from being implemented on a more advanced process technology (28 nm) or a more powerful platform (AMD Zynq-7000), allowing them to implement complex FPGA operations directly in hardware.

Verification Performance—The verification time of the proposed design is 2.238 ms, equivalent to approximately 223,800 cycles (at 100 MHz), which is slower compared to highly optimized implementations.

Key Generation Time—The slower key generation time (1516.2 ms) is primarily due to UART communication overhead (low-speed serial interface) rather than computational logic.
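The cycle counts quoted alongside the millisecond figures follow directly from the 100 MHz clock. The short Python sketch below reproduces that conversion; the function name `ms_to_cycles` is a hypothetical helper introduced here for illustration.

```python
def ms_to_cycles(time_ms: float, clock_hz: int = 100_000_000) -> int:
    """Convert an execution time in milliseconds to clock cycles at clock_hz."""
    return round(time_ms * clock_hz / 1000)


# Timings reported for this work (ms), converted at the 100 MHz clock.
for name, t_ms in (("key generation", 1516.2), ("signing", 5.0), ("verification", 2.238)):
    print(f"{name}: {ms_to_cycles(t_ms):,} cycles")
```

Expressing results in cycles makes them comparable with designs that run at other clock frequencies, although wall-clock time still depends on each design's own clock.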

In summary, the proposed solution focuses on a more accessible and cost-effective approach. By offloading resource-intensive floating-point operations to software, the authors successfully implemented a complete Falcon system on a standard, low-cost FPGA platform. The main contribution is the demonstration that functional-level hybrid partitioning is a viable strategy for embedded systems where cost and implementation simplicity are the key factors rather than strict performance constraints. Therefore, this work stands as a valuable proof-of-concept for the deployment of post-quantum cryptography on general-purpose embedded hardware.

5. Conclusions

In this study, the authors designed and developed a hardware device and accompanying software for digital signatures based on the post-quantum digital signature scheme Falcon. In this design, the hardware device (built on the XC7A35T-CPG236 FPGA) executes key functions such as random number generation (random_seed), modulo-q addition of integers (mq_add), and signature verification (verify_raw), while the remaining core functions of the Falcon scheme are handled by the software component. The results show that key generation takes approximately 1516.2 ms, and signing and verification take about 2–5 ms. Hardware resource utilization has improved significantly, and the measured power consumption is 0.097 W (57%), indicating that the Falcon signature device operates with moderate energy usage. These findings suggest that the proposed hardware and software solution is suitable for certain applications that do not impose strict requirements on execution speed and power efficiency. Furthermore, the study demonstrates the feasibility of implementing post-quantum cryptographic solutions on embedded hardware platforms, contributing to the development of secure communication systems. In future work, the research team will focus on optimizing the current design to further reduce execution time and enhance overall efficiency. Specifically, the design will be developed further to meet the requirements of NIST standards, so that future versions are not only functional but also cryptographically secure.

Author Contributions

Methodology, T.-T.N., T.-T.D. and N.-Q.L.; Software, D.-D.N.; Resources, T.-T.N., T.-T.D. and N.-Q.L.; Writing—original draft, T.-T.N.; Writing—review & editing, T.-T.N.; Supervision, T.-T.D. and N.-Q.L.; Project administration, N.-Q.L. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Acknowledgments

The authors thank the Academy of Cryptography Techniques and the Electronic Engineering doctoral program of the University of Transport and Communications for supporting this work.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AES     Advanced Encryption Standard
DSA     Digital Signature Algorithm
DSP     Digital Signal Processing block
ECDSA   Elliptic Curve Digital Signature Algorithm
FFT     Fast Fourier Transform
FPGA    Field-Programmable Gate Array
GenAI   Generative Artificial Intelligence
GPV     Gentry–Peikert–Vaikuntanathan
HLS     High-Level Synthesis
NIST    National Institute of Standards and Technology
NTRU    N-th degree Truncated polynomial Ring Units
PQC     Post-Quantum Cryptography
SIS     Short Integer Solution
TRNG    True Random Number Generator

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Figures and Tables

Figure 1 Digital signing process flowchart.

Figure 3 Operational diagram of the software and hardware of the Falcon digital signature device.

Figure 5 Block diagram of the trng_8bit module.

Figure 7 Mq_add module block diagram.

Figure 8 Block diagram and circuit design mq_add block: (a) mq_add block; (b) RTL circuit of mq_add RTL block.

Figure 9 random_seed simulation.

Figure 10 Mq_add simulation.

Figure 11 Verify_raw module block diagram.

Figure 12 verify_raw block.

Figure 13 RTL circuit diagram of the verify_raw block.

Figure 15 Summary of RTL circuit diagram of modules on the XC7A35T-CPG236 chip.

Figure 16 Actual running image of Falcon digital signature device with XC7A35T-CPG236 chip.

Figure 17 Results of the performance of Falcon digital signature software when connected to the Falcon digital signature device.

Figure 18 Utilization power device.

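Table 1 Summary of design resources of random_seed and mq_add on the XC7A35T-CPG236 chip.

| Function | Resource | Utilization | Available | Utilization (%) |
|---|---|---|---|---|
| trng_8bit | LUT | 11 | 20,800 | 0.05 |
| trng_8bit | FF | 13 | 41,600 | 0.03 |
| trng_8bit | IO | 11 | 250 | 4.40 |
| mq_add | LUT | 80 | 20,800 | 0.38 |
| mq_add | FF | 32 | 41,600 | 0.08 |
| mq_add | IO | 98 | 250 | 39.20 |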

Table 2 HLS design summary of verify_raw.

| Parameter | Value |
|---|---|
| Function | Verify_Raw |
| Degree | 1024 |
| BRAM | 1 |
| DSP | 15 |
| FF | 2984 |
| LUT | 3690 |
| Clock Cycles | 70 |
| Latency (ms) | 0.0007 |
| Clock (MHz) | 100 |

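Table 3 Summary of verify_raw design resources.

| Name | BRAM_18K | DSP | FF | LUT | URAM |
|---|---|---|---|---|---|
| DSP | - | 1 | - | - | - |
| Expression | - | - | 0 | 1131 | - |
| FIFO | - | - | - | - | - |
| Instance | - | 14 | 1330 | 2274 | - |
| Memory | 1 | - | 27 | 5 | - |
| Multiplexer | - | - | - | 280 | - |
| Register | - | - | 1627 | - | - |
| Total | 1 | 15 | 2984 | 3690 | 0 |
| Available | 50 | 90 | 41,600 | 20,800 | 0 |
| Utilization (%) | 2 | 16.7 | 7.2 | 17.7 | 0 |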

Table 4 Comparison of execution time results of a digital signature device using the XC7A35T-CPG236 chip and software based on the Falcon post-quantum digital signature scheme with other publications.

| Criteria | USB Hardware Signed (FPGA + SW) | Fully Software-Based Digital Signature [20] |
|---|---|---|
| Version | 1024 | 1024 |
| Key generation time (ms) | 1516.2 | 27.45 |
| Signing time (ms) | 5.0 | 2.913 |
| Verification time (ms) | 2.2381 | 7.326 |

Table 5 Comparison of execution time results based on the Falcon post-quantum digital signature scheme of a digital signature device using the XC7A35T-CPG236 chip with other publications.

| Falcon-1024 | Key generation (ms) | Signing (ms) | Verification (ms) |
|---|---|---|---|
| [20] | 80 | 2.22 | 0.298 |
| [7] | 320.3 | 8.7 | 1.258 |
| This work | 1516.2 | 5.0 | 2.238 |

Table 6 Comparison of the design in this paper with other published works.

| Criterion | Target Context | Clock Frequency | Key Gen | Sign Time | Verify Time | LUT Utilization | DSP Utilization | Power Consumption | Main Advantage |
|---|---|---|---|---|---|---|---|---|---|
| This work (Arty-7 + SW) | Low-Cost/Resource Constrained FPGA | 100 MHz (Specify) | 151,620,000 cycles (1516.2 ms) | 500,000 cycles (5 ms) | 223,800 cycles (2.238 ms) | 17.7% | 16.7% | FPGA Power 97 mW | Low-Cost Platform Feasibility |
| Lee et al. (ASIC + SW) [8] | Low-Power ASIC Accelerator | 250 MHz (for ASIC core) | 37.82 ms (FALCON-1024) | 1.80 ms (FALCON-512) | N/A | Area: 38 k uM | Core Only | Core Only 5.972 mW | Highest Power Efficiency |
| Ye et al. (RISC-V SoC) [25] | Custom SoC Processor | Fill Freq | N/A | N/A | 179,080 cycles (FALCON-512) | N/A | Core Only | Core Only 3.05 mW | High Integration and Flexibility |
| Pendyala et al. (Zynq FPGA) [26] | High-Performance FPGA | Fill Freq | Fill Cycles | Fill Cycles | Fill Cycles | Fill Freq | Fill Freq | Fill Power | Max Throughput via Pipelining |
| SPHINCS+ [27] | Area-Efficient FPGA | 149–156 MHz | Fill Cycles | Fill Cycles | Fill Cycles | 8.3% (Low Area) | 0% | 400 mW Dynamic (XZU3EG Total) | High Security (Hash-Based) |
| Dilithium-III [28] | High-Performance SoC FPGA | 182–217 MHz | Fill Cycles | Fill Cycles | Fill Cycles | 19,614 LUTs (7.8% Z7000) | 8–10 DSPs | Core Only (Low μW Range) | Highest Throughput PQC |

References

1. Bernstein, D.J. Introduction to post-quantum cryptography. Post-Quantum Cryptography; Springer: Berlin/Heidelberg, Germany, 2009; pp. 1-14. [DOI: https://dx.doi.org/10.1007/978-3-540-88702-7_1]

2. Sarah, D.; Peter, C. On the practical cost of Grover for AES key recovery. Proceedings of the 5th NIST PQC Standardization Conference; Rockville, MD, USA, 10–12 April 2024; pp. 1-22.

3. Gidney, C.; Ekerå, M. How to factor 2048 bit RSA integers in 8 hours using 20 million noisy qubits. Quantum; 2021; 5, 433. [DOI: https://dx.doi.org/10.22331/q-2021-04-15-433]

4. Webber, M.; Elfving, V.; Weidt, S.; Hensinger, W.K. The impact of hardware specifications on reaching quantum advantage in the fault tolerant regime. AVS Quantum Sci.; 2022; 4, 13801. [DOI: https://dx.doi.org/10.1116/5.0073075]

5. Alagic, G.; Bros, M.; Ciadoux, P.; Cooper, D.; Dang, Q.; Dang, T.; Kelsey, J.; Lichtinger, J.; Liu, Y.-K.; Miller, C. et al. Status Report on the Fourth Round of the NIST Post-Quantum Cryptography Standardization Process; US Department of Commerce, National Institute of Standards and Technology: Gaithersburg, MD, USA, 2025; [DOI: https://dx.doi.org/10.6028/NIST.IR.8545]

6. Karabulut, E.; Aysu, A. A Hardware-Software Co-Design for the Discrete Gaussian Sampling of FALCON Digital Signature. Proceedings of the 2024 IEEE International Symposium on Hardware Oriented Security and Trust (HOST); Tysons Corner, VA, USA, 6–9 May 2024; pp. 90-100. [DOI: https://dx.doi.org/10.1109/HOST55342.2024.10545399]

7. Schmid, M.; Amiet, D.; Wendler, J.; Zbinden, P.; Wei, T. Falcon Takes Off-A Hardware Implementation of the Falcon Signature Scheme. Cryptol. Eprint Arch.; 2023; pp. 1-17. Available online: https://eprint.iacr.org/2023/1885 (accessed on 14 November 2025).

8. Lee, Y.; Youn, J.; Nam, K.; Jung, H.H.; Cho, M.; Na, J.; Park, J.-Y.; Jeon, S.; Kang, B.G.; Oh, H. et al. An Efficient Hardware/Software Co-Design for FALCON on Low-End Embedded Systems. IEEE Access; 2024; 12, pp. 57947-57958. [DOI: https://dx.doi.org/10.1109/ACCESS.2024.3387489]

9. Bai, S.; Jangir, H.; Lin, H.; Ngo, T.; Wen, W.; Zheng, J. Compact Encryption Based on Module-NTRU Problems. Post-Quantum Cryptography; Springer: Cham, Switzerland, 2024; Volume 14771, pp. 371-405. [DOI: https://dx.doi.org/10.1007/978-3-031-62743-9_13]

10. NTRU: A Submission to the NIST Post-Quantum Standardization Effort. 2025; Available online: https://ntru.org/ (accessed on 12 May 2025).

11. Basu, K.; Soni, D.; Nabeel, M.; Karri, R. NIST Post-Quantum Cryptography—A Hardware Evaluation Study. IACR Cryptol. ePrint Arch.; 2019; 2019, pp. 1-16.

12. Dione, D.; Seck, B.; Diop, I.; Cayrel, P.-L.; Faye, D.; Gueye, I. Hardware Security for IoT in the Quantum Era: Survey and Challenges. J. Inf. Secur.; 2023; 14, pp. 227-249. [DOI: https://dx.doi.org/10.4236/jis.2023.144014]

13. NIST.IR.8547 Transition to Post-Quantum Cryptography Standards; National Institute of Standards and Technology: Gaithersburg, MD, USA, 2024; [DOI: https://dx.doi.org/10.6028/NIST.IR.8547.ipd]

14. Luc, N.Q.; Nguyen, T.T.; Quach, D.H.; Dao, T.T.; Pham, N.T. Building Applications and Developing Digital Signature Devices based on the Falcon Post-Quantum Digital Signature Scheme. Eng. Technol. Appl. Sci. Res.; 2023; 13, pp. 10401-10406. [DOI: https://dx.doi.org/10.48084/etasr.5674]

15. Castelvero, L.; Grande, I.H.L.; Pruneri, V. High-Performance Time-to-Digital Conversion on a 16-nm Ultrascale+ FPGA. IEEE Access; 2024; 12, pp. 149569-149579. [DOI: https://dx.doi.org/10.1109/ACCESS.2024.3477295]

16. Boutin, C. NIST Announces First Four Quantum-Resistant Cryptographic Algorithms. 2025; Available online: https://www.nist.gov/news-events/news/2022/07/nist-announces-first-four-quantum-resistant-cryptographic-algorithms (accessed on 9 June 2025).

17. NIST.FIPS.203 Module-Lattice-Based Key-Encapsulation Mechanism Standard; National Institute of Standards and Technology (NIST): Gaithersburg, MD, USA, 2024; [DOI: https://dx.doi.org/10.6028/NIST.FIPS.203]

18. NIST.FIPS.204 Module-Lattice-Based Digital Signature Standard; National Institute of Standards and Technology (NIST): Gaithersburg, MD, USA, 2024; [DOI: https://dx.doi.org/10.6028/NIST.FIPS.204]

19. Shannon, C. The lattice theory of information. Trans. IRE Prof. Gr. Inf. Theory; 1953; 1, pp. 105-107. [DOI: https://dx.doi.org/10.1109/TIT.1953.1188572]

20. Fouque, P.-A.; Hoffstein, J.; Kirchner, P.; Lyubashevsky, V.; Pornin, T.; Prest, T.; Ricosset, T.; Seiler, G.; Whyte, W.; Zhang, Z. Falcon: Fast-Fourier Lattice-based Compact Signatures over NTRU Specifications v1.2. 2020; pp. 1-65. Available online: https://falcon-sign.info/ (accessed on 15 June 2025).

21. Xilinx Inc. 7 Series FPGAs Datasheet; Xilinx Technology Document: San Jose, CA, USA, 2020; Volume 180, pp. 1-19.

22. NIST.SP.800-90B Recommendation for the Entropy Sources Used for Random Bit Generation; National Institute of Standards and Technology: Gaithersburg, MD, USA, 2018; [DOI: https://dx.doi.org/10.6028/NIST.SP.800-90B]

23. Zode, P.; Zode, P.; Deshmukh, R. FPGA Based Novel True Random Number Generator using LFSR with Dynamic Seed. Proceedings of the 2019 IEEE 16th India Council International Conference (INDICON); Gujarat, India, 13–15 December 2019; pp. 1-3. [DOI: https://dx.doi.org/10.1109/INDICON47234.2019.9029049]

24. Tsoi, K.H.; Leung, K.H.; Leong, P.H.W. Compact FPGA-based true and pseudo random number generators. Proceedings of the FCCM 2003 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines; Napa, CA, USA, 9–11 April 2003; pp. 51-61. [DOI: https://dx.doi.org/10.1109/FPGA.2003.1227241]

25. Ye, Z.; Huang, J.; Huang, T.; Bai, Y.; Li, J.; Zhang, H.; Li, G.; Chen, D.; Cheung, R.C.C.; Huang, K. PQNTRU: Acceleration of NTRU-Based Schemes via Customized Post-Quantum Processor. IEEE Trans. Comput.; 2025; 74, pp. 1649-1662. [DOI: https://dx.doi.org/10.1109/TC.2025.3540647]

26. Pendyala, S.; Magesh, R.; Kavun, E.B.; Aysu, A. Outrunning the Millennium FALCON: Speed Records for FALCON on Xilinx FPGAs. Cryptol. Eprint Arch.; 2025; pp. 1-23.

27. Berthet, Q.; Upegui, A.; Gantel, L.; Duc, A.; Traverso, G. An Area-Efficient SPHINCS+ Post-Quantum Signature Coprocessor. Proceedings of the 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW); Portland, OR, USA, 17–21 June 2021; pp. 180-187. [DOI: https://dx.doi.org/10.1109/IPDPSW52791.2021.00034]

28. Wang, T.; Zhang, C.; Cao, P.; Gu, D. Efficient Implementation of Dilithium Signature Scheme on FPGA SoC Platform. IEEE Trans. Very Large Scale Integr. Syst.; 2022; 30, pp. 1158-1171. [DOI: https://dx.doi.org/10.1109/TVLSI.2022.3179459]

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.