
Abstract

In hardware/software (HW/SW) partitioning, the most commonly established objectives are execution time, power consumption, and hardware area. Surprisingly, memory usage, a critical resource in embedded systems, has received limited attention as a primary optimization objective. Moreover, the few studies that consider memory rarely provide an explicit, design-time estimation method. This work proposes a methodology for obtaining memory usage as a design metric, along with an objective function tailored to evaluating memory usage in HW/SW partitioning problems for systems-on-chip that combine a hard processor core with a Field-Programmable Gate Array. To validate the proposed methodology, HW/SW partitioning was carried out for a PD-type fuzzy control algorithm targeting a DC motor. The optimization problem was solved using the Non-dominated Sorting Genetic Algorithm II. The results demonstrate the feasibility and accuracy of the proposed approach, achieving more than 97.5% accuracy in predicting memory and hardware resource consumption. Additionally, the functional performance of the selected partition configuration was validated in real time, and the motor successfully tracked different velocity reference signals.


1. Introduction

The implementation of algorithms in digital devices, such as microcontrollers, Field-Programmable Gate Arrays (FPGAs), and Systems-on-Chip (SoCs), is fundamental to the development of control systems, signal processing solutions, and intelligent applications. These systems must simultaneously meet requirements for high performance, low energy consumption, and short development cycles, all under strict resource constraints. A widely adopted strategy to balance these demands is hardware/software (HW/SW) partitioning, which involves deciding which parts of an algorithm are implemented in reconfigurable hardware and which are executed on an embedded processor, according to specific design metrics and constraints.

However, selecting an optimal partition is challenging, as it requires balancing multiple objectives such as execution time, hardware area, and power consumption. Although most current methodologies consider these parameters as primary objectives, memory usage, a critical resource in embedded systems, is rarely integrated explicitly into the decision-making process. When it is included, its estimation is often approximate or carried out only at late stages of development, limiting its value as an early design decision parameter.

In the state of the art, the HW/SW partitioning problem has been addressed using several approaches. One line of work relies on algorithm profiling [1], where blocks with the highest computational load are migrated to hardware. Other studies employ exact methods such as Integer Linear Programming [2] and Branch and Bound [3]. More recently, the dominant trend has been the use of heuristic algorithms, particularly evolutionary approaches [4]. For instance, ref. [5] presents two hybrid algorithms: the first combines Lagrangian Relaxation (LR) with the Subgradient method, while the second integrates LR with the 0–1 Knapsack problem and a Genetic Algorithm. In [6], a game-theory-based approach combining the GO game and Minmax algorithm is introduced. Other heuristic techniques include an immune algorithm-based partitioning method [7], as well as multi-objective extensions such as the fireworks-based algorithm proposed in [8]. Although these methods have demonstrated improvements in the hardware area and execution time, they remain primarily theoretical and do not address memory as a design metric.

In recent years, several works have highlighted the increasing relevance of memory behavior and data-movement constraints in modern embedded and reconfigurable architectures. For instance, ref. [1] proposes a HW/SW partitioning strategy for real-time object detection on SoCs that explicitly models memory bandwidth limitations to improve performance. More recent studies extend this perspective: ref. [9] introduces MEDEA, a design-time multi-objective manager that incorporates "memory-aware" mechanisms, such as tiling, DVFS, and task scheduling, to optimize heterogeneous systems. Similarly, ref. [10] presents a holistic optimization framework for FPGA accelerators that jointly considers partitioning, scheduling, and data-movement costs, demonstrating that memory constraints increasingly drive architectural decisions. Other works focus on memory-specific optimizations in hardware design flows, such as the pattern-morphing-based memory partitioning technique proposed in [11] for reducing access conflicts in HLS-generated architectures. Recent co-design surveys, for example, ref. [12], emphasize that modern AI-oriented embedded systems critically depend on efficient memory utilization throughout the HW/SW co-design process.

However, none of these recent methodologies provide a detailed, module-level extraction of memory usage in C-based implementations, nor do they integrate this information into a multi-objective HW/SW partitioning flow. This gap highlights the need for methodologies that incorporate memory as a first-class design metric from the early stages of system development.

In this work, an HW/SW partitioning methodology is proposed that explicitly incorporates a memory usage metric, together with hardware area, within a multi-objective optimization framework. The novel contributions of this work are as follows:

An objective function adapted to SoCs with a hard-core processor, accounting for resources associated with HW/SW communication.

A procedure to extract memory metrics from detailed analysis of memory mapping in C-based software implementations.

An evaluation flow that considers aspects such as auxiliary data conversion functions and HW/SW synchronization, as well as correction factors to avoid overestimation due to shared libraries.

The proposed methodology is validated on a PD-type fuzzy controller for a DC motor implemented on a Xilinx Zynq® SoC (San Jose, CA, USA). This controller architecture, including its FPGA implementation, was introduced in [13]. A PD-type fuzzy controller is selected because it provides a good trade-off between robustness and implementation cost: the fuzzy rule base improves the handling of nonlinearities and uncertainties while using only the error and its derivative, as reported in recent motion-control applications such as bridge cranes, lane-keeping systems, and cable-driven robots [14,15,16]. At the same time, the controller remains simpler than more elaborate nonlinear schemes, and its limited rule base and Mamdani-type inference mechanism lead to moderate memory and hardware requirements, in line with recent sparse fuzzy PID implementations [17]. This makes the PD-type fuzzy controller a convenient benchmark to study memory- and area-aware HW/SW partitioning on SoC platforms under the resource bounds adopted in this work (see Remark 1). The optimization problem is solved using the Non-dominated Sorting Genetic Algorithm II (NSGA-II). Thus, this methodology aims to guide design decisions for digital devices with limited memory resources.

Novelty and Contributions

While the introduction presents a broad discussion of related work, this subsection summarizes the specific contributions of the proposed methodology and clarifies how it differs from representative approaches in the literature. Table 1 highlights key distinctions in partitioning strategies, memory-awareness, analysis granularity, and reported contributions.

This paper is organized as follows. Section 2 reviews the objective functions found in the literature that explicitly consider memory. Section 3 introduces the proposed methodology for extracting and integrating the memory metric into the partitioning process. Section 4 applies this methodology to a case study, performing the HW/SW partitioning of a fuzzy control algorithm with hardware resource consumption and memory usage as the main metrics. Section 5 reports the experimental results of the selected HW/SW configuration. Finally, Section 6 presents the conclusions.

2. Previous Work

The objective functions for memory usage reported in the literature are not very diverse; however, they play a crucial role in the optimization process, since the quality of the objective function has a direct impact on the quality of the obtained solutions. One representative example is the mono-objective function proposed in [18], where a single objective function combines multiple design metrics, as shown below:

$$\mathrm{O.F.} = 100 \cdot \left( \frac{T_S}{Time_{Sw}} + \frac{T_H}{Time_{Hw}} + \frac{M}{Cost_{Sw}} \right) \qquad (1)$$

where TimeSw and CostSw denote the execution time and the memory requirement, respectively, when all modules are implemented in software. TimeHw is the execution time when all modules are implemented in hardware, while TS and TH represent the execution times of the parts of the solution mapped to software and hardware, respectively. The latter two are defined as follows:

$$T_S(x) = \sum_{i=1}^{n} T_i^S \cdot (1 - x_i) \qquad (2)$$

$$T_H(x) = \sum_{i=1}^{n} T_i^H \cdot x_i \qquad (3)$$

where x = [x1, x2, …, xn] denotes the vector of binary decision variables, and n specifies its dimension, i.e., the total number of modules in the system. TiS and TiH represent the execution times of the i-th module in software and hardware, respectively. Each binary variable xi indicates whether the i-th module is mapped to software (xi = 0) or to hardware (xi = 1). Finally, M denotes the memory requirements of the components assigned to the software architecture. The total memory consumption is obtained as:

$$M(x) = \sum_{i=1}^{n} C_i^S \cdot (1 - x_i) \qquad (4)$$

where CiS represents the software cost for the i-th module.
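As a minimal sketch, the single-objective formulation of [18] can be evaluated in C, the language used for the software modules in this work. The struct, the function names, and the grouping of the factor 100 over all three normalized terms are assumptions made for illustration; this is not the implementation of [18].

```c
#include <stddef.h>

/* Hypothetical per-module data: ts = software time, th = hardware time,
   cs = software memory cost. */
typedef struct {
    double ts;
    double th;
    double cs;
} ModuleCost;

/* Equations (2)-(4): software time, hardware time and software memory
   of a binary partition x (x[i] = 1 -> hardware, x[i] = 0 -> software). */
static void partition_terms(const ModuleCost *m, const int *x, size_t n,
                            double *TS, double *TH, double *M) {
    *TS = 0.0;
    *TH = 0.0;
    *M = 0.0;
    for (size_t i = 0; i < n; ++i) {
        *TS += m[i].ts * (1 - x[i]);
        *TH += m[i].th * x[i];
        *M  += m[i].cs * (1 - x[i]);
    }
}

/* Equation (1): weighted sum normalized by the all-SW / all-HW references. */
double mono_objective(const ModuleCost *m, const int *x, size_t n) {
    double time_sw = 0.0, time_hw = 0.0, cost_sw = 0.0;
    for (size_t i = 0; i < n; ++i) {
        time_sw += m[i].ts;  /* every module in software */
        time_hw += m[i].th;  /* every module in hardware */
        cost_sw += m[i].cs;  /* memory with every module in software */
    }
    double TS, TH, M;
    partition_terms(m, x, n, &TS, &TH, &M);
    return 100.0 * (TS / time_sw + TH / time_hw + M / cost_sw);
}
```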

On the other hand, the work [19] presents a multi-objective approach, where each design metric is modeled by an independent objective function. In addition, parallelism is considered through the introduction of a binary variable yi, which indicates whether a module can be executed in parallel (i.e., it has no dependency on other modules). The objective functions to be minimized are:

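$$O_1(x) = \sum_{i=1}^{n} x_i \cdot A_i + \left(1 - \prod_{i=1}^{n} x_i\right) \cdot A_{\mu p} \qquad (5)$$

$$O_2(x) = \sum_{i=1}^{n} x_i \cdot HM_i + \left(1 - \prod_{i=1}^{n} x_i\right) \cdot HM_{\mu p} \qquad (6)$$

$$O_3(x) = \sum_{i=1}^{n} (1 - x_i) \cdot H_i \qquad (7)$$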

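$$O_4(x) = \sum_{i=1}^{n-1} \Big[ y_i \cdot x_i \cdot x_{i+1} \cdot \max(th_i, th_{i+1}) + y_i \cdot x_i \cdot (1 - x_{i+1}) \cdot \max(ts_i, ts_{i+1}) + y_i \cdot (1 - x_i) \cdot x_{i+1} \cdot \max(ts_i, ts_{i+1}) + y_i \cdot \overline{y_{i-1}} \cdot (1 - x_i) \cdot (1 - x_{i+1}) \cdot \max(ts_i, ts_{i+1}) + \overline{y_i} \cdot \overline{y_{i-1}} \cdot x_i \cdot th_i + y_i \cdot \overline{y_{i-1}} \cdot (1 - x_i) \cdot ts_i \Big] + \overline{y_{n-1}} \cdot \left[ x_n \cdot th_n + (1 - x_n) \cdot ts_n \right] \qquad (8)$$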

where O1 represents the hardware area, O2 the hardware multipliers, O3 the memory blocks, and O4 the execution time. The vector x again denotes the decision variables, and yi specifies whether module i can be executed in parallel (i.e., has no dependencies). Ai denotes the hardware resources consumed by module i (LUTs or FFs), Hi represents the memory blocks used by module i when implemented in software, thi and tsi are the execution times of module i in hardware and software, respectively, and HMi corresponds to the number of hardware multipliers (DSP units) used by module i.

The above objective functions are constrained by the following expressions:

$$O_1(x) < S, \qquad (9)$$

$$O_2(x) < HM, \qquad (10)$$

$$O_3(x) < H, \qquad (11)$$

$$O_4(x) < T_{alg}, \qquad (12)$$

where S, HM, and H are, respectively, the area, number of hardware multipliers, and memory size available for the design. Talg corresponds to the maximum allowable execution time.
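The sketch below evaluates Equations (5) and (7) and checks constraints (9) and (11) for a binary partition. It assumes that the soft-core term of Equation (5) is weighted by (1 − ∏ xi), i.e., the processor area is counted whenever at least one module remains in software; all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Product of all x_i: 1 only if every module is mapped to hardware. */
static int all_hardware(const int *x, size_t n) {
    for (size_t i = 0; i < n; ++i)
        if (!x[i])
            return 0;
    return 1;
}

/* Equation (5): area of hardware modules plus soft-core area A_up,
   counted while at least one module stays in software. */
double obj_area(const double *A, const int *x, size_t n, double A_up) {
    double area = 0.0;
    for (size_t i = 0; i < n; ++i)
        area += x[i] * A[i];
    return area + (1 - all_hardware(x, n)) * A_up;
}

/* Equation (7): memory blocks of the modules kept in software. */
double obj_memory_blocks(const double *H, const int *x, size_t n) {
    double mem = 0.0;
    for (size_t i = 0; i < n; ++i)
        mem += (1 - x[i]) * H[i];
    return mem;
}

/* Constraints (9) and (11): strict upper bounds on area and memory. */
bool is_feasible(double o1, double o3, double S, double H_max) {
    return o1 < S && o3 < H_max;
}
```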

It is worth noting that the work in [19] considers a soft-core processor. Therefore, the variable Aμp represents the resources consumed by the processor, the bus, and its peripherals, while HMμp accounts for the DSP units used to implement the processor on the FPGA. Considering all these aspects, this multi-objective formulation is the most suitable approach when the goal is to perform a Pareto-based optimization. Thus, the trade-offs among metrics can be analyzed more effectively, and the constraints can be applied directly to the Pareto front, facilitating the identification of feasible configurations that satisfy the system requirements.

3. Proposed Methodology for Objective Functions of Hardware Area and Memory

The objective functions commonly reported in the literature are simplified and do not fully reflect real implementation behavior. To address this limitation, a methodology is proposed to construct practical objective functions for hardware area and memory, based on extracting accurate resource usage from system modules and incorporating correction and synchronization factors. These functions can then be directly integrated into HW/SW partitioning optimization processes.

3.1. Memory Usage

Memory usage in software implementations consists of two components: the intrinsic memory required by each functional module and the fixed memory inherent to the processor architecture. The proposed methodology provides a systematic procedure to estimate the intrinsic memory consumption of each module and include it explicitly as a metric in the HW/SW partitioning process.

3.1.1. Generalization of Memory Consumption Extraction

To estimate the memory consumption of each module, the following steps are performed:

Determine the minimum system memory

A minimal C project containing only the processor initialization logic is built to determine the lower bound of required system memory, referred to as the minimum memory.

Implement each module individually

Each module is implemented and built independently within the chosen development platform (e.g., Vitis™). This provides the memory usage report and the corresponding memory mapping information.

Compute intrinsic memory consumption

The intrinsic memory of each module is obtained by subtracting the minimum memory from the memory reported for the module, isolating the memory attributable to its functionality.
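The three steps above reduce to one subtraction per module. A minimal sketch, assuming the reported sizes (in bytes) have already been read from the toolchain reports into a hypothetical ModuleReport array:

```c
#include <stddef.h>

/* Hypothetical build report of one module. */
typedef struct {
    const char *name;
    long total_bytes;  /* memory reported by the toolchain for the module build */
} ModuleReport;

/* Step 3: intrinsic memory = module build - minimal build. */
void intrinsic_memory(const ModuleReport *reports, size_t n,
                      long minimum_bytes, long *out) {
    for (size_t i = 0; i < n; ++i)
        out[i] = reports[i].total_bytes - minimum_bytes;
}
```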

3.1.2. General Objective Function of Memory Usage

Using the extracted data, the proposed general memory objective function is defined as:

$$O_{memory} = O_3 + \left(1 - \prod_{i=1}^{n} x_i\right) \cdot H_{min} + \Delta \cdot H_{sync} - H_{correction} \qquad (13)$$

where O3 represents the sum of the intrinsic memory of all modules implemented in software, and n is the total number of modules. The term Hmin is the minimum system memory, and Hsync represents the additional memory required by synchronization libraries when hardware and software coexist.

The correction factor Hcorrection subtracts the memory consumption associated with libraries that are shared across different modules in order to avoid counting them multiple times. Since the functions were implemented separately to obtain their individual metrics, directly summing the reported memory usage would lead to an overestimation whenever common libraries are included in more than one module. In practice, however, each shared library needs to be loaded only once, regardless of how many functions use it. This factor can be estimated as follows:

Analyze the source code files (.c) of each module to identify libraries that are repeatedly included.

Locate these libraries in the memory mapping files (.map or equivalent) to determine their memory usage.

Subtract the duplicated consumption from the total estimation, thereby avoiding an overestimation of the actual memory usage in the final configuration.
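A sketch of this correction, assuming the shared-library analysis has been summarized as a table of library sizes and of the modules that link each library (SharedLib and correction_factor are hypothetical names). Only modules mapped to software contribute, and each library is kept exactly once:

```c
#include <stddef.h>

#define MAX_MODULES 16

/* Hypothetical summary of one library found in several module map files. */
typedef struct {
    long size_bytes;           /* size read from the memory-mapping file */
    int used_by[MAX_MODULES];  /* used_by[i] = 1 if module i links it */
} SharedLib;

/* H_correction: each library is loaded once, so every additional
   software module that links it was counted one time too many. */
long correction_factor(const SharedLib *libs, size_t n_libs,
                       const int *x, size_t n_modules) {
    long correction = 0;
    for (size_t l = 0; l < n_libs; ++l) {
        int sw_users = 0;
        for (size_t i = 0; i < n_modules; ++i)
            if (libs[l].used_by[i] && x[i] == 0)  /* software modules only */
                ++sw_users;
        if (sw_users > 1)
            correction += (long)(sw_users - 1) * libs[l].size_bytes;
    }
    return correction;
}
```

If a library is also part of the minimum build, the minimum build counts as one more user; one way to model this is an extra pseudo-module that is always in software.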

The proposed operator Δ activates the synchronization term only when both hardware and software modules are present:

$$\Delta = \left\lceil \frac{\sum_{i=1}^{n} x_i}{n} \right\rceil - \prod_{i=1}^{n} x_i \qquad (14)$$
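For binary x, the ceiling term of Equation (14) equals 1 as soon as one module is in hardware, while the product equals 1 only when all of them are. The sketch below assumes that reading of Equation (14) and evaluates Equation (13), with Hsync and Hcorrection passed in as precomputed values; the function names are hypothetical and n ≥ 1 is assumed.

```c
#include <stddef.h>

static size_t count_hw(const int *x, size_t n) {
    size_t hw = 0;
    for (size_t i = 0; i < n; ++i)
        hw += (size_t)x[i];
    return hw;
}

/* Equation (14): ceil(sum(x) / n) - prod(x); 1 only for mixed partitions. */
int delta(const int *x, size_t n) {
    size_t hw = count_hw(x, n);
    int ceil_ratio = hw > 0;  /* ceil(hw / n) for binary x */
    int product = hw == n;    /* product of all x_i */
    return ceil_ratio - product;
}

/* Equation (13): memory of a partition. */
double memory_objective(const double *H, const int *x, size_t n,
                        double H_min, double H_sync, double H_correction) {
    double o3 = 0.0;
    for (size_t i = 0; i < n; ++i)
        o3 += (1 - x[i]) * H[i];                  /* software modules only */
    int processor_used = count_hw(x, n) < n;      /* 1 - prod(x) */
    return o3 + processor_used * H_min + delta(x, n) * H_sync - H_correction;
}
```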

3.2. Hardware Area Usage

The hardware area determines the feasibility of an implementation on a given device. In FPGAs, this cost is expressed in logic resources such as LUTs, FFs, and hardware multipliers (DSP blocks). The proposed objective functions extend baseline formulations by adding communication overhead and auxiliary hardware when HW and SW coexist.

For LUTs and FFs, the proposed generalized objective function is:

$$O_A(x) = O_1 + \Delta \cdot A_{com} + A_{extra} \qquad (15)$$

where O1 is the baseline expression, Acom accounts for communication-related hardware, and Aextra represents auxiliary hardware resources.

For hardware multipliers, the proposed formulation is:

$$O_{HM}(x) = O_2 + \Delta \cdot HM_{com} + HM_{extra} \qquad (16)$$

where O2 is the baseline usage, HMcom captures communication overhead, and HMextra includes multipliers required outside the main modules.
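Equations (15) and (16) share the same structure, so one function can serve LUTs, FFs, or DSPs by passing the corresponding per-module vector. This sketch assumes a hard-core SoC, so the baseline contains no soft-core term, and treats the extra resources as a precomputed value; the names are illustrative.

```c
#include <stddef.h>

/* Delta of Equation (14): 1 only when HW and SW modules coexist. */
static int delta(const int *x, size_t n) {
    size_t hw = 0;
    for (size_t i = 0; i < n; ++i)
        hw += (size_t)x[i];
    return hw > 0 && hw < n;
}

/* Equations (15)/(16): baseline HW usage + Delta * communication + extra. */
double resource_objective(const double *R, const int *x, size_t n,
                          double R_com, double R_extra) {
    double base = 0.0;
    for (size_t i = 0; i < n; ++i)
        base += x[i] * R[i];  /* O1 or O2 without a soft-core term */
    return base + delta(x, n) * R_com + R_extra;
}
```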

3.3. Constraints

Each objective function is subject to the following resource constraints:

$$O_A(x) < S, \qquad (17)$$

$$O_{HM}(x) < HM, \qquad (18)$$

$$O_{memory}(x) < H, \qquad (19)$$

where S, HM, and H denote the available LUT/FF area, hardware multipliers, and memory capacity, respectively.

3.4. Limitations of the Methodology

The memory-oriented part of the proposed methodology is more complex than its hardware-usage counterpart and therefore presents additional limitations.

First, based on the proposed Equation (13), the main limitation arises from the term Hcorrection, since it requires identifying and analyzing, on a case-by-case basis, all modules that share common memory libraries. Consequently, when the number of modules is large, this process may become time-consuming during the design stage. A possible solution is the development of an automated script capable of parsing and analyzing memory-mapping files to detect shared libraries, which we consider as future work. In the present manuscript, the focus was placed on exploring and validating the methodology rather than fully automating this analysis.

Second, concerning the applicability of the methodology to other SoC vendors or device families, the main requirement is the availability of a memory-mapping file. Even if the file format differs across toolchains (e.g., Xilinx versus Intel/Altera), the methodology remains valid as long as the necessary address and memory-allocation information can be extracted.

Third, different hardware memory configurations may alter the interpretation of the metric. For example, FPGAs may use distributed RAM, multiple independent BRAM/URAM banks, or local scratchpad memories. Systems with caches, DMA buffers, or FIFO-based communication introduce further variability, since the effective number of memory accesses may differ from the nominal access count. While the methodology can be extended to these cases, such extensions were beyond the scope of this manuscript and represent an opportunity for future extensions.

Finally, the limitations of the hardware objective functions, Equations (15) and (16), are less restrictive. In general, FPGA toolchains from major vendors (such as Xilinx or Intel/Altera) provide detailed reports on resource utilization after synthesis and implementation. These reports can be used directly within the proposed HW/SW partitioning framework without requiring additional processing.

4. Case Study: PD Fuzzy Controller for DC Motor

The structure of the fuzzy PD control system is shown in Figure 1. Considering the granularity classification proposed in [19], the present work includes modules with level-1 granularity (arithmetic/logical operators) and level-3 granularity (functional modules). This modular approach preserves the physical significance of each partition, enabling a more intuitive design process that is easier to debug and analyze in case of failures. The design and hardware implementation of the fuzzy PD control system, including the controller architecture, have been detailed in [13]. Therefore, only a brief description of the modules that comprise the control system is provided below:

M1 (level-1 granularity) consists of a subtractor, the generation of a reference signal, and the computation of the tracking error.

M2 (level-3 granularity) includes a robust sliding mode differentiator and a pair of multipliers to apply the proportional (Gp) and derivative (Gd) gains.

M3 (level-3 granularity) contains a Mamdani-type fuzzy PD controller and a multiplier to apply the output gain (Gs).

M4 (level-3 granularity) implements a decoder for signals from a quadrature encoder.

M5 (level-3 granularity) is responsible for normalizing the speed signal; in the hardware implementation, it also performs word-length reduction from 32 bits to 16 bits.

M6 (level-3 granularity) contains a pulse-width modulation (PWM) generator.

M7 (level-3 granularity) consists of a digital low-pass filter.

Figure 1. System with the proposed granularity.

[Figure omitted. See PDF]

4.1. Initial Considerations for Adaptation of Particular Objective Functions

The proposed HW/SW partitioning approach considers two primary design objectives: hardware resource utilization and memory usage. Each objective is formulated to enable evaluation within a multi-objective optimization framework. The number of LUTs is selected as the main hardware metric, while the number of memory blocks is used as the software metric. Additionally, FFs and DSPs are monitored to ensure that they remain within acceptable bounds. The objective functions are based on Equations (13), (15) and (16), particularized for this case study by considering the following:

A hard core is used in the system.

Communications between the processor and the FPGA are considered.

The frequency dividers used for the operation of the hardware modules are also considered.

Remark 1. 

No previous reports were found regarding the resource consumption of a fuzzy PD controller implemented on an FPGA or processor. Therefore, the hardware and software constraints were defined for academic purposes, taking as reference the resources available in a Xilinx Zynq SoC. In particular, the limits were set as follows: memory < 95 kB and area < 2000 LUTs. These constraints are not intended to represent any specific commercial implementation but rather to provide a realistic reference scenario that allows evaluating the behavior of the proposed hardware/software partitioning method.

4.2. Memory Metric and Objective Function

This subsection applies the methodology presented in Section 3 to the case study, starting with the determination of the minimum memory, given that the software implementation is written in the C language.

4.2.1. Memory Consumption Extraction

To analyze the memory sections of a software implementation, the memory map file (with the .map extension) generated after building the project is used. This file lists each memory segment with its address and length, together with a summary of its contents and sub-segments.
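As an illustrative sketch, not a full map-file parser, the size of an output section can be read from a GNU ld-style map file. It assumes that each output section appears on a single line starting in column 0 as name, address, and size in hexadecimal; long section names that wrap onto the next line are not handled, and the function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Size in bytes of output section `section`, or -1 if not found. */
long section_size(const char *map_text, const char *section) {
    const char *line = map_text;
    while (line && *line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        char buf[256];
        if (len >= sizeof buf)
            len = sizeof buf - 1;  /* truncate very long lines */
        memcpy(buf, line, len);
        buf[len] = '\0';

        char name[64];
        unsigned long addr, size;
        /* Output sections start in column 0; input sections are indented. */
        if (buf[0] == '.' &&
            sscanf(buf, "%63s %lx %lx", name, &addr, &size) == 3 &&
            strcmp(name, section) == 0)
            return (long)size;

        line = end ? end + 1 : NULL;
    }
    return -1;
}
```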

The first step is to obtain the minimum memory required for the processor to operate properly. To this end, a C project free of variables and logic operations was built using the minimal program shown in Listing 1.

Listing 1. Minimal C project

int main(void) {
    return 0;
}

After reviewing the memory mapping generated during the project build, the segments were classified into two groups. The first group, shown in Table 2, contains the constant segments: the Heap and the Stack (segments with user-defined lengths) and reserved sections that are typically not modified, since they are related to the internal operation of the processor [20].

The second group, shown in Table 3, contains the segments whose size varies with the implemented algorithm; these segments store the variables and machine code used by the processor. Since the program has the most basic structure possible, the values shown for these segments are minimal, which implies that any increase in them can be attributed to the implemented algorithm. Adding the memory needed for the constant-length segments, a software implementation requires at least 42.504 kB unless the predefined system configurations are modified. Note also that the variable segments include rodata, which contains read-only data; it is not one of the segments defined in the general theory but is specific to this architecture.

Once the minimum system memory has been determined, each of the modules designed in the Vitis™ 2020.2 software platform is individually built to obtain its corresponding memory usage report. Based on this information, the actual memory consumption of each module is calculated, and the results are presented in Table 4.

4.2.2. Objective Function Construction

The memory objective function derived from Equation (13) takes the following form for the control algorithm:

$$Memory = \sum_{i=1}^{7} (1 - x_i) H_i + \left(1 - \prod_{i=1}^{7} x_i\right) H_{min} + \Delta H_{sync} - H_{correction} \qquad (20)$$

where Hmin represents the minimum memory requirements necessary for system operation, as shown in Table 2 and Table 3. It also includes the memory associated with the printf function, which is always incorporated when the processor is used to display the measured speed values, and the custom function ElapsedTime, since one of the key requirements of the test case is to maintain a consistent time constant across all implementations (fully software or HW/SW). This function ensures that the required time constant is preserved in every configuration. The constant Hsync accounts for the memory required to support type conversion functions, since hardware modules operate with fixed-point representation while the software uses floating-point. Proper conversion is therefore essential. In addition, Hsync includes the C usleep function, which allows the execution to pause briefly, ensuring that communication control flags remain active for the necessary duration and enabling proper HW/SW synchronization. These functions are used exclusively in HW/SW configurations and are not required in fully hardware or fully software implementations. The correction factor proposed for this case is defined by:

$$H_{correction} = \overline{x_4}\,\overline{x_6}\, H_{xgpiops} + \left(\overline{x_4} + \overline{x_6}\right)\left(H_{xil\_printf} + H_{udivsi3} + H_{xuartps\_hw}\right) \qquad (21)$$

In Equation (21), the constants Hxgpiops, Hxil_printf, Hudivsi3, and Hxuartps_hw were obtained from the memory consumption of their homonymous libraries, which were identified as common—and therefore repeated—across the implemented functions. Below, these constants are listed together with a brief description of their corresponding libraries:

Hxgpiops: contains functions for managing the processor’s input/output ports as well as its interrupts.

Hxil_printf: provides a lightweight implementation of the printf function. Although it lacks floating-point support, it is suitable for printing integers or characters.

Hudivsi3: provides 32-bit unsigned integer division through the GCC runtime routine __udivsi3 ('u' stands for unsigned, 'si' for GCC's SImode, a 32-bit single integer mode, and '3' for the number of operands: two inputs and one result).

Hxuartps_hw: This library provides functions for initializing the UART, sending/receiving data, checking status, and handling interrupts.

Finally, Table 5 presents the values of the constants related to Equations (20) and (21).
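A sketch of Equations (20) and (21) for the seven modules, with M4 and M6 at zero-based indices 3 and 5. The constants are placeholders to be filled with the values of Table 5, which are not reproduced here; the type and function names are hypothetical.

```c
#define N_MODULES 7

/* Constants of Equations (20)-(21); fill with the values of Table 5. */
typedef struct {
    double H[N_MODULES];  /* intrinsic memory of M1..M7 in software */
    double H_min, H_sync;
    double H_xgpiops, H_xil_printf, H_udivsi3, H_xuartps_hw;
} MemoryConstants;

/* Equation (21); x[3] is M4 and x[5] is M6. */
double h_correction(const MemoryConstants *c, const int *x) {
    int sw4 = 1 - x[3], sw6 = 1 - x[5];  /* complemented variables */
    return (sw4 * sw6) * c->H_xgpiops
         + (sw4 + sw6) * (c->H_xil_printf + c->H_udivsi3 + c->H_xuartps_hw);
}

/* Equation (20). */
double memory_case_study(const MemoryConstants *c, const int *x) {
    double sw = 0.0;
    int hw = 0;
    for (int i = 0; i < N_MODULES; ++i) {
        sw += (1 - x[i]) * c->H[i];
        hw += x[i];
    }
    int processor_used = hw < N_MODULES;   /* 1 - prod(x) */
    int delta = hw > 0 && hw < N_MODULES;  /* Equation (14) */
    return sw + processor_used * c->H_min + delta * c->H_sync
         - h_correction(c, x);
}
```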

4.3. Area Objective Function

Based on Equation (15), and considering the characteristics of the control algorithm together with the use of a hard-core processor in this project, the constant Aμp can be excluded. This term reflects FPGA resources associated with soft-core implementations, which are not relevant here. The resulting expression is:

$$Area = \sum_{i=1}^{7} x_i \cdot A_i + \Delta \cdot A_{com} + A_{extra} \qquad (22)$$

Here, Acom represents the PL resources required to implement the Xilinx® (San Jose, CA, USA) Intellectual Property (IP) core AXI4-Lite Interface Wrapper, which enables PL–PS communication through the AXI protocol. The term Aextra represents the additional hardware required for generating the system clock signals and is proposed as follows:

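$$A_{extra} = \left\lceil \frac{x_4 + x_6}{2} \right\rceil \cdot A_{fd62.5M} + \left\lceil \frac{x_1 + x_2 + x_3 + x_5}{4} \right\rceil \cdot A_{fd54k} \qquad (23)$$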

where Afd62.5M and Afd54k denote the resources required by the frequency dividers that generate the 62.5 MHz and 54 kHz signals, respectively. Note that Equation (22) applies to both LUTs and FFs. After implementing each module individually, the hardware metrics summarized in Table 6 were obtained.
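For binary variables, the integer expression (a + b − 1) / b computes the ceiling of a / b, so each frequency divider is counted once as soon as any module that needs it is in hardware, which is how the fractions in Equation (23) are read here. A sketch of Equations (22) and (23), assuming zero-based indices (x[0] is M1) and hypothetical names:

```c
#define N_MODULES 7

/* Equation (23): one 62.5 MHz divider if M4 or M6 is in hardware,
   one 54 kHz divider if any of M1, M2, M3, M5 is. */
double a_extra(const int *x, double A_fd62_5M, double A_fd54k) {
    int need_62_5M = (x[3] + x[5] + 1) / 2;              /* ceil((x4 + x6) / 2) */
    int need_54k = (x[0] + x[1] + x[2] + x[4] + 3) / 4;  /* ceil((x1+x2+x3+x5) / 4) */
    return need_62_5M * A_fd62_5M + need_54k * A_fd54k;
}

/* Equation (22): LUTs (or FFs) of hardware modules + AXI wrapper + dividers. */
double area_case_study(const double *A, const int *x, double A_com,
                       double A_fd62_5M, double A_fd54k) {
    double area = 0.0;
    int hw = 0;
    for (int i = 0; i < N_MODULES; ++i) {
        area += x[i] * A[i];
        hw += x[i];
    }
    int delta = hw > 0 && hw < N_MODULES;  /* Equation (14) */
    return area + delta * A_com + a_extra(x, A_fd62_5M, A_fd54k);
}
```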

Regarding hardware multipliers (HM), the formulation follows the same structure as (22). However, since neither the frequency dividers (Afd54k, Afd62.5M) nor Acom consume DSP resources, as shown in Table 6, the proposed expression simplifies to:

$$HM_{blocks} = \sum_{i=1}^{7} x_i \cdot HM_i \qquad (24)$$

4.4. Performance Estimation of Modified Objective Functions

To validate the proposed objective functions, area and memory usage are estimated for fully hardware and fully software implementations. These estimations are then compared with the actual results obtained after implementation in Vivado® 2020.2 and building in Vitis™, respectively. Table 7 presents the results related to hardware area consumption. Using these measured values as the reference, the relative estimation error was calculated, yielding 2.11% for LUTs, 1.26% for FFs, and 0% for DSPs. These results indicate good accuracy; in particular, the 2.11% LUT estimation error compares favorably with the 2.24% LUT area estimation error reported in [19].
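The relative errors above follow the usual definition, with the post-implementation value as the reference. A minimal sketch (the function name is hypothetical):

```c
/* Relative estimation error in percent, with the measured value as reference. */
double relative_error_pct(double estimated, double measured) {
    double diff = estimated - measured;
    if (diff < 0.0)
        diff = -diff;  /* absolute difference */
    return diff / measured * 100.0;
}
```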

Regarding memory usage, Table 8 shows the estimation results. A relative error of 1.16% was obtained, which is considered satisfactory given the complexity involved in estimating memory consumption. Moreover, to the best of the authors’ knowledge, there is a lack of prior work providing comparative data for this metric. While memory usage is briefly addressed in a few works such as [19,21], the handling of this metric is generally not detailed. These works typically present only the objective function (e.g., Equation (7)) without discussing the accuracy or performance of the corresponding estimations.

After obtaining the hardware and software module metrics and validating the performance of the proposed objective functions, the next step is the solution search phase, which is carried out using the NSGA-II multi-objective optimization algorithm.

4.5. Obtaining the Pareto Front Using NSGA-II

To search for solutions that satisfy the imposed constraints, the Pareto front obtained through the NSGA-II algorithm is used, which has shown excellent results in previous works such as [19,22] for addressing HW/SW partitioning problems. Additionally, since NSGA-II is a genetic algorithm, it benefits from a chromosome-based representation, which enables simpler encoding and clearer visualization of solutions.

In the context of genetic algorithms, chromosomes, or individuals within a generation, are composed of n basic units called genes. For our case, n is set to 7, corresponding to the number of system modules, as illustrated in Figure 2. Each gene represents a module to be implemented and is encoded using a binary variable xi. This variable determines the implementation type for the corresponding module: when xi=1, the module Mi is implemented in hardware; when xi=0, the implementation is in software. Therefore, each chromosome represents a specific partitioning solution.
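A minimal sketch of this encoding, assuming the seven genes are packed into the low bits of an unsigned integer (bit i holds x_{i+1}); the type and function names are hypothetical. With n = 7 the search space contains 2^7 = 128 chromosomes.

```c
#include <stdio.h>

#define N_GENES 7

/* Hypothetical packed chromosome: bit i holds x_{i+1} (1 = HW, 0 = SW). */
typedef unsigned Chromosome;

int gene(Chromosome c, int i) {
    return (int)((c >> i) & 1u);
}

/* Expand a chromosome into the decision vector x of the objective functions. */
void decode(Chromosome c, int x[N_GENES]) {
    for (int i = 0; i < N_GENES; ++i)
        x[i] = gene(c, i);
}

/* Print the HW/SW assignment of M1..M7. */
void print_partition(Chromosome c) {
    for (int i = 0; i < N_GENES; ++i)
        printf("M%d:%s ", i + 1, gene(c, i) ? "HW" : "SW");
    printf("\n");
}
```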

Now, regarding the implementation of NSGA-II, Algorithm 1 shows a compact version of the standard NSGA-II workflow (adapted from [23]). The full algorithm was implemented following the canonical steps, but with two key modifications required for binary chromosome representations. First, the original real-coded chromosome initialization was replaced with a binary initialization procedure (Algorithm 2). Second, the genetic operators were adapted to binary encoding instead of the real-coded SBX operator used in the classical NSGA-II. The modified crossover and mutation operators are described in Algorithm 3.

Algorithm 1 NSGA-II Main Loop (Adapted from [23])
Require: Population size N, number of generations G
Ensure: Final non-dominated set
1: P_0 ← InitializePopulation(N)    ▹ Uses Algorithm 2
2: EvaluateObjectives(P_0)
3: Q_0 ← GeneticOperators(P_0)    ▹ Uses Algorithm 3
4: EvaluateObjectives(Q_0)
5: for t = 0 to G − 1 do
6:     R_t ← P_t ∪ Q_t
7:     F ← FastNonDominatedSort(R_t)    ▹ F = (F_1, F_2, …)
8:     P_{t+1} ← ∅
9:     i ← 1
10:    while |P_{t+1}| + |F_i| ≤ N do
11:        ComputeCrowdingDistance(F_i)
12:        P_{t+1} ← P_{t+1} ∪ F_i
13:        i ← i + 1
14:    end while
15:    ComputeCrowdingDistance(F_i)
16:    P_{t+1} ← P_{t+1} ∪ SelectBest(F_i, N − |P_{t+1}|)    ▹ Largest crowding distance first
17:    Q_{t+1} ← GeneticOperators(P_{t+1})    ▹ Uses Algorithm 3
18:    EvaluateObjectives(Q_{t+1})
19: end for
20: return non-dominated solutions in P_G
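The survivor-selection part of Algorithm 1 (fast non-dominated sorting, crowding distance, and filling the next population front by front) can be sketched in Python as follows, assuming that all objectives are minimized. This is a minimal re-implementation of the procedure in [24] for illustration only. It is not the Matlab® implementation from [23] used in this work, and all function names are hypothetical.

```python
import math

Objectives = tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(objs: list[Objectives]) -> list[list[int]]:
    """Split indices of objs into fronts; fronts[0] holds the non-dominated set."""
    dominated_set: list[list[int]] = [[] for _ in objs]  # S_p: solutions p dominates
    dominated_count = [0] * len(objs)                    # n_p: how many dominate p
    fronts: list[list[int]] = [[]]
    for p, a in enumerate(objs):
        for q, b in enumerate(objs):
            if dominates(a, b):
                dominated_set[p].append(q)
            elif dominates(b, a):
                dominated_count[p] += 1
        if dominated_count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated_set[p]:
                dominated_count[q] -= 1
                if dominated_count[q] == 0:
                    next_front.append(q)
        fronts.append(next_front)
        i += 1
    return fronts[:-1]  # the last front built by the loop is always empty


def crowding_distance(objs: list[Objectives], front: list[int]) -> dict[int, float]:
    """Crowding distance of every index in a non-empty front."""
    dist = {p: 0.0 for p in front}
    for m in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda p: objs[p][m])
        span = objs[ordered[-1]][m] - objs[ordered[0]][m]
        dist[ordered[0]] = dist[ordered[-1]] = math.inf  # always keep the extremes
        if span == 0:
            continue
        for k in range(1, len(ordered) - 1):
            gap = objs[ordered[k + 1]][m] - objs[ordered[k - 1]][m]
            dist[ordered[k]] += gap / span
    return dist


def select_survivors(objs: list[Objectives], n: int) -> list[int]:
    """Add whole fronts while they fit, then the least crowded members of the next."""
    survivors: list[int] = []
    for front in fast_non_dominated_sort(objs):
        if len(survivors) + len(front) <= n:
            survivors.extend(front)
        else:
            dist = crowding_distance(objs, front)
            best_first = sorted(front, key=lambda p: dist[p], reverse=True)
            survivors.extend(best_first[: n - len(survivors)])
            break
    return survivors
```

With the (LUT, memory) pairs of S1 to S4 from the optimization results, all four points fall in the first front, because none of them dominates another. A hypothetical fifth point that is worse in both objectives lands in the second front.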

Algorithm 2 Binary Initialization of Chromosomes (Modified)
Require: Population size N, chromosome length L
Ensure: Population P
1: P ← ∅
2: for i = 1 to N do
3:     Create chromosome C_i
4:     for j = 1 to L do
5:         C_ij ← RandomBit()    ▹ Uniform over {0, 1}
6:     end for
7:     P ← P ∪ {C_i}
8: end for
9: return P
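A Python equivalent of Algorithm 2 is only a few lines long. The sketch below assumes Python's `random` module as the source of uniform bits and uses N = 30 and L = 7 from this work. The seeded generator is an illustrative choice that makes runs reproducible, and the function name is hypothetical.

```python
import random


def initialize_population(n: int, length: int, rng: random.Random) -> list[list[int]]:
    """Algorithm 2: n chromosomes of `length` genes, each gene uniform in {0, 1}."""
    return [[rng.randint(0, 1) for _ in range(length)] for _ in range(n)]


rng = random.Random(42)                          # fixed seed for reproducibility only
population = initialize_population(30, 7, rng)   # N = 30, L = 7 as in this work
```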

Algorithms 2 and 3 correspond to the components modified in this work to support binary encoding. All other steps (ranking, crowding distance, fast non-dominated sorting, and selection) follow the original NSGA-II procedure described in [24]. Finally, the NSGA-II algorithm was implemented in Matlab® with the following configuration parameters: 50 generations, a population size of 30, a crossover rate of 0.9, and a mutation rate of 0.1. The resulting Pareto front obtained after execution is depicted in Figure 3, where the design constraints introduced at the beginning of this section are also illustrated.

Each blue dot in the figure represents an individual. The region below the black dashed line (memory constraint) and to the left of the gray dashed line (area constraint) contains the individuals that satisfy both constraints; these individuals are identified as solutions. The next step is to select the most suitable solution for implementation by analyzing the characteristics of each candidate, as shown in Table 9.
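The constraint check that separates solutions from the remaining individuals in Figure 3 can be written as a simple filter. This sketch assumes strict inequalities, as in the figure caption, and reads "95 KB" as 95,000 bytes; with 95 × 1024 = 97,280 bytes, the verdicts for the examples below would not change. The constant and function names are hypothetical.

```python
AREA_LIMIT_LUTS = 2000         # Area < 2,000 LUTs (Figure 3)
MEMORY_LIMIT_BYTES = 95_000    # assumption: "95 KB" read as 95,000 bytes


def is_feasible(luts: int, memory_bytes: int) -> bool:
    """True if an individual lies left of the area line and below the memory line."""
    return luts < AREA_LIMIT_LUTS and memory_bytes < MEMORY_LIMIT_BYTES


candidates = {
    "S1": (1641, 92_768),
    "S2": (1660, 92_652),
    "S3": (1816, 92_624),
    "S4": (1835, 92_508),
}
feasible = [name for name, (luts, mem) in candidates.items() if is_feasible(luts, mem)]
print(feasible)  # ['S1', 'S2', 'S3', 'S4']
```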

Algorithm 3 Binary Genetic Operators (Modified)
Require: Parent population P, chromosome length L, crossover rate p_c, mutation rate p_m
Ensure: Offspring population Q
1: Q ← ∅
2: while |Q| < |P| do
3:     Select parents C_1, C_2 via binary tournament
4:     if Random() < p_c then
5:         k ← RandomInteger(1, L − 1)    ▹ Single-point crossover cut
6:         O_1 ← C_1[1:k] ∥ C_2[k+1:L]    ▹ ∥ denotes concatenation
7:         O_2 ← C_2[1:k] ∥ C_1[k+1:L]
8:     else
9:         O_1 ← C_1
10:        O_2 ← C_2
11:    end if
12:    for i = 1 to L do
13:        if Random() < p_m then
14:            O_1[i] ← 1 − O_1[i]    ▹ Bit-flip mutation
15:        end if
16:    end for
17:    for i = 1 to L do
18:        if Random() < p_m then
19:            O_2[i] ← 1 − O_2[i]
20:        end if
21:    end for
22:    Q ← Q ∪ {O_1, O_2}
23: end while
24: return Q
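The operators of Algorithm 3 translate directly into Python. In this sketch, the binary tournament uses the NSGA-II crowded-comparison rule (the lower front rank wins, and ties go to the larger crowding distance). This is an assumption, since the text does not state the tournament criterion. The cut point is drawn from 1 to L − 1, because a cut at L would simply copy the parents. Defaults follow the reported rates (p_c = 0.9, p_m = 0.1); all names are hypothetical.

```python
import random


def binary_tournament(rank: list[int], crowd: list[float], rng: random.Random) -> int:
    """Pick two random individuals and return the crowded-comparison winner."""
    i, j = rng.randrange(len(rank)), rng.randrange(len(rank))
    if rank[i] != rank[j]:
        return i if rank[i] < rank[j] else j   # lower (better) front wins
    return i if crowd[i] >= crowd[j] else j    # less crowded wins the tie


def genetic_operators(parents: list[list[int]], rank: list[int], crowd: list[float],
                      rng: random.Random, pc: float = 0.9, pm: float = 0.1) -> list[list[int]]:
    """Single-point crossover followed by bit-flip mutation on binary chromosomes."""
    length = len(parents[0])
    offspring: list[list[int]] = []
    while len(offspring) < len(parents):
        c1 = parents[binary_tournament(rank, crowd, rng)]
        c2 = parents[binary_tournament(rank, crowd, rng)]
        if rng.random() < pc:
            k = rng.randint(1, length - 1)           # cut point, 1 <= k <= L - 1
            o1, o2 = c1[:k] + c2[k:], c2[:k] + c1[k:]
        else:
            o1, o2 = c1[:], c2[:]                     # copies; parents stay untouched
        for child in (o1, o2):
            for i in range(length):
                if rng.random() < pm:
                    child[i] = 1 - child[i]           # bit-flip mutation
        offspring.extend([o1, o2])
    return offspring[:len(parents)]                   # trim if |P| is odd
```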

Based on the analysis of the optimization results, solution S1 is selected as the best candidate. It requires the fewest LUTs and FFs of the four solutions and, together with S2, the fewest DSPs. S4 is the configuration with the lowest memory usage; however, it requires 11.82% more LUTs and 10.23% more FFs than S1, whereas S1 uses only 0.28% more memory than S4. Regarding DSP utilization (HM), S1 and S2 use 16% fewer DSPs than S3 and S4 (21 versus 25).
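The percentages in this comparison use different baselines. The LUT and FF figures express how much more S4 needs, relative to S1, while the memory figure is relative to S4. A short Python check with the values of the optimization-solutions table makes the baselines explicit; the helper name is illustrative.

```python
def percent_more(value: float, baseline: float) -> float:
    """How much larger `value` is than `baseline`, in percent of the baseline."""
    return (value - baseline) / baseline * 100


lut_s4_vs_s1 = percent_more(1835, 1641)       # about 11.82 (S1 as baseline)
ff_s4_vs_s1 = percent_more(1346, 1221)        # about 10.2  (S1 as baseline)
mem_s1_vs_s4 = percent_more(92_768, 92_508)   # about 0.28  (S4 as baseline)
dsp_s1_fewer = (25 - 21) / 25 * 100           # 16 (% fewer DSPs than S3 and S4)
```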

Finally, S1 and S2 have similar metrics; however, S1 requires fewer communication interfaces between the Processing System (PS) and the Programmable Logic (PL). This is inferred from the number of transitions (from 0 to 1 or vice versa) between adjacent genes of the chromosome, each of which corresponds to a PS–PL communication interface. S1 (0011010) has four transitions, while S2 (1011010) has five, making S1 slightly simpler to integrate into the final system. Based on these observations, solution S1 is selected for implementation.
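The transition count used as a proxy for PS–PL interfaces can be computed directly from the chromosome string. This sketch assumes, as the argument above implies, that data flow between consecutive modules in chromosome order, so every change between adjacent genes marks a hardware/software boundary. The function name is illustrative.

```python
def count_transitions(chromosome: str) -> int:
    """Count adjacent gene pairs that differ (0 -> 1 or 1 -> 0)."""
    return sum(a != b for a, b in zip(chromosome, chromosome[1:]))


solutions = {"S1": "0011010", "S2": "1011010", "S3": "0011110", "S4": "1011110"}
for name, genes in solutions.items():
    print(name, count_transitions(genes))  # S1 4, S2 5, S3 2, S4 3
```

S3 and S4 have fewer transitions (two and three), but they were not preferred because of their larger area and DSP usage.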

5. Results

5.1. Implementation of the Selected Configuration

The selected configuration is listed in Table 10, and the resulting implementation is outlined in the block diagram of Figure 4. Orange modules represent software-implemented components, while blue modules correspond to hardware-implemented components. A red dashed line indicates data transfer across the AXI bus, which connects the PS and the PL. Blue dashed lines mark where data type conversions (from fixed-point to floating-point or vice versa) are performed within the processor. Finally, a black dashed line marks the point where data are captured for output display.

After implementing the design in Vivado® and building the project in Vitis™, the results shown in Table 11 were obtained. The relative errors between the estimated and measured values for LUTs, FFs, DSPs, and memory usage (in bytes) are 2.49%, 2.08%, 0%, and 1.59%, respectively.

Overall, the results are satisfactory, considering that each configuration introduces its own variation in estimation error, as discussed in [19]. Compared with the fully hardware and fully software implementations, the errors increased slightly. Nevertheless, since the aforementioned study reported a maximum hardware area (LUT) error of 2.24%, the maximum error of 2.49% obtained in this work remains within an acceptable range. Notably, the DSP error remained at 0%.

Regarding memory usage, although the error increased by 0.43 percentage points compared with the fully software implementation (from 1.16% to 1.59%), it still remains below 2%.

Analysis of the Added Terms in the Objective Functions

After obtaining the results of the partitioned system, and given that the estimations fall within an acceptable error range, we proceed to analyze the contribution of the additional terms introduced in the area (LUTs) and memory objective functions. It is important to clarify that, in all cases, the relative error εr is computed with respect to the measured resource usage reported by Vivado® and Vitis™ for each corresponding implementation scenario (fully hardware, fully software, or HW/SW).

The analysis of the area objective function (15) is performed in two parts. First, Table 12 presents the fully hardware case, where the relative error is shown as the terms of the equation are progressively added. It is worth noting that the term Acom does not appear in this table because, in the fully hardware implementation, no communication with the processor is required; therefore, this term is disabled in this scenario.

Second, Table 13 shows the results for the partitioned HW/SW system, where all terms are enabled. In this case, the relevance of the terms Acom and Aextra becomes more evident. Unlike the fully hardware scenario, using only the O1 term results in a significant relative error of 24%. By incorporating the proposed additional terms, the error is reduced to 2.5%.

The results of the analysis of Equation (13) for the fully software case are presented in Table 14, where the relative error decreases progressively as each term is added, reaching a final magnitude of 1.2%. In this scenario, the Hsync term is disabled because all processing is executed on the processor (PS), so no synchronization with the PL is required.

Finally, Table 15 presents the results for the HW/SW scenario. Here, the relative error decreases from 98.8% (when only the O3 term is considered) to 1.6% when the complete proposed memory objective function is applied. It is also worth noting that, in this case, the Hcorrection term becomes zero according to (21), since modules 4 and 6 are not executed in software.
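The way the terms in Tables 12 to 15 accumulate can be reproduced with a short Python sketch built from the per-module metrics and constants listed in this article. The combination rules below are inferred from the tables rather than copied from Equations (13) and (15), so they are assumptions. Acom is added only when hardware and software modules coexist. Aextra (the 15-LUT and 2-LUT entries listed after Acom) and Hsync are added whenever at least one module is in hardware. Hcorrection is supplied by the caller: it is zero when M4 and M6 are in hardware, as stated above, and 11,932 B for the fully software case, which is the difference between the last two rows of Table 14. Estimates for configurations not reported in the tables are extrapolations.

```python
# Per-module metrics (M1..M7) from the hardware and software metric tables.
LUTS = [19, 700, 605, 612, 175, 62, 268]
FFS = [38, 241, 385, 259, 87, 19, 49]
DSPS = [0, 12, 7, 14, 4, 0, 2]
H_BYTES = [116, 772, 3732, 13_649, 144, 10_516, 64]

A_COM = (345, 542)           # (LUTs, FFs) for PS-PL communication
A_EXTRA = (15 + 2, 13 + 3)   # (LUTs, FFs): the two extra hardware entries
H_MIN, H_SYNC = 80_916, 10_756


def estimate(chromosome: str, h_correction: int = 0) -> tuple[int, int, int, int]:
    """Return (LUTs, FFs, DSPs, memory bytes) for a chromosome such as '0011010'.

    Combination rules are inferred from Tables 12-15 (assumption).
    """
    hw = [g == "1" for g in chromosome]
    any_hw, any_sw = any(hw), not all(hw)
    luts = sum(a for a, h in zip(LUTS, hw) if h)          # O1
    ffs = sum(f for f, h in zip(FFS, hw) if h)
    dsps = sum(d for d, h in zip(DSPS, hw) if h)
    if any_hw and any_sw:        # Acom: only when PS and PL exchange data
        luts, ffs = luts + A_COM[0], ffs + A_COM[1]
    if any_hw:                   # Aextra: whenever the PL is used
        luts, ffs = luts + A_EXTRA[0], ffs + A_EXTRA[1]
    memory = sum(m for m, h in zip(H_BYTES, hw) if not h) + H_MIN   # O3 + Hmin
    if any_hw:                   # Hsync: synchronization with the PL
        memory += H_SYNC
    return luts, ffs, dsps, memory - h_correction


def relative_error(measured: float, estimated: float) -> float:
    """Relative error in percent with respect to the measured value."""
    return (measured - estimated) / measured * 100


print(estimate("0011010"))                        # (1641, 1221, 21, 92768) for S1
print(round(relative_error(1683, 1641), 1))       # 2.5 (LUTs, HW/SW case)
print(round(relative_error(94_276, 92_768), 1))   # 1.6 (memory, HW/SW case)
```

The same function reproduces the area and memory values reported for S2, S3, and S4, as well as the fully hardware area estimate of 2458 LUTs.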

5.2. Comparison of Area and Memory Objective Functions

To validate the proposed objective functions, we focus on estimating the area and memory usage of fully hardware, fully software, and HW/SW implementations. These estimations are obtained using both the objective functions found in the literature and the modified ones proposed in this work. The estimated results are then compared with the actual outcomes obtained after implementing the hardware and building the software.

5.2.1. Comparison of Area Objective Functions

The analysis of the fully hardware and partitioned implementations yields the results summarized in Tables 16 and 17, respectively. For the fully hardware implementation, the two estimates differ only slightly, by 0.68%. In contrast, the partitioned implementation exhibits a much larger deviation: the literature function yields a relative error of 22.05%, compared with 2.49% for the proposed function. This gap can be attributed to the additional hardware required for PS–PL communication, which the literature formulation does not include. By accounting for this extra hardware, the proposed approach achieves a more accurate estimate.

5.2.2. Analysis of Memory Objective Function

The comparison of memory objective functions is more complex than in the case of area. As discussed in this work, the compiler reports a memory consumption value that accounts for both the memory required by the implemented module and the memory required by the system to operate. The objective function found in the literature does not consider the system memory required for operation (constant segment), whereas the proposed objective function explicitly includes this component. Consequently, a direct comparison between both functions is not feasible.

To enable this comparison, the constant segment is excluded from the Hmin constant in the proposed formulation. The estimation is performed using the objective function presented in [19] (Equation (7)). For a fairer comparison, the value obtained using the objective function (7) is adjusted by adding the memory consumption associated with the printf function, since this function is not part of the partitioned modules and is only used for result visualization. The results for the fully software implementation are presented in Table 18, while those for the HW/SW implementation are shown in Table 19.
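The adjustment described above amounts to adding a fixed printf cost to the literature estimate. From Table 18, that cost is 62.592 − 34.272 = 28.320 kB, and the same amount separates the two literature rows of Table 19 (29.416 − 1.096). The Python sketch below recomputes the relative errors of both tables from the reported kB values; the variable names are illustrative.

```python
PRINTF_KB = 62.592 - 34.272  # about 28.320 kB, implied by Table 18


def relative_error(measured: float, estimated: float) -> float:
    """Relative error (%) with respect to the measured value."""
    return (measured - estimated) / measured * 100


# case: (measured, literature function [19], proposed function), all in kB
cases = {
    "fully software": (54.352, 34.272, 55.473),
    "HW/SW": (51.772, 1.096, 50.264),
}
results = {}
for case, (measured, literature, proposed) in cases.items():
    results[case] = [
        relative_error(measured, literature),
        relative_error(measured, literature + PRINTF_KB),
        relative_error(measured, proposed),
    ]
    print(case, [round(e, 3) for e in results[case]])
```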

These results show that the proposed objective function achieves the best performance among the evaluated alternatives. Its relative error remains below 3% in magnitude (−2.062% for the fully software case and 2.913% for the HW/SW case), demonstrating the improved accuracy of the proposed formulation. This improvement is mainly attributed to the inclusion of memory consumption components that, although not directly related to the functional modules, are essential for the correct operation of the system.

5.3. Experimental Results

The experimental results with constant, step-wise and sinusoidal reference signals illustrate the main benefits of the PD-type fuzzy controller in this application. The controller achieves fast convergence and small steady-state errors (below 2%) over a wide operating range, while preserving satisfactory tracking of bidirectional speed commands and sinusoidal trajectories. This behavior is consistent with reported advantages of fuzzy PD/PID controllers over conventional PD/PID schemes in terms of tracking accuracy, disturbance rejection, and robustness in bridge cranes, lane-keeping systems, CNC servo drives, overhead-crane systems, and underwater vehicles [14,15,17,25,26]. In our case, these performance features are obtained with a controller whose memory and hardware requirements remain moderate, which is adequate for the resource bounds adopted in the HW/SW partitioning case study described in Remark 1.

To carry out the experimental phase, a measurement setup is required that provides all the resources needed to generate, acquire, and process the relevant signals. The proposed setup consists of five main components. First, a Digilent Zybo development board is used, which features a Zynq™ XC7Z010-1CLG400C SoC that integrates a dual-core Cortex®-A9 processor and an Artix™-7 FPGA. Second, the plant of the control system is a 12 V permanent magnet DC motor equipped with an optical encoder. Third, to control the direction of the motor and supply the required power, an H-bridge module (version 1.02) from EncoderGeek is employed, which incorporates two LMD18200T amplifiers. Fourth, a bidirectional TTL logic level converter (OKY3460) adapts the voltage levels between the Zybo board and the encoder circuit. Finally, a personal computer is used to collect, log, and display the output data of the controller; these data are subsequently processed with Matlab® scripts for both quantitative and qualitative analysis. Figure 5 shows the assembled prototype with each component labeled, as well as the system in operation while the pulse train generated by the encoder is being monitored. The partitioned control system is described in Verilog for the hardware components and in C for the software components.

Control System Tests

This section presents the results obtained when the control system is driven by different reference signals: constant, step-wise, and sinusoidal.

Three supply voltages were required to carry out the experiments. A 13 V source was used to power the H-bridge, as recommended by the manufacturer, since an internal voltage drop of approximately 1 V is expected, thus allowing the motor to receive close to 12 V. Additionally, a 5 V supply was required for both the circuit that generates the signals in the encoder and the logic level converter. Lastly, a 3.3 V supply was also needed for the logic level converter, given its bidirectional operation with components of different voltage domains.

The control tests were carried out using the following gain configuration: Gp = 1 for the proportional gain, Gd = 0.667 for the derivative gain, and Gs = 1 for the output gain of the fuzzy controller.

Constant Reference

The first experiment evaluates the performance of the system under a constant reference signal, in this case, 300 rad/s. Performance is assessed using the settling time (Ts) within 2%, which is defined as the time required for the response to reach and remain within 2% of its final value [27], and the maximum relative error in the steady-state region (Emax (%)) between the measured speed and the reference. Figure 6 shows the closed-loop system response, from which a settling time of 253 ms and a maximum steady-state error of 0.393% were obtained.
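The two performance indices can be computed from a logged speed record, as sketched below in Python. The sketch assumes uniformly logged samples, takes the final value to be the reference, and starts the steady-state region at the settling time. These are illustrative choices, since the text does not state how the steady-state window was chosen. The synthetic first-order response in the example is not measured data, and the function names are hypothetical.

```python
import math


def settling_time(t, y, final_value, band=0.02):
    """Time after which y stays within band * |final_value| (2% criterion)."""
    tol = band * abs(final_value)
    last_outside = -1
    for k, yk in enumerate(y):
        if abs(yk - final_value) > tol:
            last_outside = k
    if last_outside == len(y) - 1:
        return None                      # never settles within the record
    return t[last_outside + 1]


def max_steady_state_error(t, y, reference, t_start):
    """Emax (%): largest |y - r| / |r| over samples with t >= t_start."""
    return max(abs(yk - reference) / abs(reference) * 100
               for tk, yk in zip(t, y) if tk >= t_start)


# Synthetic first-order response (time constant 50 ms) to a 300 rad/s reference.
dt, tau, ref = 0.001, 0.05, 300.0
t = [k * dt for k in range(1001)]
y = [ref * (1 - math.exp(-tk / tau)) for tk in t]
ts = settling_time(t, y, ref)
emax = max_steady_state_error(t, y, ref, ts)
```

For this synthetic signal, the 2% settling time is 0.196 s, roughly four time constants, and Emax is just under 2% because the window starts at the edge of the band.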

Step-Wise Reference

This experiment aims to verify both the system’s repeatability and the proper operation of the motor in both rotational directions. Since module M1, which manages the reference signal, was implemented in software, an array of predefined reference values was included, and an additional processor timer was activated to update the reference at regular intervals. In this case, the reference changes every 5 s.
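The reference update can be modeled as a lookup indexed by elapsed time. The Python sketch below mirrors only that logic; on the Zynq it runs as C code driven by a processor timer. The order of the levels is assumed from Table 20, and holding the last level after the sequence ends is a hypothetical choice. The names are illustrative.

```python
REFERENCES_RAD_S = [300.0, 150.0, -150.0, -300.0, 200.0]  # order assumed from Table 20
UPDATE_PERIOD_S = 5.0  # the reference changes every 5 s


def reference_at(t_s: float) -> float:
    """Reference speed (rad/s) at elapsed time t_s; holds the last level at the end."""
    if t_s < 0:
        raise ValueError("time must be non-negative")
    index = min(int(t_s // UPDATE_PERIOD_S), len(REFERENCES_RAD_S) - 1)
    return REFERENCES_RAD_S[index]


schedule = [reference_at(t) for t in (0.0, 5.0, 10.0, 15.0, 20.0, 30.0)]
print(schedule)  # [300.0, 150.0, -150.0, -300.0, 200.0, 200.0]
```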

Figure 7 shows the response of the system to the step-wise reference input, demonstrating correct tracking for both positive and negative reference values. Regarding performance, Table 20 presents the maximum steady-state percentage error (Emax (%)) for each reference level. All errors remain below 2%, with the largest value (1.1%) occurring at 150 rad/s.

Sinusoidal Reference

For this experiment, a sinusoidal reference signal with a frequency of 0.5 rad/s was used, and a sampling time of 30 ms was configured. Figure 8 shows the response of the system to the sinusoidal reference input. As observed, the motor is able to follow the reference in both rotational directions, exhibiting its largest deviation near the zero-crossings. Nevertheless, the system recovers quickly and continues to track the reference signal effectively.

6. Conclusions

This paper presented an HW/SW partitioning methodology that incorporates a novel memory-aware metric into the decision-making process. Unlike conventional approaches that focus exclusively on area, latency, or power, this work emphasizes the importance of memory usage, both static and dynamic, as a primary design concern in embedded system development. By integrating the memory metric into a multi-objective optimization framework using NSGA-II, a more comprehensive exploration of the design space was achieved. The resulting Pareto front included configurations that offered a favorable trade-off between memory consumption and hardware resource utilization. Experimental validation using a PD-type fuzzy controller for a DC motor confirmed that the selected HW/SW configuration met the desired functional requirements. The memory usage estimation model demonstrated a prediction accuracy of over 97.5%, reinforcing its value for early-stage design exploration. Overall, the proposed memory-aware partitioning approach represents a meaningful advancement in embedded system co-design, especially for applications where memory resources are constrained or critical to system reliability.

Author Contributions

Conceptualization, D.H.G.R. and J.R.; methodology, D.H.G.R.; validation, D.H.G.R.; data curation, D.H.G.R.; writing—original draft preparation, D.H.G.R. and J.R.; writing—review and editing, D.H.G.R. and J.R.; supervision, J.R. and S.O.-C.; project administration, J.R. and S.O.-C. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Figures and Tables

Figure 2 Chromosome representation.


Figure 3 Pareto front with constraints. The area constraint (Area < 2,000 LUTs) is shown as a dark gray dashed vertical line, and the memory constraint (Memory < 95 KB) as a black dashed horizontal line.


Figure 4 Block diagram of the partitioned control system.


Figure 5 Control system prototype. (a) Hardware components: (1) H-bridge, (2) logic-level converter, (3) Zybo board, (4) encoder system consisting of a logic board and a 42-slot slotted disk, and (5) DC motor. (b) System under test; the pulse train generated by the encoder is visible in the image.


Figure 6 Closed-loop system response to a constant reference input. The dashed line represents the reference signal for the motor speed, and the solid line corresponds to the measured speed.


Figure 7 Closed-loop system response to a step-wise reference input. The dashed line represents the motor speed reference, and the solid line represents the measured speed.


Figure 8 Closed-loop system response to a sinusoidal reference input. The dashed line represents the motor speed reference, and the solid line represents the measured speed.


Comparison between representative works and the proposed methodology.

Work | Partitioning Approach | Memory Considered | Memory Analysis Level | Key Differences w.r.t. This Work
Zaharia et al. (2023) [1] | Profiling-based HW/SW partitioning | Implicit (bandwidth limits) | System-level | No module-level metric extraction; not multi-objective with memory
Iguider et al. (2020, 2020b) [5,6] | Hybrid and game-theory heuristics | No | N/A | Metrics limited to time/area; memory not integrated
Cheng et al. (2023) [7] | Immune algorithm | No | N/A | No memory metric, no extraction from SW mapping
Pouget et al. (2025) [10] | Holistic FPGA optimization | Yes (data movement) | Architecture-level | No C-level extraction; not a HW/SW partitioning framework
This work | NSGA-II multi-objective HW/SW partitioning | Yes | Module-level | Detailed memory extraction, correction factors, SoC-aware cost model; integrated into multi-objective optimization

Constant segment length in bytes.

Heap Stack Mmu_Tbl Init Finit Eh_Frame Init_Array Finit_Array Total
8192 14,336 16,384 12 12 4 4 4 38,948

Minimum occupancy of variable segments in bytes.

Text Rodata Data BSS Total
2376 4 1136 40 3556

Software metric results.

Function Hi(B)
M1 : U1 116
M2 : U2 772
M3 : U3 3732
M4 : U4 13,649
M5 : U5 144
M6 : PWM 10,516
M7 : Filter 64

Software-related constants in bytes.

Constant Value (B)
Hmin 80,916
Hsync 10,756
Hxgpiops 6856
Hxil_printf 1684
Hudivsi3 663
Hxuartps_hw 191

Hardware metrics.

Hardware Components Ai(LUTs) Ai(FFs) HMi (DSPs)
M1 : U1 19 38 0
M2 : U2 700 241 12
M3 : U3 605 385 7
M4 : U4 612 259 14
M5 : U5 175 87 4
M6 : PWM 62 19 0
M7 : Filter 268 49 2
Acom 345 542 0
A f   d 54 k 15 13 0
A f   d 62.5 M 2 3 0

Hardware area consumption results.

Resource A (LUTs) A (FFs) HM (DSPs)
Estimated 2458 1094 39
Measured 2511 1108 39

Memory consumption results.

Metric Value
Estimated (B) 97,977
Measured (B) 96,856
Relative Error (%) 1.16

Optimization solutions.

Configuration (x1 … x7) A (LUTs) A (FFs) HM (DSPs) MB (B)
S1 0011010 1641 1221 21 92,768
S2 1011010 1660 1259 21 92,652
S3 0011110 1816 1308 25 92,624
S4 1011110 1835 1346 25 92,508

Final hardware/software control system configuration.

Module | Hardware | Software
M1:U1 | | x
M2:U2 | | x
M3:U3 | x |
M4:U4 | x |
M5:U5 | | x
M6:PWM | x |
M7:Filter | | x

Comparison between estimated and measured results.

A (LUTs) A (FFs) HM (DSPs) MB (Bytes)
Estimated 1641 1221 21 92,768
Measured 1683 1247 21 94,276

Area objective function breakdown: fully hardware case.

Term Area Estimation (LUTs) εr (%)
O1 2441 2.7
+ Aextra 2458 2.1

Area objective function breakdown: HW/SW case.

Term Area Estimation (LUTs) εr (%)
O1 1279 24
+ Acom 1624 3.5
+ Aextra 1641 2.5

Memory objective function breakdown: fully software case.

Term Memory Estimation (Bytes) εr (%)
O3 28,993 70
+ Hmin 109,909 −13.5
− Hcorrection 97,977 −1.2

Memory objective function breakdown: HW/SW case.

Term Memory Estimation (Bytes) εr (%)
O3 1096 98.8
+ Hmin 82,012 13
+ Hsync 92,768 1.6

Full hardware case results.

Estimated Resources A (LUTs) εr (%)
Literature 2441 2.78
Own 2458 2.11
Measured 2511 N/A

Partitioned case results with respect to area.

Estimated Resources A (LUTs) εr (%)
Literature 1279 22.05
Own 1641 2.49
Measured 1683 N/A

Full software case results.

Estimated Resources MB (kB) εr (%)
Literature 34.272 36.944
Literature plus printf 62.592 −15.16
Own 55.473 −2.062
Measured 54.352 N/A

Results of partitioned case with respect to memory.

Estimated Resources MB (kB) εr (%)
Literature 1.096 97.883
Literature plus printf 29.416 43.182
Own 50.264 2.913
Measured 51.772 N/A

Maximum steady-state percentage errors for step-wise references.

Reference (rad/s) Emax (%)
300 0.393
150 1.100
−150 0.448
−300 0.258
200 0.486

References

1. Zaharia, C.; Popescu, V.; Sandu, F. Hardware—Software Partitioning for Real-Time Object Detection Using Dynamic Parameter Optimization. Sensors; 2023; 23, 4894. [DOI: https://dx.doi.org/10.3390/s23104894] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37430806]

2. Niemann, R.; Marwedel, P. Hardware/Software Partitioning Using Integer Programming. Proceedings of the ED&TC European Design and Test Conference; Paris, France, 11–14 March 1996; IEEE: New York, NY, USA, 1996; pp. 473-479. [DOI: https://dx.doi.org/10.1109/EDTC.1996.494343]

3. Mann, Z.Á.; Orbán, A.; Arató, P. Finding Optimal Hardware/Software Partitions. Form. Methods Syst. Des.; 2007; 31, pp. 241-263. [DOI: https://dx.doi.org/10.1007/s10703-007-0039-0]

4. Zhai, Q.; He, Y.; Wang, G.; Hao, X. A General Approach to Solving Hardware and Software Partitioning Problem Based on Evolutionary Algorithms. Adv. Eng. Softw.; 2021; 159, 102998. [DOI: https://dx.doi.org/10.1016/j.advengsoft.2021.102998]

5. Iguider, A.; Bousselam, K.; Elissati, O.; Chami, M.; En-Nouaary, A. Heuristic Algorithms for Multi-Criteria Hardware/Software Partitioning in Embedded Systems Codesign. Comput. Electr. Eng.; 2020; 84, 106610. [DOI: https://dx.doi.org/10.1016/j.compeleceng.2020.106610]

6. Iguider, A.; Bousselam, K.; Elissati, O.; Chami, M.; En-Nouaary, A. Embedded Systems Hardware Software Partitioning Approach Based on Game Theory. Innovations in Smart Cities Applications Edition 3; Ben Ahmed, M.; Boudhir, A.A.; Santos, D.; El Aroussi, M.; Karas, İ.R. Springer International Publishing: Cham, Switzerland, 2020; pp. 542-555. [DOI: https://dx.doi.org/10.1007/978-3-030-37629-1]

7. Cheng, P. Hardware and Software Partitioning Method of Embedded System Based on Immune Algorithm. Proceedings of the 2023 International Conference on Applied Intelligence and Sustainable Computing (ICAISC); Dharwad, India, 16–17 June 2023; IEEE: New York, NY, USA, 2023; pp. 1-6. [DOI: https://dx.doi.org/10.1109/ICAISC58445.2023.10199287]

8. Zhang, T.; Liu, G.; Yue, Q.; Zhao, X.; Hu, M. Using Firework Algorithm for Multi-Objective Hardware/Software Partitioning. IEEE Access; 2019; 7, pp. 3712-3721. [DOI: https://dx.doi.org/10.1109/ACCESS.2018.2886430]

9. Taji, H.; Miranda, J.; Peón-Quirós, M.; Atienza, D. MEDEA: A Design-Time Multi-Objective Manager for Energy-Efficient DNN Inference on Heterogeneous Ultra-Low Power Platforms. arXiv; 2025; [DOI: https://dx.doi.org/10.48550/arXiv.2506.19067]

10. Pouget, S.; Lo, M.; Pouchet, L.-N.; Cong, J. Holistic Optimization Framework for FPGA Accelerators. ACM Trans. Des. Autom. Electron. Syst.; 2025; 31, 7. [DOI: https://dx.doi.org/10.1145/3769307]

11. Liu, D.; Pan, D.; Xiong, X.; Shang, J.; Yin, S. PMP: Pattern Morphing-based Memory Partitioning in High-Level Synthesis. Proceedings of the 61st ACM/IEEE Design Automation Conference (DAC); San Francisco, CA, USA, 7 November 2024; pp. 1-6. [DOI: https://dx.doi.org/10.1145/3649329.3658239]

12. Odema, M. Hardware/Software Co-Design Methodologies for Efficient AI Systems and Applications. Ph.D. Thesis; UC Irvine: Irvine, CA, USA, 2024; Available online: https://escholarship.org/uc/item/5qh4b7q3 (accessed on 28 November 2025).

13. Gaytán Rivas, D.H.; Domínguez, J.R.; Cisneros, S.O.; Muñoz Zapata, H.E.; Baungarten-Leon, E.I. On the Novel Design and FPGA Implementation of a Fuzzy PD Control for a DC Motor. Proceedings of the 21st International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE); Mexico City, Mexico, 23–25 October 2024; IEEE: New York, NY, USA, 2024; pp. 1-6. [DOI: https://dx.doi.org/10.1109/CCE62852.2024.10770929]

14. Zhang, Y.; Liu, L.; He, D. Application of Variable Universe Fuzzy PID Controller Based on ISSA in Bridge Crane Control. Electronics; 2024; 13, 3534. [DOI: https://dx.doi.org/10.3390/electronics13173534]

15. Samuel, M.; Yahya, K.; Attar, H.; Amer, A.; Mohamed, M.; Badmos, T.A. Evaluating the Performance of Fuzzy-PID Control for Lane Recognition and Lane-Keeping in Vehicle Simulations. Electronics; 2023; 12, 724. [DOI: https://dx.doi.org/10.3390/electronics12030724]

16. Carpio, M.; Saltaren, R.; Viola, J.; Calderon, C.; Guerra, J. Proposal of a Decoupled Structure of Fuzzy-PID Controllers Applied to the Position Control in a Planar CDPR. Electronics; 2021; 10, 745. [DOI: https://dx.doi.org/10.3390/electronics10060745]

17. Yu, Z.; Liu, N.; Wang, K.; Sun, X.; Sheng, X. Design of Fuzzy PID Controller Based on Sparse Fuzzy Rule Base for CNC Machine Tools. Machines; 2023; 11, 81. [DOI: https://dx.doi.org/10.3390/machines11010081]

18. Mhadhbi, I.; Othman, S.B.; Saoud, S.B. Hardware/Software Partitioning Heuristics Approaches. Proceedings of the 2017 International Conference on Advanced Systems and Electric Technologies (IC_ASET); Hammamet, Tunisia, 14–17 January 2017; IEEE: New York, NY, USA, 2017; pp. 164-169. [DOI: https://dx.doi.org/10.1109/ASET.2017.7983684]

19. Bahri, I.; Idkhajine, L.; Monmasson, E.; Benkhelifa, M.E.A. Hardware/Software Codesign Guidelines for System on Chip FPGA-Based Sensorless AC Drive Applications. IEEE Trans. Ind. Inform.; 2013; 9, pp. 2165-2176. [DOI: https://dx.doi.org/10.1109/TII.2013.2245908]

20. AMD. Vitis Unified Software Platform Documentation: Embedded Software Development (UG1400). 2023; Available online: https://docs.xilinx.com/r/en-US/ug1400-vitis-embedded/Object-File-Sections (accessed on 11 October 2025).

21. Pando, H.D. Model and Partitioning Strategies for Hardware/Software Components in Embedded Systems Co-Design. Ph.D. Thesis; Universidad de Alicante: Alicante, Spain, 2014; (In Spanish)

22. Bhuvaneswari, M.C. Application of Evolutionary Algorithms for Multi-Objective Optimization in VLSI and Embedded Systems; Springer: New Delhi, India, 2015.

23. Seshadri, A. NSGA-II: A Multi-Objective Optimization Algorithm. MATLAB Central File Exchange; 2025; Available online: https://la.mathworks.com/matlabcentral/fileexchange/10429-nsga-ii-a-multi-objective-optimization-algorithm (accessed on 28 November 2025).

24. Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput.; 2002; 6, pp. 182-197. [DOI: https://dx.doi.org/10.1109/4235.996017]

25. Sun, Z.; Ling, Y.; Tan, X.; Zhou, Y.; Sun, Z. Designing and Application of Type-2 Fuzzy PID Control for Overhead Crane Systems. Int. J. Intell. Robot. Appl.; 2021; 5, pp. 10-22. [DOI: https://dx.doi.org/10.1007/s41315-020-00157-w]

26. Fan, S.; Wang, H.; Zuo, C.; Han, J. Fuzzy Adaptive PID-Based Tracking Control for Autonomous Underwater Vehicles. Actuators; 2025; 14, 470. [DOI: https://dx.doi.org/10.3390/act14100470]

27. Nise, N.S. Sistemas de Control para Ingeniería; Grupo Editorial Patria: Ciudad de México, Mexico, 2007; (In Spanish)

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.