Abstract
Nowadays, multicore processors are increasingly adopted in embedded systems. These processors can minimize energy consumption by employing dynamic voltage and frequency scaling (DVFS) techniques. Several energy-aware real-time task partitioning algorithms have been suggested for multicore processors; while many of them focus on independent real-time tasks, relatively little research has been dedicated to task synchronization. This paper focuses on optimizing energy consumption when assigning dependent real-time tasks to a multicore processor. When multiple tasks on different cores access shared resources simultaneously, blocking times grow, consequently increasing the execution time of tasks. This situation can lead to missed hard deadlines and, potentially, system failure. The Highest Task-Based Partitioning (HTBP) algorithm is designed to decrease overall energy consumption while ensuring that deadlines are met. It allocates tasks with high similarity (accessing the same set of resources) to the same core, effectively minimizing the occurrence of remote blocking. In the evaluation of the HTBP algorithm, we compared it with similarity-based partitioning (SBP), worst-fit decreasing (WFD), and best-fit decreasing (BFD). Our results indicate that the proposed HTBP algorithm outperforms the SBP, WFD, and BFD (bin-packing) algorithms, minimizes the overall energy dissipation, and improves schedulability.
Introduction
Computer architecture has been significantly transformed by the widespread adoption of multicore processors. A multicore processor is a crucial element of modern computing systems: it departs from the traditional uniprocessor model by integrating multiple processing units on a single chip. Designers of embedded systems are transitioning their designs to multicore architectures to fulfill the increased performance demands of various industries. However, as demands for processing power, complexity, and connectivity continue to surge, integrating multicore processors into real-time embedded systems presents both compelling opportunities and daunting challenges.
In modern real-time embedded systems, multicore processors are increasingly used due to their ability to enhance performance and efficiency. They offer improved response times for handling complex programs, faster processing speeds, and reduced power consumption. They are specifically designed to execute multiple tasks simultaneously, enabling real-time embedded systems to handle complex tasks efficiently, quickly, and with minimal energy. Energy optimization is essential for improving multicore processors' performance, power efficiency, and environmental sustainability across various computing platforms and applications. Multicore processors often consume significant amounts of power, especially under heavy workloads. Energy minimization techniques help reduce power consumption, making the processor more power-efficient. This is crucial for mobile devices, where battery life is a primary concern; for data centers aiming to reduce their energy bills and environmental impact; and for real-time embedded systems that must handle complex programs efficiently.
Dynamic voltage and frequency scaling (DVFS) is a power management technique used in modern processors to dynamically adjust the voltage and frequency according to the workload and performance requirements. When a processing core is underutilized, an effective method for conserving energy is to decrease the task processing speed, thereby reducing overall energy consumption. Many multicore processors are deployed with the DVFS technique [1]. DVS processors are divided into two categories: ideal and non-ideal. A processor that can operate at any speed, spanning from the slowest to the fastest, is referred to as an ideal DVS processor, whereas a processor that can operate only at distinct speeds is referred to as a non-ideal DVS processor. Recently, various strategies have been introduced for the energy-efficient scheduling of real-time tasks on multicore platforms with DVFS techniques [2, 3].
The general approaches for scheduling real-time applications on multicore platforms are partitioned, semi-partitioned, and global scheduling [4]. In partitioned scheduling, the assignment of tasks to cores is fixed in advance. Each core operates independently with its own scheduler and task queue, and no task migration is permitted. Every on-chip core can even employ its own Real-Time Operating System (RTOS), which enhances the appeal of this approach for hard real-time applications [5]. In semi-partitioned scheduling, some tasks remain fixed on specific cores, whereas others can migrate between cores. Global scheduling employs one scheduler and a shared ready queue to schedule tasks across all cores, with the flexibility for tasks to migrate between cores.
Several task partitioning algorithms presume task independence and overlook issues such as blocking due to precedence dependencies, shared-resource access, and similar effects that arise in real applications [6]. The task partitioning problem is NP-hard [7]. Traditional bin-packing algorithms often ignore the blocking time incurred by accessing shared resources, which can lead to significant blocking of tasks. Locking protocols are employed to prevent unbounded priority inversions and to ensure data coherence when tasks access shared resources [8, 9]. Resource-aware partitioning techniques are therefore necessary to exploit the full potential of embedded multicore systems.
This study presents an energy-aware partitioning technique for assigning dependent real-time tasks to a multicore platform in order to reduce dynamic power consumption. The proposed algorithm allocates tasks with high similarity (accessing the same set of resources) to the same core, aiming to minimize the number of global shared resources. Real-time tasks on the individual cores are scheduled with Partitioned Earliest Deadline First (P-EDF) [10], and the Multiprocessor Stack Resource Policy [11] is employed to synchronize task access to shared resources. We evaluated the proposed approach using the multi-core real-time scheduling simulator (MCRTsim) [12] to support our results.
The remainder of this paper is organized as follows. The "Related work" section offers a summary of past studies concerning DVFS scheduling, allocation, and synchronization within real-time systems, covering both single-processor and multi-processor architectures. The system models and problem definition are presented in the "System model and problem definition" section. The HTBP algorithm, the schedulability analysis of MSRP, and the Dual-Speed (DS) algorithm are presented in the "Proposed approach" section. The simulation assessment and performance evaluation are reported in the "Experimental evaluation and analysis" section. The "Conclusions and future work" section concludes the paper and outlines future research directions.
Related work
In partitioned scheduling, after tasks that access shared resources are assigned to processors, the synchronization protocol and processor speed are specified. The synchronization protocol is used to handle shared resources and prevent priority inversions that can affect schedulability, like MPCP, MSRP, and FMLP. Speeds for each processor are calculated based on the schedulability test results. A schedulability test is developed based on a synchronization protocol used to ensure that all tasks can meet their deadlines under the chosen scheduling algorithm (EDF, RM) and task-to-processor mapping.
Synchronization protocols
Resource management strategies for single-processor systems are extensively studied in [13, 14–15]. In [13], the Priority Ceiling Protocol (PCP) was proposed, representing a highly appealing protocol for synchronizing resource access. This protocol eliminates both transitive blocking and deadlock. The Stack Resource Policy (SRP), introduced in [14, 15], is a modification of the PCP designed to address priority inversion problems and facilitate straightforward schedulability tests. In SRP, all tasks are given preemption levels that are inversely proportional to their relative deadlines. This means that tasks with higher preemption levels have shorter deadlines.
Some researchers extended PCP to multiprocessor and distributed environments in [16–19], making it suitable for distributed shared-memory systems. Several versions of the Multiprocessor Priority Ceiling Protocol (MPCP) have been devised in [18, 19] to adapt PCP for multiprocessor environments and to reduce instances of remote blocking. The authors in [11] extended SRP to multiprocessor environments by introducing the Multiprocessor Stack Resource Policy (MSRP) as the first spin-lock protocol for real-time systems. An experimental assessment comparing the performance of MPCP and MSRP was presented in [20]. The Flexible Multiprocessor Locking Protocol (FMLP), a synchronization protocol designed for multiprocessor systems, was introduced in [21]; FMLP is versatile and can be applied to both partitioned and global scheduling algorithms. The MSRP-FT protocol is the first multiprocessor resource-sharing protocol that supports fault tolerance on mixed-criticality systems (MCS) [22].
Resource-aware partitioning on multiprocessors
The assignment of real-time workloads to multiprocessor systems is similar to solving bin-packing problems [23]. The author in [24] introduced a nearly optimal partitioning method for independent tasks, along with a polynomial-time approximation scheme. Generic bin-packing heuristics may not be efficient for workloads with shared resources, as they fail to consider the resulting blocking time. The synchronization-aware task allocation algorithm for multiprocessor real-time systems was introduced in [25] as a partitioning heuristic customized for the MPCP that takes blocking time into account. The BPA algorithm presented in [26] also uses the MPCP protocol; it clusters tasks that share the same resources so that they are allocated on a single processor.
The article [27] proposed the Inter-task Affinity-aware Task Allocation (IATA) algorithm that groups tasks depending on their preferences, dependencies, and constraints and allocates these groups to various cores to reduce WCET overheads. The authors in [28] proposed a novel scheduling algorithm called Resource-Oriented Partitioned (ROP), which employs a distributed resource-sharing policy based on the Distributed Priority Ceiling Protocol (DPCP), which can guarantee a significant speedup factor, indicating a substantial performance improvement. The article [29] proposed two strategies for efficiently allocating dependent real-time tasks protected by spin locks, Integer Linear Programming (ILP) formulation, and a novel resource-aware partitioning heuristic. Firstly, despite the computational expense, the ILP formulation is highly effective as it ensures finding a valid allocation if one exists. Secondly, the resource-aware partitioning heuristic is proposed. While it efficiently handles large problem instances, it lacks optimality. The RA-CU-WFD algorithm [30] is proposed as a feasible task-to-processor allocation technique on mixed-criticality multiprocessors to minimize the global waiting time, thereby improving the system’s schedulability.
Energy-aware scheduling and synchronization on a uniprocessor system
Some previous studies have concentrated on optimizing the energy efficiency of task scheduling on a uniprocessor for real-time systems where tasks are dependent [31–37]. When real-time tasks are dependent, shared resources must be accessed sequentially to guarantee mutual exclusion. Some methods primarily concentrate on non-preemptible critical sections for dependent tasks [31–34]. The dual-speed algorithm [31] introduces two different execution speeds for tasks, starting with a slower speed and switching to a faster speed when tasks face blocking. The authors of [32] extended the dual-speed concept to a multiple-speed approach, where each blocked task is assigned a distinct speed, and applied it to non-preemptible critical sections. The researchers in [35] proposed the frequency inheritance concept, allowing a blocking task to inherit the core frequency of a blocked task, and applied it to non-preemptible critical sections. The author in [36] assumed that tasks are dependent due to simultaneous access to shared resources and are preemptible in both critical and non-critical sections. A dynamic mixed task scheduling (DMTS) algorithm [37] was designed for scheduling soft aperiodic and hard periodic tasks that share resources in real-time systems; the authors considered two opposing goals, decreasing aperiodic task response time and minimizing power consumption.
Energy-aware scheduling and synchronization on multiprocessors
For independent tasks, numerous studies have been conducted on energy-aware scheduling for real-time multiprocessor systems [38–47]. In addition, many works have addressed energy- and temperature-aware scheduling on heterogeneous and FinFET-based multicores [48–53].
For dependent tasks, where blocking arises from simultaneous access to shared resources, many researchers have concentrated on energy-aware task scheduling and synchronization, driven by the need for energy optimization in real-time multicore platforms and the complex trade-off between energy usage and managing runtime blocking. The Triple Speed (TS) algorithm [54] offers a method to enhance the energy efficiency of real-time synchronization protocols for multiprocessor systems; it can minimize run-time blocking and overall energy consumption. The Similarity-Based Partitioning (SBP) algorithm [55] allocates tasks with common resources to a single core, aiming to minimize the number of remote blockings. Additionally, the authors suggested speed assignment techniques to optimize task execution using both per-core and full-chip DVFS methods. The Blocking-Aware-Based Partitioning (BABP) algorithm [56] is proposed to efficiently utilize the available parallelism, allocate tasks capable of parallel execution to different cores whenever feasible, and achieve balanced workload distribution in multicore systems.
The FPMPSA algorithm is proposed for fixed-priority periodic real-time task scheduling on a standby-sparing system with shared resources [57]. It considers the overheads associated with processor state and frequency transitions caused by DPM and DVFS. In addition, DFPMPSA, an extension of FPMPSA, is proposed to enhance energy efficiency by utilizing dynamic slack time. The IMCPA algorithm is proposed for fixed-priority task scheduling on the imprecise mixed-criticality (IMC) system in [58]; it improves the schedulability ratio and reduces energy consumption compared to other algorithms. Table 1 summarizes the literature on task scheduling with shared resources.
Table 1. Summary of literature on task scheduling with shared resources
Ref | Algorithm | Synchronization protocol | Optimization Goal |
|---|---|---|---|
[25] | Synchronization-aware task allocation algorithm | MPCP | Number of processors |
[26] | Blocking-aware Partitioning Algorithm (BPA) | MPCP | Schedulable systems & Number of processors |
[27] | Inter-task Affinity-aware Task Allocation (IATA) | MPCP&MSRP | Task schedulability |
[28] | Resource-Oriented Partitioned (ROP) | DPCP | Schedulable ratio |
[29] | ILP-based partitioning scheme; greedy slacker | MSRP; works with any protocol | Schedulability |
[30] | Resource-Aware Criticality-Unaware Worst-Fit Decreasing (RA-CU-WFD) | MSRP | Schedulability ratio |
[36] | Blocking-Aware Two-Speed (BATS) | SRP | Energy consumption |
[37] | Dynamic Mixed Task Scheduling (DMTS) | SRP | Energy consumption |
[54] | Triple Speed (TS) | MSRP | Energy consumption |
[55] | Similarity-Based Partitioning (SBP) | MSRP | Energy consumption |
[56] | Blocking-Aware-Based Partitioning (BABP) | MSRP | Energy consumption |
[57] | Dynamic FPMPSA (DFPMPSA) | RM/DPP | Energy consumption |
[58] | Imprecise Mixed-Criticality Partitioning Algorithm (IMCPA) | MPCP | Energy consumption & schedulability ratio |
Ours | Highest Task-Based Partitioning (HTBP) | MSRP | Energy consumption & deadline misses |
System model and problem definition
DVS processor model
Numerous modern processors, like the Qualcomm Snapdragon 800, AMD A8 6410, and Intel Core M 5Y70, can adjust their voltage and frequency levels. This allows these processors to perform dynamic voltage scaling and adjust their speed according to the supplied voltage. There are two categories of DVS processors. Presently, the majority of DVS processors fall into the non-ideal category, whereas the ideal category is primarily used for theoretical analysis.
We consider a non-ideal DVS multicore processor consisting of m cores P = {P_1, P_2, ..., P_m}, which supports a set of N different speeds S = {S_1, S_2, ..., S_N}, where S_1 < S_2 < ... < S_N. We assume that the multicore processor has per-core DVS capabilities, where every core can operate at one of these discrete speeds during runtime.
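To make the non-ideal, per-core DVS model concrete, the following minimal Python sketch shows how a continuously computed required speed is rounded up to the nearest available discrete speed; the speed values mirror Table 3 later in the paper, and the class and function names are illustrative rather than MCRTsim APIs.

```python
# Minimal sketch of a per-core, non-ideal DVS processor model.
# Speed values (MHz) mirror Table 3; names are illustrative, not MCRTsim APIs.

AVAILABLE_SPEEDS = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]  # S_1 < ... < S_N

def quantize_speed(required: float, speeds=AVAILABLE_SPEEDS) -> int:
    """Return the smallest discrete speed >= the required speed.

    On a non-ideal DVS processor a core cannot run at an arbitrary speed,
    so a continuously computed speed must be rounded up to stay feasible.
    """
    for s in speeds:
        if s >= required:
            return s
    return speeds[-1]  # saturate at the maximum speed

class Core:
    """One core with its own (per-core) DVS setting."""
    def __init__(self, core_id: int):
        self.core_id = core_id
        self.speed = AVAILABLE_SPEEDS[-1]  # start at the highest speed

    def set_speed(self, required: float) -> None:
        self.speed = quantize_speed(required)

if __name__ == "__main__":
    core = Core(0)
    core.set_speed(430)   # a utilization-based computation might ask for 430 MHz
    print(core.speed)     # -> 500, the nearest discrete speed not below 430
```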
Processor power model
The power model employed in this research is adopted from [55]. Complementary metal oxide semiconductor (CMOS) circuits [59] consume two types of power: static and dynamic. The static power P_static arises from the leakage current I_leak that passes through transistors when they are in the off state, and can be expressed as:
P_static = I_leak · V_dd    (1)
The dynamic power is the power dissipated by the CMOS circuit during computational tasks. It depends on the frequency and voltage of the processor at a specific speed, and can be defined as:
P_dyn = C_eff · V_dd^2 · f    (2)
where C_eff is the effective switching capacitance, f is the processor clock frequency (speed), and V_dd is the supply voltage. The processor clock frequency is defined as:
f = k · (V_dd - V_t)^2 / V_dd    (3)
where k is a constant and V_t is the threshold voltage.
As the supply voltage is the primary contributor to both dynamic and static power dissipation, reducing the supply voltage results in decreased energy consumption.
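As a rough illustration of Eqs. (1)-(3), the sketch below evaluates the static and dynamic power for a few supply voltages; the constants C_eff, I_leak, k, and V_t are placeholder values chosen only to make the example run, not parameters from the paper.

```python
# Sketch of the CMOS power model (Eqs. 1-3) with placeholder constants.
# C_eff, I_leak, k and V_t are illustrative values, not measured parameters.

def static_power(v_dd: float, i_leak: float = 0.01) -> float:
    """Static power from leakage current flowing under supply voltage V_dd."""
    return i_leak * v_dd

def dynamic_power(v_dd: float, freq: float, c_eff: float = 1e-9) -> float:
    """Dynamic (switching) power: P_dyn = C_eff * V_dd^2 * f."""
    return c_eff * v_dd ** 2 * freq

def clock_frequency(v_dd: float, v_t: float = 0.3, k: float = 1e9) -> float:
    """Clock frequency as a function of supply voltage: f = k * (V_dd - V_t)^2 / V_dd."""
    return k * (v_dd - v_t) ** 2 / v_dd

if __name__ == "__main__":
    for v in (0.8, 1.0, 1.2):
        f = clock_frequency(v)
        p = static_power(v) + dynamic_power(v, f)
        print(f"V_dd={v:.1f} V  f={f/1e6:.0f} MHz  P={p:.3f} W")
```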
Task model
This paper considers a set of n periodic tasks Γ = {τ_1, τ_2, ..., τ_n} within a real-time system. We assume that tasks are dependent (due to shared-resource access), preemptible (only in non-critical sections), and periodic. A task τ_i is described by a tuple (A_i, C_i, D_i, T_i, CS_i), where A_i represents the arrival time, C_i denotes the worst-case computation time, D_i signifies the relative deadline, T_i represents the period, and CS_i represents the list of critical sections of task τ_i. Each task satisfies C_i ≤ D_i, and its period equals its relative deadline (T_i = D_i).
Each task τ_i serves as a template for its instances, with a new instance arriving every period T_i. The initial request of τ_i is received at time A_i. The j-th instance of task τ_i is denoted by τ_{i,j}, and its arrival time is A_{i,j} = A_i + (j - 1) · T_i. A task instance must finish executing before its absolute deadline, d_{i,j} = A_{i,j} + D_i. Note that C_i is the worst-case computation time of τ_i when the processor runs at its highest speed S_N; at a lower speed S_k, the worst-case computation time of τ_i scales up to C_i · (S_N / S_k).
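The task model above can be captured in a few lines of Python; in this sketch the field names and the convention that the WCET scales by S_N / S_k at a lower speed follow our reading of the model and are not taken from the simulator.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    """Periodic task tau_i = (A_i, C_i, D_i, T_i, CS_i) from the task model."""
    arrival: float                 # A_i, arrival of the first instance
    wcet: float                    # C_i, measured at the maximum speed S_N
    deadline: float                # D_i (relative), equal to the period here
    period: float                  # T_i
    critical_sections: List[float] = field(default_factory=list)  # CS_i lengths

    def job_arrival(self, j: int) -> float:
        """Arrival time of the j-th instance (j >= 1): A_i + (j - 1) * T_i."""
        return self.arrival + (j - 1) * self.period

    def job_deadline(self, j: int) -> float:
        """Absolute deadline of the j-th instance: arrival + D_i."""
        return self.job_arrival(j) + self.deadline

    def wcet_at(self, speed: float, s_max: float = 1000.0) -> float:
        """WCET when the core runs at a lower discrete speed (scaled from S_N)."""
        return self.wcet * (s_max / speed)

if __name__ == "__main__":
    t = Task(arrival=0.0, wcet=3.5, deadline=23, period=23, critical_sections=[0.5, 0.3])
    print(t.job_arrival(3), t.job_deadline(3), round(t.wcet_at(500), 2))
```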
Resource model
This paper focuses on partitioning, scheduling, and synchronizing real-time tasks that share resources. A set of shared resources R = {R_1, R_2, ..., R_r}, such as data objects, files, or shared variables, can be accessed by the tasks in the system. Semaphores are employed to control access to the shared resources, ensuring mutually exclusive access among competing tasks. The list CS_i defines the critical sections of τ_i, where cs_{i,k} denotes the critical section of τ_i on resource R_k. A critical section is a code segment designed to access a shared resource while ensuring exclusive access. Before a task enters a critical section, it must hold the relevant semaphore, and it releases the semaphore after completing the critical section; this ensures data consistency. Additionally, we presume that critical sections are properly nested; specifically, locks are released in the opposite order from which they were obtained. Every shared resource can be given a ceiling priority, representing the maximum priority it can obtain.
Problem formulation
We have the following considerations:
Γ = {τ_1, τ_2, ..., τ_n}: a set of n periodic and dependent real-time tasks.
R = {R_1, R_2, ..., R_r}: a set of shared resources.
P = {P_1, P_2, ..., P_m}: a multicore processor that supports discrete speeds S = {S_1, ..., S_N} with the DVFS technique.
The challenge is to schedule the task set Γ and synchronize its access to the resource set R on a multicore platform supported by the DVFS technique with discrete speeds. The hyperperiod (lcm) is the least common multiple of all task periods. It is sufficient to evaluate the schedulability and performance of the entire schedule within the time interval (0, lcm] [60], because the task set Γ repeats the same execution pattern during each hyperperiod. This study aims to identify the best task partitioning and to determine the proper execution speed that ensures each task meets its deadline while minimizing the overall energy dissipation. Minimizing dynamic energy consumption on a DVFS multicore processor is an optimization problem: specifically, it entails finding a feasible schedule that minimizes energy consumption. A schedule is feasible when all tasks complete within their specified timing constraints [46].
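Since the schedule repeats every hyperperiod, the evaluation window (0, lcm] can be computed directly from the task periods, as in the short sketch below (integer periods assumed).

```python
from math import gcd
from functools import reduce

def hyperperiod(periods):
    """Least common multiple of all task periods: the interval (0, lcm] over
    which the schedule repeats and schedulability can be evaluated."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)

if __name__ == "__main__":
    print(hyperperiod([23, 22, 16, 20]))   # example integer periods
```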
Proposed approach
The HTBP algorithm is proposed for partitioning periodic real-time tasks on a non-ideal DVFS multicore processor with shared resources. Dynamic-priority-based scheduling is considered for this research due to its higher utilization limit compared to fixed-priority approaches. So, the partitioned earliest deadline first (P-EDF) [10] is employed as a priority-driven scheduling approach where tasks are dynamically prioritized based on their deadlines, with higher priority given to tasks with earlier deadlines. It initially divides tasks offline among available cores before scheduling them on the respective cores. The EDF algorithm always runs the highest priority task.
Multiple task assignment heuristics have been utilized for this partitioning problem, including worst-fit decreasing, best-fit decreasing, and first-fit decreasing. The Highest Task-Based Partitioning (HTBP) method is a novel allocation approach designed to decrease energy consumption by assigning tasks according to their similarity to the highest resource-usage task (τ_highest).
Task synchronization
This research employs a Multiprocessor Stack Resource Policy (MSRP) as a synchronization protocol to manage and coordinate access to shared resources in a multiprocessor system. MSRP helps ensure that tasks running on different cores do not interfere with each other by enforcing mutual exclusion, thereby maintaining data consistency and system stability. Once tasks have been assigned to specific cores, resources can be categorized into local and global types. Local resources are accessed exclusively by tasks running on the same core, while global resources are available to tasks executing on different cores.
When we use P-EDF and MSRP to schedule tasks, two types of blocking may occur: remote and local blocking. Remote (global) blocking occurs when a task τ_i attempts to lock a global resource currently held by a task on a different core. In such cases, τ_i must enter a spin-lock (busy-waiting) state until it is granted that global resource; it continues execution once the global resource is released by the task that previously locked it. Local blocking takes place when one task is blocked by another task and both tasks run on the same core.
To bound the possible blocking under EDF and MSRP, every task τ_i is given a fixed value called the preemption level π_i [11]. Task preemption levels are assigned in inverse proportion to the relative deadlines, so a shorter deadline implies a higher level. The main property of preemption levels is that a task τ_i can preempt another task τ_j only if π_i > π_j. When dealing with local resources, MSRP behaves identically to SRP: every local resource has a ceiling representing the highest preemption level among all tasks that may access that resource. For global resources, a global resource ceiling is defined for each core, which is the highest preemption level of all tasks that belong to that core.
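The following sketch illustrates how preemption levels and resource ceilings could be derived from relative deadlines and resource usage; the data layout and helper names are our own, not MSRP's reference implementation.

```python
# Sketch of preemption levels and resource ceilings under MSRP (names are ours).

def preemption_levels(tasks):
    """Assign preemption levels inversely proportional to relative deadlines:
    the shorter the deadline, the higher the level."""
    ordered = sorted(tasks, key=lambda t: t["deadline"], reverse=True)
    return {t["name"]: level for level, t in enumerate(ordered, start=1)}

def local_ceiling(resource, tasks, levels):
    """Ceiling of a local resource: highest preemption level among its users."""
    return max(levels[t["name"]] for t in tasks if resource in t["resources"])

def core_global_ceiling(core_tasks, levels):
    """Global-resource ceiling of a core: highest preemption level on that core."""
    return max(levels[t["name"]] for t in core_tasks)

if __name__ == "__main__":
    tasks = [
        {"name": "t1", "deadline": 23, "resources": {"R1"}},
        {"name": "t2", "deadline": 16, "resources": {"R1", "R2"}},
        {"name": "t3", "deadline": 20, "resources": {"R2"}},
    ]
    levels = preemption_levels(tasks)
    print(levels, local_ceiling("R1", tasks, levels), core_global_ceiling(tasks, levels))
```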
Schedulability analysis of the MSRP
After tasks are partitioned, each core schedules its tasks as on a uniprocessor. The researchers in [14] proved that a collection of real-time tasks, indexed by non-decreasing relative deadlines, can be scheduled using EDF and SRP if:
∀k, 1 ≤ k ≤ n:   Σ_{i=1..k} C_i / D_i + B_k / D_k ≤ 1    (4)
where B_k is the worst-case blocking time of τ_k.
For multicore processors, both local and global blocking must be considered. Under the MSRP, B_i^local denotes τ_i's worst-case blocking time when it accesses a local resource, and can be computed using the following equation:
B_i^local = max { |cs_{j,k}| : π_j < π_i and ceil(R_k) ≥ π_i }    (5)
where |cs_{j,k}| is the length of τ_j's critical section on local resource R_k, and ceil(R_k) is the ceiling of local resource R_k on core P_m, determined using the following equation:
ceil(R_k) = max { π_j : τ_j accesses R_k }    (6)
Let B_i^global be τ_i's worst-case blocking time when it accesses a global resource, which can be computed using the following equation:
B_i^global = max { spin(R_k, P_m) + |cs_{j,k}| : π_j < π_i, τ_j assigned to P_m, R_k a global resource }    (7)
where spin(R_k, P_m) is the busy waiting time (the spin lock time) that each task assigned to core P_m can wait to access a global resource R_k, computed using the following equation:
spin(R_k, P_m) = Σ_{P_p ≠ P_m} max { |cs_{j,k}| : τ_j assigned to P_p }    (8)
Hence, the maximum of B_i^local and B_i^global yields the worst-case blocking time of task τ_i:
B_i = max(B_i^local, B_i^global)    (9)
The worst-case spin lock time spin_i of τ_i is the maximum total time that τ_i spends waiting for global resources on its core P_m, which can be computed using the following equation:
spin_i = Σ_{R_k global resource accessed by τ_i} spin(R_k, P_m)    (10)
Assume that the set of tasks assigned to core P_m is sorted by non-increasing preemption level (π_1 ≥ π_2 ≥ ...). According to the schedulability analysis for multicore environments [11], the set is schedulable under P-EDF and MSRP if:
∀k:   Σ_{i=1..k} (C_i + spin_i) / D_i + B_k / D_k ≤ 1    (11)
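A per-core schedulability check in the spirit of Eqs. (4) and (11) can be sketched as follows; the exact form of the bound follows our reconstruction above, and the dictionary-based task representation is only illustrative.

```python
# Sketch of the per-core schedulability check in the spirit of Eqs. (4) and (11):
# tasks on a core are ordered by non-increasing preemption level (non-decreasing
# relative deadline) and every prefix must satisfy a utilization-plus-blocking bound.

def core_is_schedulable(tasks):
    """tasks: list of dicts with keys 'wcet', 'deadline', 'spin', 'blocking',
    all expressed at the core's chosen speed."""
    ordered = sorted(tasks, key=lambda t: t["deadline"])  # shortest deadline first
    for k in range(1, len(ordered) + 1):
        demand = sum((t["wcet"] + t["spin"]) / t["deadline"] for t in ordered[:k])
        demand += ordered[k - 1]["blocking"] / ordered[k - 1]["deadline"]
        if demand > 1.0:
            return False
    return True

if __name__ == "__main__":
    core = [
        {"wcet": 3.5, "deadline": 23, "spin": 0.4, "blocking": 0.6},
        {"wcet": 5.0, "deadline": 16, "spin": 0.0, "blocking": 0.5},
    ]
    print(core_is_schedulable(core))   # True for this small example
```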
Highest Task-Based partitioning (HTBP) algorithm
This section presents the proposed HTBP algorithm, which partitions dependent tasks among the cores of a real-time system to minimize the remote blocking time of tasks, reduce inter-core communication costs, and achieve load balancing across the system. Fig. 1 introduces the flowchart of the proposed HTBP algorithm, while Algorithm 1 presents its detailed steps. For each task τ_i we calculate the task utilization u_i = C_i / T_i and the resource usage RU_i, which is computed from NR_i, the number of resources that τ_i will request, and from the sum of the critical-section lengths of τ_i divided by its period.
The task τ_highest is the highest resource-usage task, that is, the task with the maximum value of RU_i. In lines 11–16, a for loop is used to identify τ_highest. The similarity between a task τ_i and τ_highest is the set of shared resources that both τ_i and τ_highest will request; it is calculated as the intersection of their resource sets, as shown in line 20. If there is a similarity between τ_i and τ_highest, we calculate the sum of τ_i's critical-section lengths on the shared resources divided by the period of τ_i, as shown in lines 23–25, and use it as the similarity value of τ_i.
Let the similarity list be the list of tasks that have a non-empty similarity with τ_highest. Initially, this list is empty, as shown in line 10, and it is populated in line 22. We then sort the tasks in the list in decreasing order of their similarity value, as in line 30, so the list contains the most similar tasks ordered from highest to lowest. In line 31, HTBP determines the minimum-utilization core, as shown in Algorithm 2, and then assigns tasks from the list to that core (lines 32–34) as long as its schedulability test is satisfied, as shown in Algorithm 3.
[See PDF for image]
Fig. 1
Highest Task-Based Partitioning (HTBP) algorithm flowchart
If HTBP does not find τ_highest, this means that all the remaining tasks do not access any resources. It then picks a task from the task set and assigns it to the minimum-utilization core. This is repeated until the task set becomes empty, as shown in lines 36–38. Tasks without resource accesses are thereby spread over different cores to ensure load balancing.
In Algorithm 3, after a task τ_i is assigned to a core P_m, as shown in line 3, HTBP adds u_i to the core's utilization and deletes τ_i from the task list, as shown in lines 4–5. Finally, after the HTBP algorithm finishes, each partition is allocated to its core and every core uses EDF to schedule its allocated tasks.
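For concreteness, the sketch below captures the HTBP partitioning loop as we read it from Fig. 1 and the description above; the helper names, data layout, and the fallback when a task does not fit the minimum-utilization core are our assumptions rather than the paper's Algorithm 1.

```python
# Compact sketch of the HTBP partitioning loop (our reading of the description;
# helper names and data layout are illustrative, not the paper's Algorithm 1).

def resource_usage(task):
    """RU_i built from the number of requested resources and the critical-section
    demand (sum of critical-section lengths over the period)."""
    return len(task["resources"]) * sum(task["cs"].values()) / task["period"]

def htbp_partition(tasks, num_cores, fits):
    """fits(core_tasks) -> bool is the per-core schedulability test (e.g., Eq. (11))."""
    cores = [[] for _ in range(num_cores)]
    util = [0.0] * num_cores
    remaining = list(tasks)

    def assign(task, core_idx):
        cores[core_idx].append(task)
        util[core_idx] += task["wcet"] / task["period"]
        remaining.remove(task)

    while remaining:
        with_res = [t for t in remaining if t["resources"]]
        if not with_res:                                  # leftovers access no resources:
            assign(remaining[0], util.index(min(util)))   # spread them for load balancing
            continue
        highest = max(with_res, key=resource_usage)
        shared = lambda t: t["resources"] & highest["resources"]
        similar = [t for t in with_res if t is not highest and shared(t)]
        # order the similar tasks by their shared critical-section demand
        similar.sort(key=lambda t: sum(t["cs"][r] for r in shared(t)) / t["period"],
                     reverse=True)
        target = util.index(min(util))                    # minimum-utilization core
        progressed = False
        for task in [highest] + similar:
            if fits(cores[target] + [task]):
                assign(task, target)
                progressed = True
            else:
                break                                     # core is full; retry later
        if not progressed:                                # even the highest task did not fit
            for c in sorted(range(num_cores), key=lambda c: util[c]):
                if fits(cores[c] + [highest]):
                    assign(highest, c)
                    progressed = True
                    break
            if not progressed:
                raise RuntimeError("task set is not partitionable under the given test")
    return cores
```

The design choice mirrored here is that tasks similar to the highest resource-usage task keep being packed onto the same core until the schedulability test stops them, which is what reduces the number of global resources.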
Dual speed algorithm
Once tasks have been permanently allocated to the cores, an execution-speed strategy is determined to minimize energy consumption while maintaining feasibility. The Dual-Speed (DS) algorithm [55], employing a Two-Speed Strategy (TSS) approach, starts by executing tasks at a high-level speed (the initial base speed) and then transitions to a low-level speed once the actual blocking time is determined. We use the per-core DVFS technique to calculate an initial base speed for every core.
Specifically, the actual spin lock time and blocking time become known when a task begins execution and when a global resource becomes available for access. At these points it is feasible to compute a reduced execution speed using Eq. (13), allowing tasks to execute with decreased energy consumption. Hence, tasks can operate at a lower speed and still complete before their deadlines.
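The two-speed idea can be sketched as follows; since Eqs. (12) and (13) are not reproduced here, the sizing rule in this snippet (choose the lowest discrete speed that still finishes the remaining work before the deadline) is an illustrative stand-in, not the paper's exact formula.

```python
# Sketch of the two-speed idea: run at a base speed sized for worst-case blocking,
# then, once the actual blocking is known, drop to the lowest discrete speed that
# still meets the deadline. The sizing rule below is an illustrative stand-in for
# Eqs. (12)-(13), which are not reproduced here.

SPEEDS = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]  # MHz, S_max = 1000

def lowest_feasible_speed(remaining_work_at_smax: float, time_left: float,
                          s_max: float = 1000.0) -> int:
    """Smallest discrete speed that finishes the remaining work before the deadline.
    remaining_work_at_smax is the remaining execution time measured at S_max."""
    for s in SPEEDS:
        if remaining_work_at_smax * (s_max / s) <= time_left:
            return s
    return SPEEDS[-1]

if __name__ == "__main__":
    # A job has 2.0 time units of work left (at max speed) and 6.0 units to deadline:
    # once its worst-case blocking did not materialize, it can slow down.
    print(lowest_feasible_speed(2.0, 6.0))   # -> 400, since 2.0 * (1000/400) = 5.0 <= 6.0
```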
Illustrative example
Consider a system comprising a dual-core processor and seven tasks. The task parameters are reported in Table 2. Tasks are partitioned using both the SBP algorithm [55] and our proposed HTBP algorithm.
Using the SBP algorithm for task partitioning, the tasks are split between core P_1 and core P_2 as shown in Fig. 2. There are two local resources on core P_1, two local resources on core P_2, and two global shared resources. When a task tries to lock a global resource that is currently locked by a task on the other core, it employs a spin lock (also called busy waiting) until it gains permission to access the resource; it can only continue once the holder finishes using the global resource and releases it. A task becomes non-preemptable when it enters a critical section for a global resource, so the waiting task must keep holding its core until the global resource is released. It is important to reduce this busy waiting (spin lock) time, as it is wasted time.
Table 2. Tasks example parameter set
C_i | T_i | D_i | u_i | Accessed resources |
|---|---|---|---|---|
3.5 | 23 | 23 | 0.152 | , |
6.5 | 22 | 22 | 0.295 | , , |
2.5 | 19.5 | 19.5 | 0.128 | |
5 | 16 | 16 | 0.312 | , |
3 | 16 | 16 | 0.188 | |
4 | 20 | 20 | 0.2 | |
3 | 16 | 16 | 0.1875 | , |
By using our proposed HTBP algorithm, as shown in Fig. 3, one of these tasks is assigned to the other core, so the resource it contends for becomes a local resource of that core. Therefore, the spin time previously spent waiting for that resource to be released no longer exists. A spin lock keeps a task actively spinning in a loop, consuming CPU cycles even though it makes no progress. By minimizing the time spent in the spin-lock loop, the overall CPU activity associated with busy waiting is reduced, and lower CPU activity generally leads to lower power consumption. As can be observed, our HTBP algorithm reduces the number of global resources; therefore, the spin lock time is minimized and the energy consumption is reduced.
The SBP algorithm computes a per-core utilization bound equal to the total utilization divided by the number of cores m. It then allocates highly similar tasks to the same core while ensuring that the total task utilization on that core does not exceed this bound. Although SBP balances the load between cores by partitioning according to this bound, it prevents similar tasks from being assigned to the same core once the core utilization exceeds the bound. In contrast, the HTBP algorithm assigns a larger number of similar tasks to the same core, limited only by the schedulability test condition.
[See PDF for image]
Fig. 2
Task partitioning by SBP algorithm
[See PDF for image]
Fig. 3
Task Partitioning by HTBP algorithm
The tasks reported in Table 2 were partitioned and simulated with the MCRTsim scheduling simulator for 200 time units. The energy consumption is 437.177 W when the SBP algorithm is used for partitioning and 353.041 W when the HTBP algorithm is used. As observed, HTBP outperforms SBP because HTBP minimizes the number of global resources and therefore reduces the remote blocking time.
Time complexity analysis
We now analyze the time required to execute the HTBP algorithm. In lines [5–7], the loop executes n times, and line [7] sums all critical sections of a task; in the worst case, a task may access all r resources, so this step executes r times. In line [9], the while loop iterates n times, since each iteration selects and removes a single task from the task list. This outer loop contains several inner loops. The first inner loop, in line [11], finds the highest resource-usage task and iterates over all tasks, so it executes n times. Another inner loop, in line [18], also executes n times and contains an inner loop in line [23] that sums all critical sections, which in the worst case is r. The getMinUtilizationCore() function iterates over all m cores; this time can be neglected since m is much smaller than n. The overall time required to execute the algorithm is therefore O(n^2 · r).
Experimental evaluation and analysis
We use a multi-core real-time scheduling simulator (MCRTsim) [12] to execute a set of experiments to evaluate our proposed algorithm. In the MCRTsim simulator, the system environment and the simulated task set are formatted in XML. We performed two groups of experiments to assess and compare the performance of HTBP. The first group involved detailed simulations with synthetic task sets to evaluate energy consumption under diverse and varying conditions. In contrast, the second group utilized a real-world application to examine how the algorithms perform in real-world scenarios.
Simulation settings
Processor settings
We configured multicore processors with 2, 4, 8, and 16 cores. The processor type is a non-ideal DVFS processor with the per-core DVFS technique. The power dissipation for each speed level is specified according to the processor speeds [55], where the power consumption of the idle state is set to 0, as shown in Table 3. The experiments centered on two performance metrics: the total energy consumption and the percentage of missed deadlines of the task sets. Equation (14) is used to compute the energy consumption:
E_total = Σ_{j=1..m} ∫_0^{t_sim} P(S_j(t)) dt    (14)
where S_j(t) is the speed of core P_j at time t, P(·) is the power drawn at that speed, and t_sim is the simulation time.
Table 3. Multi-core processor speeds
Speed (MHz) | idle | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 | 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|
Power consumption (W) | 0 | 1.6 | 12.24 | 41.12 | 97.36 | 190.08 | 328.4 | 521.44 | 778.32 | 1108.16 | 1520.08 |
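The energy bookkeeping of Eq. (14) amounts to integrating, for every core, the power drawn at its current speed over the simulation window. A minimal sketch using the power levels of Table 3 (with an illustrative trace format) is shown below.

```python
# Sketch of the energy bookkeeping in Eq. (14): integrate the power drawn at each
# core's current speed over the simulation, using the power levels of Table 3.

POWER = {0: 0.0, 100: 1.6, 200: 12.24, 300: 41.12, 400: 97.36, 500: 190.08,
         600: 328.4, 700: 521.44, 800: 778.32, 900: 1108.16, 1000: 1520.08}

def total_energy(speed_traces):
    """speed_traces: per core, a list of (duration, speed) segments; speed 0 = idle.
    Returns the summed energy (power x time) over all cores."""
    return sum(duration * POWER[speed]
               for trace in speed_traces
               for duration, speed in trace)

if __name__ == "__main__":
    # Two cores over 10 time units: core 0 runs at 500 then idles,
    # core 1 runs at 300 for the whole window.
    traces = [[(6, 500), (4, 0)], [(10, 300)]]
    print(total_energy(traces))   # 6*190.08 + 10*41.12 = 1551.68
```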
Synthetic task set settings
Multiple workloads were randomly generated using the task-set generator included in the MCRTsim simulator. The specific parameters used for each configuration are shown in Table 4. The worst-case computation times and periods of tasks were randomly selected from the intervals [8, 40] and [100, 250] ms, respectively. The length and the position of the critical sections within the execution of each task were chosen randomly. The blocking factor was set between 0.1 and 0.4, indicating the proportion of a task's execution time that may be spent blocked; the critical-section length is therefore no larger than the blocking factor multiplied by the task's computation time. The number of resources accessed by a task is randomly selected from 1 to the total number of shared resources. The experiments evaluated at least 5 feasible task sets for each configuration and averaged the outcomes.
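For reproducibility, a generator in the spirit of these settings can be sketched as follows; this is not the MCRTsim generator itself, and the field names and the even split of the blocking budget across critical sections are our assumptions.

```python
# Sketch of a synthetic task-set generator following the parameters of Table 4
# (not the MCRTsim generator itself; field names are illustrative).

import random

def generate_task_set(n_tasks, n_resources, blocking_factor=(0.1, 0.4)):
    tasks = []
    for i in range(n_tasks):
        wcet = random.uniform(8, 40)            # worst-case computation time
        period = random.uniform(100, 250)       # ms; deadline = period
        bf = random.uniform(*blocking_factor)   # share of execution spent in critical sections
        n_used = random.randint(1, n_resources) # how many resources this task uses
        used = random.sample(range(n_resources), n_used)
        # split the blocking budget bf * wcet evenly across the chosen critical sections
        cs_total = bf * wcet
        cs = {r: cs_total / n_used for r in used}
        tasks.append({"name": f"t{i}", "wcet": wcet, "period": period,
                      "deadline": period, "cs": cs, "resources": set(used)})
    return tasks

if __name__ == "__main__":
    random.seed(1)
    ts = generate_task_set(n_tasks=8, n_resources=3)   # a 2-core style configuration
    print(len(ts), round(sum(t["wcet"] / t["period"] for t in ts), 2))
```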
Real-world application task sets
The Computerized Numerical Control (CNC) system [57, 61] is considered as a real-world application to assess the performance of the HTBP algorithm. The CNC task set contains 8 periodic tasks (sensor information acquisition, motion control, error compensation, PLC, etc.), whose WCETs and periods are shown in Table 5. The WCET of each task is randomly assigned a value between 0.035 and its corresponding period, and the periods lie within the range [2.4, 9.6]. We assume a set of shared resources, and each task randomly selects which resources it requires from this set. For each experiment, we generate 10 task sets and present results as the average across these sets. Each task set is simulated for 200 time units. In addition, we analyze how the total utilization affects the energy consumption per job.
Table 4. Experiment parameters
Parameter | 2 cores | 4 cores | 8 cores | 16 cores |
|---|---|---|---|---|
Total utilization | [1,1.8] | [2,3.5] | [5,6] | [6,8] |
Number of tasks | [5,10] | [10,20] | [20,30] | [30,50] |
Number of shared resources | [2,3] | [3,5] | [5,8] | [8,10] |
Blocking factor | [0.1,0.4] | [0.1,0.4] | [0.1,0.4] | [0.1,0.4] |
Table 5. The 8 periodic tasks of CNC
WCETs | 0.035 | 0.04 | 0.18 | 0.72 | 0.165 | 0.165 | 0.57 | 0.57 |
|---|---|---|---|---|---|---|---|---|
Periods | 2.4 | 2.4 | 4.8 | 4.8 | 2.4 | 2.4 | 9.6 | 7.8 |
Experimental results and discussion
During our experiments, the proposed HTBP, SBP [55], best-fit decreasing (BFD), and worst-fit decreasing (WFD) algorithms [62] were used to partition the generated feasible workloads. P-EDF and MSRP were employed to schedule and synchronize task sets in the MCRTsim simulator. We use a Dual-Speed (DS) algorithm to assign a base speed to each core with the dynamic speed adjustment method. The performance of HTBP, SBP, WFD, and BFD algorithms was evaluated under P-EDF, MSRP, and DS algorithms.
The task set reported in Table 2 is partitioned by four partitioning algorithms (HTBP, SBP, WFD, and BFD) and is simulated by an MCRTsim scheduling simulator for 200 time units. The simulation result and energy consumption for scheduling the task set are reported in Table 6. Since the HTBP algorithm allocates the most similar tasks to the same core based on the length of their critical sections, it can minimize the number of global shared resources. It therefore reduces the remote blocking time, hence energy consumption is reduced compared to the other mentioned partitioning algorithms. Table 6 shows that the energy consumed by the HTBP algorithm is lower than other mentioned partitioning algorithms.
Table 6. The total energy consumption simulated using the task set in Table 2
Partitioning algorithm | HTBP | SBP | WFD | BFD |
|---|---|---|---|---|
Total energy consumption (W) | 353.041 | 437.177 | 447.649 | 449.174 |
Synthetic task set results
In this work, several task sets are generated for each multicore processor according to the parameters mentioned in Table 4, partitioned using HTBP, SBP, WFD, and BFD algorithms, and simulated by the MCRTsim simulator. The comparison will be made in terms of total energy consumption and the percentage of job-missed deadlines (% job missed deadlines).
The results are depicted in Fig. 4 for total energy consumption. This figure indicates that total energy consumption increases with the number of cores, as expected. However, the HTBP algorithm still has a lower energy consumption compared to other partitioning algorithms on 2, 4, 8, and 16-core processors. This is because HTBP allocates the most similar tasks onto the same core, considering the critical section length. It continues to group similar tasks on the same core until the core passes the schedulability test. Hence, HTBP succeeds in reducing the number of global resources and the length of remote blocking. This suggests that it might be more efficient in task partitioning, leading to better energy savings.
Fig. 5 depicts the percentage of missed deadlines as the number of cores increases. As seen in this figure, the percentage of missed deadlines generally increases with the number of cores. HTBP consistently has the lowest missed deadline percentage across all core configurations. It is the most effective algorithm as it partitions tasks on each core until the core passes the schedulability test according to Eq. (4). Considering blocking time, HTBP can partition tasks with a lower percentage of job-missed deadlines.
[See PDF for image]
Fig. 4
Total energy consumption depending on the number of cores
[See PDF for image]
Fig. 5
The percentage of missed deadlines depending on the number of cores
We fixed the total utilization and the number of shared resources to observe the impact of changing the number of tasks (ranging from 10 to 50) on the average energy consumption per job on the dual-core processor. In Fig. 6, we can see that increasing the number of tasks slightly increases energy consumption, especially beyond 30 tasks. This increase is due to the larger number of competing tasks that try to access the shared resources, which increases the total execution time.
We also observe the impact of changing the number of shared resources, varied from 3 to 10, on energy consumption, with the total utilization and the number of tasks fixed. In Fig. 7, we can see that increasing the number of shared resources slightly increases energy consumption (especially from 3 to 8 resources) due to synchronization overhead and blocking delays. In Figs. 6 and 7, HTBP is less affected by the number of tasks and shared resources and saves more energy with greater stability.
[See PDF for image]
Fig. 6
Energy consumption per job results vs. the number of tasks on a dual core
[See PDF for image]
Fig. 7
Energy consumption per job results vs. the number of shared resources
To evaluate the impact of increasing total utilization on energy consumption, task sets were generated and then partitioned on a system consisting of a dual-core processor using HTBP, SBP, WFD, and BFD algorithms. Fig. 8 depicts the total energy consumption based on task set utilization (U) where the algorithms could schedule successfully. In this figure, we can see that energy consumption increases as total utilization increases for all algorithms and the gap between algorithms widens as utilization increases. The figure indicates that the total energy consumption of HTBP is always lower than SBP, WFD, and BFD algorithms.
[See PDF for image]
Fig. 8
Total energy consumption results vs. total utilization
Real-world task set results
To evaluate the impact of increasing total utilization on energy consumption, the CNC task sets were generated and then partitioned on a system consisting of a dual-core processor using HTBP, SBP, WFD, and BFD algorithms.
As shown in Fig. 9, the energy consumption is strongly dependent on system utilization. As the system utilization increases, the energy consumption also increases for all algorithms, since higher utilization leads to longer task execution times that require a higher operating speed; according to Eq. (14), a higher operating speed leads to higher energy consumption. Therefore, all algorithms experience higher energy consumption under increased system utilization, but the energy consumption of HTBP is lower than that of SBP, WFD, and BFD. This is because HTBP allocates the most similar tasks onto the same core, considering the critical-section length. Consequently, the number of global resources and the blocking time decrease, and the task execution time decreases with the blocking time. Hence, HTBP succeeds in reducing the energy consumption. In summary, HTBP achieves an average energy reduction of 18%, 31%, and 62% compared to SBP, WFD, and BFD, respectively.
[See PDF for image]
Fig. 9
Energy consumption per job results vs. total utilization for the CNC task sets
Conclusions and future work
This paper focuses on the energy-aware partitioning of dependent real-time tasks on a multicore platform. We introduce the HTBP algorithm as a similarity-based task-to-core assignment algorithm. Firstly, it allocates dependent tasks that request access to similar resources to the same core. Secondly, we employ the DS algorithm to assign appropriate execution speeds to each core. Finally, simulation experiments are conducted with the MCRTsim simulator on both synthetic task sets and a real-time application, Computerized Numerical Control (CNC), where predictable timing, safe resource sharing, and energy constraints must be addressed simultaneously, to evaluate the performance of HTBP.
The proposed partitioning approach outperforms other common task partitioning approaches, namely SBP, WFD, and BFD. Our proposed technique minimizes the number of global shared resources, thereby reducing inter-core communication; as a result, it mitigates blocking times, which minimizes energy consumption and the number of missed deadlines. It reduces energy consumption on multicore processors with several configurations (2, 4, 8, and 16 cores) compared to the algorithms mentioned above. Table 7 summarizes the percentage reduction in energy consumption achieved by the HTBP algorithm.
Table 7. HTBP energy reduction percentage
Number of Cores | SBP | WFD | BFD |
|---|---|---|---|
2 cores | 11.8% | 12% | 35.8% |
4 cores | 8.15% | 10.58% | 39.1% |
8 cores | 8.3% | 17.1% | 19.7% |
16 cores | 2.5% | 3.9% | 15% |
Beyond simulation, HTBP has practical significance. It is particularly applicable to real-time industrial environments such as automotive electronic control units (ECUs) and CNC machine control systems, where synchronized access to shared resources, energy efficiency, and hard timing guarantees are all major concerns. Moreover, HTBP covers various core configurations (2, 4, 8, and 16), making it feasible to integrate into embedded systems such as ARM-based SoCs and multicore DSPs.
Our proposed HTBP algorithm statically partitions the task set using the P-EDF algorithm, a partitioning scheduling technique that prevents task migration. Extending the proposed approach to be compatible with semi-partitioned and global scheduling schemes (where tasks may migrate) requires adapting the migration and blocking mechanisms to guarantee mutual exclusion and limit blocking times. This represents a meaningful direction for future research. HTBP assumes a static and known resource set. It would need to be extended to support adaptive resource tracking and contention-aware allocation to be utilized on practical systems that dynamically change.
Author contributions
Amina Konswa conceived the primary ideas, proposed the algorithms, and wrote the initial paper. Amina Konswa and Amr Mohamed initiated the evaluation process and contributed ideas for the simulation experiments. All authors shaped research ideas and enriched English writing. All authors participated in reviewing the manuscript.
Funding
Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB). This research was funded by the Science and Technology Development Fund (STDF), Egypt.
Data availability
No datasets were generated or analysed during the current study.
Declarations
Ethics approval and consent to participate
Not required.
Consent for publication
The coauthors all agree on the paper’s publication.
Competing interests
The authors declare no competing interests.
Abbreviations
HTBP: Highest Task-Based Partitioning
SBP: Similarity-Based Partitioning
BFD: Best-Fit Decreasing
WFD: Worst-Fit Decreasing
MSRP: Multiprocessor Stack Resource Policy
MPCP: Multiprocessor Priority Ceiling Protocol
MCRTsim: Multi-Core Real-Time Simulator
P-EDF: Partitioned Earliest Deadline First
DVFS: Dynamic Voltage and Frequency Scaling
WCET: Worst-Case Execution Time
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Flautner K, Flynn D, Roberts D, Patel DI (2004) IEM926: an energy efficient SoC with dynamic voltage scaling. In: Proceedings Design, Automation and Test in Europe Conference and Exhibition. IEEE Comput. Soc, pp 324–327
2. Chen JJ, Kuo CF (2007) Energy-efficient scheduling for real-time systems on dynamic voltage scaling (DVS) platforms. In: Proceedings – 13th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, RTCSA 2007. IEEE, pp 28–38
3. Zhuravlev, S; Saez, JC; Blagodurov, S et al. Survey of energy-cognizant scheduling techniques. IEEE Trans Parallel Distrib Syst; 2013; 24, pp. 1447-1464. [DOI: https://dx.doi.org/10.1109/TPDS.2012.20]
4. Burmyakov, A; Nikolić, B. An exact comparison of global, partitioned, and semi-partitioned fixed-priority real-time multiprocessor schedulers. J Syst Archit; 2021; [DOI: https://dx.doi.org/10.1016/j.sysarc.2021.102313]
5. Gracioli, G; Fröhlich, AA; Pellizzoni, R; Fischmeister, S. Implementation and evaluation of global and partitioned scheduling in a real-time OS. Real-Time Syst; 2013; 49, pp. 669-714. [DOI: https://dx.doi.org/10.1007/s11241-013-9183-3]
6. Coffman EG, Garey MR, Johnson DS (1984) Approximation algorithms for Bin-Packing — An updated survey. In: Ausiello G, Lucertini M, Serafini P (eds) Algorithm design for computer system design. Springer Vienna, pp 49–106
7. Ullman, JD. NP-complete scheduling problems. J Comput Syst Sci; 1975; 10, pp. 384-393. [DOI: https://dx.doi.org/10.1016/S0022-0000(75)80008-0]
8. Brandenburg BB, Anderson JH (2010) Optimality Results for Multiprocessor Real-Time Locking. In: 2010 31st IEEE Real-Time Systems Symposium. IEEE, pp 49–60
9. Yang M, Wieder A, Brandenburg BB (2015) Global Real-Time Semaphore Protocols: A Survey, Unified Analysis, and Comparison. In: 2015 IEEE Real-Time Systems Symposium. IEEE, pp 1–12
10. Baker, Theodore P (2005) A comparison of global and partitioned EDF schedulability tests for multiprocessors. Technical Report TR-051101, Florida State University
11. Gai P, Lipari G, Di Natale M (2001) Minimizing memory utilization of real-time task sets in single and multi-processor systems-on-a-chip. In: Proceedings 22nd IEEE Real-Time Systems Symposium (RTSS 2001) (Cat. No.01PR1420). IEEE, pp 73–83
12. Wu J, Huang YC (2017) MCRTsim: A simulation tool for multi-core real-Time systems. In: Proceedings of the 2017 IEEE International Conference on Applied System Innovation: Applied System Innovation for Modern Technology, ICASI 2017. IEEE, pp 461–464
13. Sha, L; Rajkumar, R; Lehoczky, JP. Priority inheritance protocols: an approach to real-time synchronization. IEEE Trans Comput; 1990; 39, pp. 1175-1185. [DOI: https://dx.doi.org/10.1109/12.57058]
14. Baker TP (1990) A stack-based resource allocation policy for realtime processes. In: [1990] Proceedings 11th Real-Time Systems Symposium. IEEE, pp 191–200
15. Baker, TP. Stack-based scheduling of realtime processes. Real-Time Syst; 1991; 3, pp. 67-99. [DOI: https://dx.doi.org/10.1007/BF00365393]
16. Rajkumar R (1990) Real-time synchronization protocols for shared memory multiprocessors. In: Proceedings., 10th International Conference on Distributed Computing Systems. IEEE Comput. Soc. Press, pp 116–123
17. Rajkumar, R. Synchronization in multiple processor systems. Synchronization in Real-Time systems; 1991; Boston, MA, Springer: pp. 61-118. [DOI: https://dx.doi.org/10.1007/978-1-4615-4000-7_3]
18. Chen, MI; Lin, KJ. Dynamic priority ceilings: a concurrency control protocol for real-time systems. Real-Time Syst; 1990; 2, pp. 325-346. [DOI: https://dx.doi.org/10.1007/BF01995676]
19. Chen C-M, Tripathi SK, Blackmore A (1994) A Resource Synchronization Protocol for Multiprocessor Real-Time Systems. In: 1994 International Conference on Parallel Processing Vol. 3. IEEE, pp 159–162
20. Gai P, Di Natale M, Lipari G et al (2003) A comparison of MPCP and MSRP when sharing resources in the Janus multiple-processor on a chip platform. In: The 9th IEEE Real-Time and Embedded Technology and Applications Symposium, 2003. Proceedings. IEEE, pp 189–198
21. Block A, Leontyev H, Brandenburg BB, Anderson JH (2007) A flexible real-time locking protocol for multiprocessors. In: Proceedings – 13th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, RTCSA 2007
22. Chen N, Zhao S, Gray I et al (2022) MSRP-FT: Reliable Resource Sharing on Multiprocessor Mixed-Criticality Systems. In: 2022 IEEE 28th Real-Time and Embedded Technology and Applications Symposium (RTAS). IEEE, pp 201–213
23. Johnson DS (1973) Near-Optimal Bin Packing Algorithms. Dissertation, Massachusetts Institute of Technology
24. Baruah S (2011) The Partitioned EDF Scheduling of Sporadic Task Systems. In: 2011 IEEE 32nd Real-Time Systems Symposium. IEEE, pp 116–125
25. Lakshmanan K, De Niz D, Rajkumar R (2009) Coordinated task scheduling, allocation and synchronization on multiprocessors. In: Proceedings - Real-Time Systems Symposium. IEEE, pp 469–478
26. Nemati, F; Nolte, T; Behnam, M. Partitioning real-time systems on multiprocessors with shared resources. International Conference On Principles Of Distributed Systems; 2010; Berlin, Heidelberg, Springer: pp. 253-269. [DOI: https://dx.doi.org/10.1007/978-3-642-17653-1_20]
27. Akram N, Zhang Y, Ali S, Amjad HM (2019) Efficient Task Allocation for Real-Time Partitioned Scheduling on Multi-Core Systems. In: Proceedings of 2019 16th International Bhurban Conference on Applied Sciences and Technology, IBCAST 2019. IEEE, pp 492–499
28. Yang, M; Huang, WH; Chen, JJ. Resource-oriented partitioning for multiprocessor systems with shared resources. IEEE Trans Comput; 2019; 68, pp. 882-898. [DOI: https://dx.doi.org/10.1109/TC.2018.2889985]
29. Wieder A, Brandenburg BB (2013) Efficient partitioning of sporadic real-time tasks with shared resources and spin locks. In: Proceedings of the 8th IEEE International Symposium on Industrial Embedded Systems, SIES 2013. IEEE, pp 49–58
30. Zhang, Y-W; Ma, J-P; Gu, Z. Partitioned scheduling with shared resources on imprecise mixed-criticality multiprocessor systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems; 2025; 44, pp. 65-76. [DOI: https://dx.doi.org/10.1109/TCAD.2024.3433411]
31. Zhang F, Chanson ST (2002) Processor voltage scheduling for real-time tasks with non-preemptible sections. In: 23rd IEEE Real-Time Systems Symposium, 2002. RTSS 2002. IEEE Comput. Soc, pp 235–245
32. Elewi AM, Awadalla MHA, Eladawy MI (2008) Energy-efficient multi-speed algorithm for scheduling dependent real-time tasks. In: 2008 International Conference on Computer Engineering and Systems, ICCES 2008. IEEE, pp 237–242
33. Zhang, F; Chanson, ST. Blocking-aware processor voltage scheduling for real-time tasks. ACM Trans Embed Comput Syst; 2004; 3, pp. 307-335. [DOI: https://dx.doi.org/10.1145/993396.993401]
34. Lee J, Koh K, Lee CG (2007) Multi-speed DVS algorithms for periodic tasks with non-preemptible sections. In: Proceedings – 13th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, RTCSA 2007. IEEE, pp 459–468
35. Jejurikar R, Gupta R (2002) Energy aware task scheduling with task synchronization for embedded real time systems. In: Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems. pp 164–169
36. Wu, J. Energy-efficient scheduling of real-time tasks with shared resources. Future Gener Comput Syst; 2016; 56, pp. 179-191. [DOI: https://dx.doi.org/10.1016/j.future.2015.05.012]
37. Zhang, Y; Li, H. Energy aware mixed tasks scheduling in real-time systems. Sustainable Computing: Inf Syst; 2019; 23, pp. 38-48. [DOI: https://dx.doi.org/10.1016/j.suscom.2019.06.004]
38. Liu J, Guo J (2014) Voltage Island Aware Energy Efficient Scheduling of Real-Time Tasks on Multi-core Processors. In: 2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS). IEEE, pp 645–652
39. Anderson JH, Baruah SK (2004) Energy-efficient synthesis of periodic task systems upon identical multiprocessor platforms. In: 24th International Conference on Distributed Computing Systems, 2004. Proceedings. IEEE, pp 428–435
40. Aydin H, Yang Q (2003) Energy-aware partitioning for multiprocessor real-time systems. In: Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2003. IEEE
41. Zhu, D; Mossé, D; Melhem, R. Power-aware scheduling for AND/OR graphs in real-time systems. IEEE Trans Parallel Distrib Syst; 2004; 15, pp. 849-864. [DOI: https://dx.doi.org/10.1109/TPDS.2004.45]
42. Gruian, F. System-level design methods for low-energy architectures containing variable voltage processors. Power-Aware Computer Systems: First International Workshop, PACS 2000 Cambridge, MA, USA, November 12, 2000 Revised Papers 1; 2001; Berlin, Heidelberg, Springer: pp. 1-12.
43. Chen JJ, Thiele L (2010) Energy-efficient scheduling on homogeneous multiprocessor platforms. In: Proceedings of the ACM Symposium on Applied Computing. Association for Computing Machinery, pp 542–49
44. Baruah, SK; Cohen, NK; Plaxton, CG; Varvel, DA. Proportionate progress: a notion of fairness in resource allocation. Algorithmica; 1996; 15, pp. 600-625. [DOI: https://dx.doi.org/10.1007/BF01940883]
45. Lee, WY. Energy-efficient scheduling of periodic real-time tasks on lightly loaded multicore processors. IEEE Trans Parallel Distrib Syst; 2012; 23, pp. 530-537. [DOI: https://dx.doi.org/10.1109/TPDS.2011.87]
46. Kandhalu A, Junsung Kim, Lakshmanan K, Rajkumar R (2011) Energy-Aware Partitioned Fixed-Priority Scheduling for Chip Multi-processors. In: 2011 IEEE 17th International Conference on Embedded and Real-Time Computing Systems and Applications. IEEE, pp 93–102
47. Guasque, A; Balbastre, P; Crespo, A; Coronel, J. Energy efficient partition allocation in partitioned systems. IFAC-PapersOnLine; 2018; 51, pp. 82-87. [DOI: https://dx.doi.org/10.1016/j.ifacol.2018.06.241]
48. Moulik, S; Sarkar, A; Kapoor, HK. Tarts: a temperature-aware real-time deadline-partitioned fair scheduler. J Syst Architect; 2021; 112, [DOI: https://dx.doi.org/10.1016/j.sysarc.2020.101847] 101847.
49. Sharma, Y; Moulik, S. RT-SEAT: a hybrid approach based real-time scheduler for energy and temperature efficient heterogeneous multicore platforms. Results Eng; 2022; 16, [DOI: https://dx.doi.org/10.1016/j.rineng.2022.100708] 100708.
50. Sharma, Y; Chakraborty, S; Moulik, S. ETA-HP: an energy and temperature-aware real-time scheduler for heterogeneous platforms. J Supercomput; 2022; 78, pp. 1-25. [DOI: https://dx.doi.org/10.1007/s11227-021-04257-7]
51. Sharma, Y; Moulik, S. FATS-2TC: A fault tolerant real-time scheduler for energy and temperature aware heterogeneous platforms with two types of cores. Microprocess Microsyst; 2023; 96, [DOI: https://dx.doi.org/10.1016/j.micpro.2022.104744] 104744.
52. Sharma Y, Moulik S, Chakraborty S (2022) RESTORE: Real-Time Task Scheduling on a Temperature Aware FinFET based Multicore. In: 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, pp 608–611
53. Chakraborty, S; Sharma, Y; Moulik, S. Treafet: temperature-aware real-time task scheduling for FinFET based multicores. ACM Trans Embed Comput Syst; 2024; 23, [DOI: https://dx.doi.org/10.1145/3665276] 1–31.
54. Tsai, TH; Fan, LF; Chen, YS; Yao, TS. Triple speed: energy-aware real-time task synchronization in homogeneous multi-core systems. IEEE Trans Comput; 2016; 65, pp. 1297-1309. [DOI: https://dx.doi.org/10.1109/TC.2015.2441704]
55. Wu J, Hong X-J (2017) Energy-Efficient Task Scheduling and Synchronization for Multicore Real-Time Systems. In: 2017 IEEE 3rd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing, (HPSC) and IEEE International Conference on Intelligent Data and Security (IDS). IEEE, pp 179–184
56. El Sayed, MA; Saad, ESM; Aly, RF; Habashy, SM. Energy-efficient task partitioning for real-time scheduling on multi-core platforms. Computers; 2021; 10, [DOI: https://dx.doi.org/10.3390/computers10010010] 10.
57. Zhang, YW. Energy-aware fixed priority scheduling with shared resources in standby-sparing systems. Microprocess Microsyst; 2021; 87, [DOI: https://dx.doi.org/10.1016/j.micpro.2021.104362] 104362.
58. Zhang, Y-W; Chen, R-K. Energy-efficient partitioned-RM scheduling for shared resources imprecise mixed-criticality tasks. ACM Trans Embed Comput Syst; 2025; 24, pp. 1-30. [DOI: https://dx.doi.org/10.1145/3728641]
59. Burd TD, Brodersen RW (1995) Energy efficient CMOS microprocessor design. In: Proceedings of the Annual Hawaii International Conference on System Sciences. IEEE, pp 288–297
60. Liu, CL; Layland, JW. Scheduling algorithms for multiprogramming in a hard-real-time environment. J ACM; 1973; 20, pp. 46-61. [DOI: https://dx.doi.org/10.1145/321738.321743]
61. Zhang, Y; Zhang, H; Wang, C. Reliability-aware low energy scheduling in real time systems with shared resources. Microprocess Microsyst; 2017; 52, pp. 312-324. [DOI: https://dx.doi.org/10.1016/j.micpro.2017.06.020]
62. Coffman EG, Csirik J, Galambos G et al (2013) Bin packing approximation algorithms: survey and classification. Handbook of combinatorial optimization. Springer, pp 455–531
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License").