1. Introduction
Computer systems nowadays must perform much better than ever before, ensuring the simultaneous running of many applications. By using heterogeneous multi-core processors and increasing the number of processor cores, it is possible to improve the performance while keeping energy consumption at the bay [1,2,3,4,5]. From small, embedded devices to large data centers, heterogeneous multi-core systems have been widely used. It is expected that in the near future, the number of heterogeneous processors and cores in these systems will increase dramatically [6,7,8]. On the other hand, although the performance of such systems has been greatly improved, the power consumption is also increasing. Huge energy consumption has caused various problems, such as economy, environment, technology and so on [9,10,11,12,13]. Therefore, energy consumption is one of the main design constraints for such heterogeneous multi-core systems. A well-known typical mechanism for reducing power consumption of computing systems is dynamic voltage and frequency scaling (DVFS), which is realized to achieve the balance between energy consumption and performance by reducing the power supply voltage and frequency of the processor at the same time, when the task is running [1,14,15,16,17,18,19]. Therefore, energy consumption constrained task scheduling on processors with adjustable voltage and frequency has attracted extensive research [20,21,22,23].
In the existing research, works related to the energy consumption of heterogeneous multi-core processors, some of the objectives are to minimize the energy consumption under certain performance requirements. Others are to minimize or maximize other indicators, such as performance and reliability, under certain energy consumption constraints [24,25,26,27,28,29,30,31]. Our studies fall into the second category. To be more precise, we study the following problem: minimizing the schedule length of energy consumption constrained parallel applications on heterogeneous multi-core systems. There have been some studies in recent years that are consistent with our goals. In order to determine the energy consumption constraint of each task, the common method in previous studies is to pre-allocate the energy consumption of the unscheduled tasks [20,21,22,23]. Therefore, these studies focus on how to allocate the overall energy consumption constraints to each task reasonably. Xiao et al. [20] came up with an algorithm called MSLECC (minimizing schedule length of energy consumption constrained). It takes the minimum energy consumption of a task as its pre-allocation energy consumption. The drawback of this method is that it is unfair to low priority tasks. High priority tasks take up too much energy consumption, so that low priority tasks can only be allocated to low-power processor cores and close to the lowest execution frequency due to the energy consumption constraint, which will lead to an increase in the overall schedule length. After that, some studies improved the energy pre-allocation method, so that the energy pre-allocation is fair to each task. One study pre-allocated the energy consumption according to the task execution time proportion [21], another study used the average energy consumption as the pre-allocation energy consumption of each task [22], and another study used a defined task energy consumption weight value as the pre-allocation energy consumption of tasks, whose algorithm is called ISAECC [23]. However, it is not the best practice to treat each task fairly and the task level is an important issue when pre-allocating energy consumption to the tasks. We pre-allocate more energy consumption to the tasks in the task levels with fewer tasks, because when their execution time becomes shorter, the overall scheduling length is more likely to decrease. In addition, previous articles ignored the negative impact of local optimal scheduling algorithm. Therefore, we develop a task execution frequency re-adjustment mechanism that also uses DVFS to further reduce the schedule length. To summarize, the main contributions of this article are as follows.
- We design a novel energy pre-allocation strategy considering task level and prove its feasibility.
- We develop a novel scheduling algorithm to minimize the schedule length under energy consumption constraints based on the new energy pre-allocation strategy.
- We introduce a frequency re-adjustment mechanism after task scheduling to reduce the negative impact of local optimization.
- We evaluate our algorithm based on real parallel applications. The experimental results consistently prove the superiority and competitiveness of our algorithm.
The structure of this article is as follows. Section 2 reviews some existing studies that are relevant to us. Section 3 gives some preliminaries related to the problem of minimizing the schedule length for energy consumption constrained parallel applications. In Section 4, we present our approach for this problem. In Section 5, we discuss and analyze the experimental results. Finally, we conclude the paper in Section 6.
2. Related Work
Energy saving design technology based on DVFS was first proposed by [1]. Nowadays, DVFS has been widely used in multi-core task scheduling problems related to energy consumption. Reference [2] studied the problem of minimizing the schedule length of independent sequential applications with energy consumption constraints. In Reference [29], the task scheduling problem with energy consumption constraints is considered as a combinatorial optimization problem. In Reference [31], the authors consider three constraints (energy consumption, deadline and reward). These studies are mainly focused on homogeneous systems, so they are different from our study.
In addition to the above research, many researchers have studied the task scheduling problem on heterogeneous multi-core systems. For example, reference [32] proposed an energy-saving workflow task scheduling algorithm based on DVFS. Huang et al. [33] proposed an enhanced energy-saving scheduling algorithm to minimize energy consumption under the condition of satisfying a certain performance level. Rusu et al. [31] added constraints and proposed an efficient algorithm for minimizing energy consumption under multiple constraints. The goal of these studies is generally contrary to ours. We study the minimization of the schedule length under energy consumption constraints, while those studies focus on minimizing energy consumption under other constraints.
There are also a lot of excellent studies that are closely related to our study. For example, a representative paper proposed the classical Heterogeneous Earliest Finish Time (HEFT) algorithm, which was developed to minimize the schedule length in heterogeneous multicore systems. The application model and energy consumption model they use are consistent with ours, but they do not consider the constraints of energy consumption. At present, the studies close to us should be the four articles mentioned in Section 1 [20,21,22,23]. They pre-allocate energy consumption to each task according to the minimum energy consumption of the task [20], the execution time ratio of the task [21], the overall average energy consumption [22], or the defined task energy consumption weight [23]. What is not considered in the above studies is that different task hierarchies have different impacts on the overall schedule length. Moreover, they ignored the negative impact of the local optimal characteristics of scheduling algorithm on the schedule length. Based on this, we propose a novel energy pre-allocation strategy considering task level, and, in order to reduce the negative impact of local optimal characteristics, we develop a scheduling task execution frequency re-adjustment mechanism. Finally, our method achieves better performance than previous studies.
3. Models and Preliminaries
In this section, we first introduce the application model (Section 3.1), then the energy consumption model (Section 3.2), and next we describe in detail the issues that need to be addressed (Section 3.3). Finally, we briefly introduce the current situation and reveal its limitations (Section 3.4). Table 1 shows the main notations we use.
3.1. Application Model
As in previous studies [20,21,22,23,24,34], we also use directed acyclic graph (DAG) to represent parallel application models. As for the processor cores, we defineU={u1,u2,…,uk,…,u|U|}to represent a collection of processor cores, where|U|is defined as the number of processor cores. Note that for any setX, we use|X|to denote its size. We define the DAG application model asG={N,W,C}.N={n1,n2,…,ni,…,n|N|}represents the set of nodes in the graph, that is, the set of tasks in the application. Due to the heterogeneous nature of the processor, the execution time ofni∈Non different processor cores is different.Wrefers to a matrix with size|N|×|U|, wherewi,kdenotes the execution time fornito run onukwith the maximum frequency.Cdenotes the weight of edges between connected nodes in DAG, that is, the communication time between tasks.ci,j∈Crepresents the communication time fromnitonj. Ifci,j=0, it means there is no communication fromnitonj. We definepred(ni)andsucc(ni)as the set of direct predecessor tasks and the set of direct successor tasks of taskni. For example,pred(n2)={n1}andsucc(n2)={n8,n9}. We definenentryandnexit as the entry task and the exit task of an application. In Figure 1,nentryandnexitaren1andn10.
Figure 1 shows an example of a parallel application based on DAG with ten tasks. Each node in Figure 1 represents a task, and the values on the edges between connecting nodes represent the communication time between the two nodes if they are not assigned to the same processor core. For example, the value 18 on the edge betweenn1andn2indicates that the communication time betweenn1andn2is 18.
Assuming that there are three heterogeneous processor cores{u1,u2,u3} in the system, Table 2 shows the execution time of the tasks in Figure 1 running on each processor core with the maximum frequency. For example, the first number 14 in Table 2 indicates that the execution time ofn1running onu1with the maximum frequency is 14.
3.2. Energy Model
In DVFS technology, the relationship between supply voltage and operating frequency is almost linear. Therefore, DVFS will also adjust the supply voltage when adjusting the clock frequency. Similar to [20,21,22,23], we use frequency regulation to indicate simultaneous regulation of supply voltage and frequency. In this article, we use the same energy model as the references [20,21,22,23]. Therefore, the calculation formula of system power consumption with respect to frequency is as follows:
P(f)=Ps+h(Pind+Pd)=Ps+h(Pind+Cef fm).
In the above equation,Psdenotes static power and can only be removed when the system is completely powered down.Pindis a constant that represents the frequency-independent dynamic power, that is, it corresponds to power independent of CPU processing speed.Pddenotes frequency-dependent power, including the power primarily consumed by the CPU and any power that depends on the system processing frequencyf.hdenotes the system state, specifically,h=1means the system is active and the application is executing;h=0means the system is in the sleep mode or powered down.Cefdenotes the effective capacitance andmdenotes the dynamic power exponent and is no smaller than 2.Cefandmare constants related to the processor system.
Our study is in the active state of the system (h=1 ), so dynamic power consumption is the main part of the whole energy consumption. Considering the unmanageability of static power consumption, this article, like references [20,21,22,23], does not consider static power consumption. Therefore, the calculation formula of system power consumption in this article becomes the following equation:
P(f)=Pind+Cef fm.
Due to the heterogeneity of processors, each processor should have its own parameters. Assuming that the frequency range of the processorukis from the lowest frequencyfminto maximum frequencyfmaxtemporarily, we define the following sets of parameters:
-
The set ofPind:{P1,ind,P2,ind,…,P|U|,ind};
-
The set ofPd:{P1,d,P2,d,…,P|U|,d};
-
The set ofCef: {C1,ef,C2,ef,…,C|U|,ef};
-
The set ofm:{m1,m2,…,m|U|};
The set of execution frequency:
{{f1,min,f1,a,…,f1,max},{f2,min,f2,a,…,f2,max},…,{f|U|,min,f|U|,a,…,f|U|,max}}.
The execution time of the tasknion the processor coreukwith the frequencyfk,hcan be obtained by the following equation:
wi,k,h=wi,k×fk,maxfk,h.
Then the energy consumptionE(ni, uk, fk,h)of the tasknion the processor coreukwith the frequencyfk,hcan be obtained by the following equation:
E(ni, uk, fk,h)=(Pk,ind+Ck,ef×(fk,h)mk)×wi,k×fk,maxfk,h.
Therefore, the energy consumption of application G will be
E(G)=∑i=1|N|E(ni,upr(i),fpr(i),hz(i)).
As a result of thePind,Eis not monotonic withfand lessf does not always result less energy. Therefore, we can get the minimum value of energy-effective frequency by finding the minimum value of Equation (4). Similar to [20,21,22,23], we define the minimum value of energy-effective frequency asfee. After calculation, we can get that
fee=Pind(m−1)Cefm.
When the execution frequency is less thanfee, it is meaningless to continue to reduce the frequency, because this will increase energy consumption. Therefore, the range of execution frequency variation is[flow,fmax], whereflow=max(fmin,fee). The new set of execution frequency becomes as follows:
{{f1,low,f1,a,…,f1,max},{f2,low,f2,a,…,f2,max},…,{f|U|,mi,f|U|,a,…,f|U|,max}}.
3.3. Preliminaries
3.3.1. Problem Description
The problem to be solved in this study is to assign a suitable frequency and processor core to each task, and minimize the schedule length of the application under the condition that the energy consumption of the application does not exceed the energy consumption constraint [35,36,37,38].
First, we define the earliest start time (EST) and the earliest finish time (EFT) of tasks. Given a taskniexecuted on processoruk, its earliest start time (EST) is denoted asEST(ni,uk), which is computed as
EST(ni,uk)=max(avail[k],maxnj∈pred(ni){AFT(nj)+ci,j′}).
whereavail[k]is the earliest available time of processor coreuk, that is, all tasks executed on processor coreukhave been completed, and processor coreukis ready to execute new tasks.AFT(nj)is the actual finish time of tasknj.ci,j′is the actual communication time between taskniandnj. Ifniandnjare assigned to the same processor core,ci,j′=0; otherwise,ci,j′=ci,j.
The earliest finish time (EFT) of taskniexecuted on processorukwith frequencyfk,his the earliest start time plus the execution time of taskni, which is computed as
EFT(ni,uk,fk,h)=EST(ni,uk,fk,h)+wi,k×fk,max fk,h.
We define SL(G)as the schedule length of application, where
SL(G)=AFT(nexit).
We defineEgiven(G)as the given energy consumption constraint of application G. Therefore, the problem to solve can be expressed as minimizingSL(G)while
E(G)=∑i=1|N|E(ni,upr(i),fpr(i),hz(i))≤Egiven(G).
whereupr(i)denotes the processor core assigned to the taskni, andfpr(i),hz(i)denotes the execution frequency assigned to the taskni.
3.3.2. Effective Range of Energy Consumption Constraint
Since the execution time of each task on each processor core is known, we can obtain the minimum and maximum energy consumption ofnirepresented byEmin(ni)andEmax(ni)respectively by traversing all processors.Emin(ni)andEmax(ni)perform taskniat minimum and maximum frequencies, respectively. The equations are as follows:
Emin(ni)=minuk∈UE(ni,uk,fk,min),
Emax(ni)=maxuk∈UE(ni,uk,fk,max).
Therefore, the minimum and maximum energy consumption of applicationGcan be computed as follows:
Emin(G)=∑i=1|N| Emin(ni),
Emax(G)=∑i=1|N| Emax(ni).
It should be noted that the given energy consumption constraint has a reasonable range. IfEgiven(G)<Emin(G), the energy consumption constraint can never be satisfied; ifEgiven(G)>Emax(G), the energy constraint can always be met. Both of the above situations are unreasonable, so the reasonable range of the given energy consumption constraint isEmin(G)≤Egiven(G)≤Emax(G).
3.3.3. Task Priority Determination
Before scheduling, we need to determine the priority of tasks. Similar to [20,21,22,23], we use the upward rank value (ranku) as the criterion to determine the priority of tasks.rankuis defined as follows:
ranku(ni)=∑k=1|U| wi,k|U|+maxni∈succ(ni){ci,j+ranku(nj)}.
The priority of tasks is sorted in descending order ofranku, that is, the higherranku of a task, the higher the priority of it. Table 3 shows the upward rank values of all the tasks in Figure 1. Therefore, the task priority list of the application in Figure 1 will be{n1,n3,n4,n2,n5,n6,n9,n7,n8,n10}.
3.4. The ISAECC Method In this subsection, we review the existing method that is closest to us and reveal its limitations.
3.4.1. Method Description
The ISAECC method is proposed in [23], and it consists of several major steps:
-
It prioritizes tasks by using the upward rank value which is defined in Section 3.3.3;
- It uses a self-defined energy consumption weight value to give each unscheduled task a pre-allocated energy consumption;
- It calculates the energy consumption constraint of each task according to the given energy consumption constraint of the whole application and the pre-allocated energy consumption value of each task;
- According to the order of the task priority list, it assigns each task the processor core and frequency that can minimize its EST time by traversing each processor core and optional execution frequency.
For the sake of generality, we use{no(1),no(2),…,no(|N|)}to represent the task priority order. Assuming that the currently scheduled task isno(j), then the scheduled task set is{no(1),no(2),…,no(j−1)}and the unscheduled task set is{no(j+1),no(j+2),…,no(|N|)}. Therefore, when scheduling the taskno(j), the overall energy consumption of application G can be expressed as
Eo(j)(G)=∑x=1j−1E(no(x),upr(o(x)),fpr(o(x)),hz(o(x)))+E(no(j))+∑y=j+1|N| Epre(no(y)),
whereEpre(no(y))denotes the pre-allocation energy consumption of taskno(y).
According to the energy consumption constraints shown in Equation (8), we can get
Eo(j)(G)≤Egiven(G).
Therefore, we can get
E(no(j))≤Egiven(G)−∑x=1j−1E(no(x),upr(o(x)),fpr(o(x)),hz(o(x)))−∑y=j+1|N| Epre(no(y)).
Let the energy consumption constraint of taskno(j)be
Egiven(no(j))=Egiven(G) −∑x=1j−1E(no(x),upr(o(x)),fpr(o(x)),hz(o(x)))−∑y=j+1|N| Epre(no(y)).
With Equation (17), we only need to consider the energy consumption constraint of each task which is shown as follows:
E(no(j))≤min{Egiven(no(j)),Emax(no(j))}.
Therefore, the key problem is how to determine the pre-allocation energy consumption (Epre(no(j))) of each task. The central idea of the method ISAECC used is to pre-allocate the energy consumption for unscheduled tasks by a weight mechanism. First, they define the improvable energy of application G calledEie(G), which is computed as
Eie(G)=Egiven(G)−Emin(G).
Then, they defineEave(ni)andEave(G)as the energy consumption level of taskniand the energy consumption level of application G.Eave(ni)andEave(G)are computed as
Eave(ni)=Emin(ni)+Emax(ni)2,
Eave(G)=Emin(G)+Emax(G)2.
Next, they defineel(ni)as the weight of energy consumption level of taskni, which is computed as
el(ni)=Eave(ni)Eave(G).
After that, they calculated the pre-allocated energy consumption for tasknias follows:
Epre(ni)=min{Ewa(ni),Emax(ni)}
whereEwa(ni)is computed as
Ewa(ni)=Eie(G)×el(ni)+Emin(ni).
After determining the pre-allocated energy consumption of each task, the task scheduling can be completed according to steps 3 and 4, described at the beginning of this section (Section 3.4.1).
3.4.2. Limitations of ISAECC
ISAECC solves the problem of increasing schedule length caused by unfair energy constraint allocation of low priority tasks by MSLECC in [20]. However, we find that it is not the best practice to treat each task fairly, because tasks in different levels have different impacts on the whole application. For example, in Figure 1, the taskn1and taskn10should be assigned more energy than other tasks, because their execution time can affect the overall schedule length of the application G directly. It is obvious that if the execution time of taskn1or taskn10is shortened or lengthened under other same conditions, the overall schedule length will shorten or lengthen the same amount accordingly. Other tasks liken2cannot be compared ton1andn10. If the execution time of taskn2is shortened or lengthened under other same conditions, the schedule length of application G may not change obviously, or even not change at all. Therefore, we should consider the levels of the tasks and the number of tasks in each level when pre-allocating the energy consumption of tasks. In addition, previous studies ignored the negative impact of the local optimal characteristics of scheduling algorithm on the schedule length. In our design, these problems have been greatly improved.
4. Our Solution
In this section, we introduce our new strategy in detail. First, we introduce a new task energy pre-allocation method considering task level (Section 4.1). Then, we give a new task scheduling algorithm to minimize the schedule length under the constraint of energy consumption (Section 4.2). Finally, we describe the task execution frequency re-adjustment mechanism we added after getting the preliminary scheduling results (Section 4.3).
4.1. The New Task Energy Pre-Allocation Method
The central idea of our pre-allocation method is to consider the levels of tasks. The impact of tasks in different hierarchies on the overall schedule length of an application is different. For example, in Figure 1, if we shorten the execution time of taskn1by increasing its execution frequency, the schedule length of application G will certainly shorten the corresponding time; but if we shorten the execution time of taskn2, the schedule length of application G is unlikely to change much, because there are still many tasks in a similar position to it. Therefore, a more reasonable approach is to appropriately pre-allocate more energy consumption to taskn1 in the case of Figure 1. Based on this idea, we put forward a new energy pre-allocation method, based on the weight value of energy consumption and the levels of tasks.
4.1.1. Method Description
We define the level of tasks as follows:
{L(nentry)=0L(ni)=maxnx∈pred(ni){L(nx)+1}.
We defineNl={ni,nj,…}as the set of tasks contained in level l, whereL(ni)=L(nj)=L(…)=l. The number of tasks in level l is|Nl|. We can have that the maximum of the level of tasks isL(nexit).
We define the improvable energy of application G (Eie(G)) and the energy consumption level of taskni(Eave(ni)) the same as ISAECC. We define that the energy consumption level of the task level l is the sum of energy consumption level in level l, which is computed as
Eave(Nl)= ∑ni∈Nl Eave(ni).
We defineel(ni,l)as the energy consumption weight of taskniin its level l, which is computed as
el(ni,l)=Eave(ni)Eave(Nl).
We defineEvar(Nl)as the variation energy consumption ofNl, which is computed as
Evar(Nl)=Eave(Nl)K, K={|Nl|, |Nl|<|U||U|, |Nl|≥|U|.
Correspondingly, the variation energy consumption of application G is computed as
Evar(G)=∑lEvar(Nl).
The energy consumption weight ofNlin application G can be defined as
el(Nl)=Evar(Nl)Evar(G).
We defineEie(Nl)as the improvable energy ofNl, which is computed as
Eie(Nl)= Eie(G)×el(Nl).
Therefore, we can get the new energy pre-allocation formula ofnias follows:
Epre(ni)=min{Ewa(ni),Emax(ni)},
where
Ewa(ni)=Eie(Nl)×el(ni,l)+Emin(ni).
4.1.2. Feasibility of the Task Energy Pre-Allocation Mechanism
In order to prove the feasibility of our method, we need to prove the following theorem: Given an application G, if the unscheduled tasks are pre-allocated energy consumption according to our method, then each tasknjcan satisfy Equation (15).
We use mathematical induction to prove the above theorem. First, we need to prove taskno(1)can satisfy Equation (15), and the other tasks are all unscheduled. By Equations (14), (22)–(29), we can have
Eo(1)(G)=E(no(1))+∑y=2|N| Epre(no(y))≤E(no(1))+∑y=2|N| Ewa(no(y))= E(no(1))+∑y=1|N| Ewa(no(y))−Ewa(no(1))= E(no(1))+Egiven(G)−Ewa(no(1)).
We know that
Ewa(no(1))=Eie(Nl)×el(ni,l)+Emin(ni)≥Emin(no(1)).
We can at least find a situation in which
E(no(1))=Emin(no(1)).
In other words, at least whenE(no(1))=Emin(no(1)), we can have
Eo(1)(G)≤E(no(1))+Egiven(G)−Ewa(no(1))≤Egiven(G).
From the above derivation, we prove that taskno(1)can satisfy Equation (15).
Then, we assume that taskno(j)can satisfy Equation (15). That is
Eo(j)(G)=∑x=1j−1E(no(x),upr(o(x)),fpr(o(x)),hz(o(x)))+E(no(j))+∑y=j+1|N| Epre(no(y))=∑x=1jE(no(x),upr(o(x)),fpr(o(x)),hz(o(x)))+∑y=j+1|N| Epre(no(y))≤Egiven(G).
The above formulation can be written as
∑x=1jE(no(x),upr(o(x)),fpr(o(x)),hz(o(x)))≤Egiven(G)−∑y=j+1|N| Epre(no(y)).
Next, we prove taskno(j+1)can satisfy Equation (15). By Equation (30), we can have
Eo(j+1)(G)=∑x=1jE(no(x),upr(o(x)),fpr(o(x)),hz(o(x)))+E(no(j+1))+∑y=j+2|N| Epre(no(y))≤Egiven(G)−∑y=j+1|N| Epre(no(y))+E(no(j+1))+∑y=j+2|N| Epre(no(y))=Egiven(G)+E(no(j+1))−Epre(no(j+1)).
From Equations (28) and (30), we can have
Epre(no(j+1))=min{Ewa(no(j+1)),Emax(no(j+1))}≥Emin(no(j+1)).
Therefore, at least whenE(no(j+1))=Emin(no(j+1)), we can have
Eo(j+1)(G)≤Egiven(G)+E(no(j+1))−Epre(no(j+1))≤Egiven(G).
In summary, given an application G, if the unscheduled tasks are pre-allocated energy consumption according to our method, then each tasknjcan satisfy Equation (15). The feasibility of our method has been proved.
4.2. The Proposed Algorithm for Minimizing Schedule Length In this section, we show our new task scheduling algorithm in Algorithm 1. In the algorithm, Line 1 is to prioritize tasks in the input application; Lines 2–10 calculate some required values for each task, each level and the application G; Lines 11 and 12 calculate the pre-allocation energy consumption of each task; Lines 13–26 are to select processor and frequency for each task; Lines 27 and 28 are to calculate the actual energy consumption E(G) and the final schedule length SL(G).
Algorithm 1. A new scheduling algorithm for minimizing schedule length. |
Input: G = {N, W, C}, U,Egiven(G)Output: SL(G), E(G) 1: Sort tasks in a listtlby descending order ofranku; 2: for (∀i, ni∈N)do 3: ComputeEmin(ni)andEmax(ni); //By (10) (11) 4: ComputeEave(ni); //By (21) 5: for (∀l, 1≤l≤L(nexit))do 6: ComputeEave(Nl); //By (23) 7: ComputeEvar(Nl); //By (25) 8: ComputeEmin(G)andEmax(G); //By (12) (13) 9: ComputeEave(G); //By (22) 10: ComputeEvar(G); //By (26) 11: for (∀i, ni∈N)do 12: ComputeEpre(ni); //By (29) 13: while(tl≠∅)do 14: ni=tl.out(); 15: AFT(ni)=∞; 16: ComputeEgiven(ni); //By (18) 17: for (∀k, uk∈U)do 18: for (∀fk,h, fk,h∈[fk,low,fk,max])do 19: ComputeE(ni, uk, fk,h); //By (4) 20: if (E(ni, uk, fk,h)> Egiven(ni)) then 21: continue; 22: ComputeEFT(ni, uk, fk,h); //By (8) 23: if (EFT(ni, uk, fk,h)< AFT(ni)) then 24: Letupr(i)=ukandfpr(i),hz(i)=fk,h; 25: E(ni, upr(i), fpr(i),hz(i))= E(ni, uk, fk,h); 26: AFT(ni)= EFT(ni, upr(i), fpr(i),hz(i)); 27: Compute actual energy consumptionE(G); 28: Compute the schedule lengthSL(G); 29: return SL(G), E(G). |
For each task, selecting the processor with the minimum EFT has complexityO(|N|×|U|×|F|), where|F|represents the maximum number of discrete frequencies fromfk,lowtofk,max. Therefore, the complexity of Algorithm 1 isO(|N|2×|U|×|F|) the same as ISAECC in [23].
4.3. The Task Execution Frequency Re-Adjustment Mechanism
Through the new task energy pre-allocation method in Section 4.1 and the new task scheduling algorithm in Section 4.2, we can get the preliminary scheduling results. Other methods generally end here such as those in [20,21,22,23], but they do not realize that the scheduling results can be optimized to further shorten the schedule length. The same as the scheduling algorithms in [20,21,22,23], the algorithm in Section 4.2 makes tasks finish as soon as possible when scheduling them, which is not entirely reasonable. Premature completion of some tasks cannot shorten the overall schedule length, but will take up more energy consumption. Therefore, we introduce the concept of the latest finish time of tasks [28,39,40], which is defined as follows:
{LFT(nexit)=AFT(nexit)LFT(ni)=min{minnj∈succ(ni){AST(nj)−ci,j′},AST(ndn(i))},
wherendn(i)represents the downward neighbor task ofni, that is,ndn(i)is on the same processer core asniand it is the first case afterni.
Through Equation (32), we can replace AFT of tasks with LFT to delay the finish time of some tasks without increasing the schedule length. As the execution time of tasks is prolonged, their running frequency will be reduced accordingly, which can save some energy consumption. Therefore, the new execution time of taskniis changed fromAFT(ni)−AST(ni)toLFT(ni)−AST(ni), and the new frequency of tasknican be changed as follows:
fpr(i),nhz(i)=fpr(i),hz(i)×AFT(ni)−AST(ni)LFT(ni)−AST(ni).
The frequency range of taskniis[fpr(i),low,fpr(i),max], sofpr(i),nhz(i)should be
fpr(i),nhz(i)=max{fpr(i),hz(i)×AFT(ni)−AST(ni)LFT(ni)−AST(ni),fpr(i),low}.
Therefore, the actual execution time (AET) of taskniwill be
AET(ni)=wi,pr(i)×fpr(i),maxfpr(i),nhz(i).
Finally, the newAST(ni)should be updated as
ST(ni)=LFT(ni)−AET(ni)=LFT(ni)−wi,pr(i)×fpr(i),maxfpr(i),nhz(i).
On the basis of above equations, the algorithm to save energy after preliminary scheduling is shown in Algorithm 2. In Line 2, we reorder the tasks in descending order of the actual finish time (AFT) of tasks according to the scheduling results of Algorithm 1. In Lines 4–10,AFT(ni),AST(ni)andfpr(i),hz(i)are updated. In Line 11, we compute the newE(ni)using the above updated values. In Lines 12 and 13, we compute the saved energyEsave(G).
Algorithm 2. The method to save energy. |
Input: G = {N, W, C}, U,Egiven(G)Output:Esave(G) 1: Call Algorithm 1 to get the preliminary scheduling results; 2: Sort the tasks in the listalaccording to the descendingAFT(ni)order of values; 3: while(al≠∅)do 4: ni=tl.out(); 5: ComputeLFT(ni); //By (33) 6: Compute the new frequencyfpr(i),nhz(i); //By (34) 7: Compute the newAET(ni); //By (35) 8: UpdateAFT(ni)←LFT(ni); 9: UpdateAST(ni)←LFT(ni)−AET(ni); //By (36) 10: Updatefpr(i),hz(i)←fpr(i),nhz(i); 11: Compute the newE(ni); //By (4) 12: Compute the new energy consumption of GEnew(G); //By (5) 13: Compute the saved energyEsave(G)=E(G)−Enew(G); 14: returnEsave(G). |
Generally speaking, after the task scheduling is completed, there will be remaining energy consumption that is not used up as follows:
Erem(G)=Egiven(G)−E(G).
Therefore, the energy consumption that can be reused is
Ereu(G)=Erem(G)+Esave(G).
After calculating the energy consumption that can be reused, we need to re-allocate the energy consumption to the tasks that can directly affect the overall schedule length. Directly affecting the overall schedule length means that according to how much the task running time changes, the overall schedule length will also change. For example, taskn1 in Figure 1, the total schedule length will be shortened or extended as much as its execution time is shortened or extended. However, due to the diversity of task models, it is very difficult to find out which tasks can directly affect the overall schedule length. Therefore, we adopt a more direct way: If the application model level is strict (the tasks in level l only communicate with the tasks in their adjacent levels which are level l+1 and level l−1), we re-allocate the reused energy consumption (Ereu(G)) to the tasks that have not reached the highest execution frequency in the levels whose|N|=1; otherwise, we only re-allocate the reused energy consumption tonentryandnexit. For ease of description, we defineNdasas the set of tasks that can directly affect the overall schedule length.
After the assignment object ofEreu(G)is determined, we need to determine the allocation proportion. We define the maximum energy consumption ofni∈Ndason processer coreupr(i)is
Emax(ni,upr(i))=(Ppr(i),ind+Cpr(i),ef×(fpr(i),max)mpr(i))×wi,pr(i).
Therefore, the growable energy consumption ofni∈Ndason processer coreupr(i)will be
Egro(ni)=Emax(ni,pr(i))−E(ni).
We take the values ofEgro(ni)as the allocation proportion ofEreu(G). Therefore, the reused energy consumption ofni∈Ndaswill be
Ereu(ni)=Ereu(G)×Egro(ni)∑i=1|N| Egro(ni).
AddingEreu(ni)andE(ni)which is computed in Algorithm 2, we can get the energy consumption thatni∈Ndascan use will be
Euse(ni)=Ereu(ni)+E(ni).
According toEuse(ni), we can find the new frequencyfpr(i),nhofni∈Ndasby traversing the execution frequency on processor coreupr(i). Then we can compute the shortened actual execution time ofni∈Ndasas follows:
AETshor(ni)=wi,pr(i)×fpr(i),max×(1fpr(i),h−1fpr(i),nh).
Therefore, the new length of the application G will be
SLnew(G)=SL(G)−∑ni∈NdasAETshor(ni).
Combined with the Algorithms 1 and 2, the task execution frequency re-adjustment mechanism is shown in Algorithm 3. Lines 1 and 2 call Algorithms 1 and 2 to get the preliminary scheduling results andEsave(G). Line 3 compute the reused energy consumptionEreu(G). Lines 6 and 7 calculate some required values for each task belonging toNdas. Lines 8–14 are to select the new frequency for each task belonging toNdas. Finally, we compute the new schedule length of the application GSLnew(G)after the task execution frequency re-adjustment mechanism in Line 15. The value ofSLnew(G)is the final minimum schedule length we get.
In general, our method includes Algorithms 1–3. Algorithm 1 is a task energy pre-allocation strategy, and Algorithms 2 and 3 constitute the task execution frequency re-adjustment mechanism. For a given application, we operate in the order of Algorithms 1, Algorithm 2 and Algorithm 3, then we can get the scheduling result of minimizing the schedule length of the application.
Algorithm 3. The task execution frequency re-adjustment mechanism. |
Input: G = {N, W, C}, U,Egiven(G)Output: SL(G) 1: Call Algorithm 1 to get the preliminary scheduling results; 2: Call Algorithm 2 to getEsave(G); 3: ComputeEreu(G); //By (38) 4: for (∀ni,ni∈Ndas) do 5: E(ni)=0; 6: ComputeEgro(ni); //By (40) 7: ComputeEuse(ni); //By (42) 8: for (∀fpr(i),h, fpr(i),h∈[fpr(i),low,fpr(i),max])do 9: ComputeE(ni, upr(i), fpr(i),h); //By (4) 10: if (E(ni, upr(i), fpr(i),h)> Euse(ni)) then 11: continue; 12: if (E(ni, upr(i), fpr(i),h)>E(ni))then 13: LetE(ni)= E(ni, upr(i), fpr(i),h); 14: Let fpr(i),nh= fpr(i),h; 15: ComputeSLnew(G); //By (44) 16: UpdateSL(G)←SLnew(G); 17: returnSL(G). |
In this section, we use four algorithms, MSLECC [20], WALECC [21], EECC [22] and ISAECC [23], which are the same as the goal of this article to compare with our proposed method. The configuration of the experimental platform is AMD Ryzen 5 2500U CPU @ 2.00 GHz, 8 GB RAM (Santa Clara, CA, USA), 64-bit Windows 10 Home Edition. The whole set of codes is mainly implemented by C and scripts. The final schedule length SL(G) is the only evaluation standard of these algorithms.
The parameters of the processors and applications as follows:10 ms≤wi,k≤100 ms,10 ms≤ci,j≤100 ms,0.03≤Pk,ind≤0.07,0.8≤Ck,ef≤1.2,2.5≤mk≤3.0, andfk,max=1GHz. The execution frequency is discrete, and the precision is 0.01 GHz. The simulated heterogeneous platform for testing the problem of minimizing the schedule length uses four processor cores.
We chose two DAG models to evaluate our algorithm, which are two real-world applications (Fast Fourier transform and Gaussian elimination). 5.1. Fast Fourier Transform Application
We first consider the fast Fourier transform (FFT), Figure 2a shows an example of the FFT parallel application withρ=4. The parameterρcan represent the size of application models. For FFT application, the number of tasks is|N|=(2×ρ−1)+ρ×log2ρ, andρ=2xwherex is an integer. We can see that there are four exit tasks (task 12, 13, 14, 15) in the FFT graph in Figure 2a, and there will beρexit tasks in the FFT graph with parameterρ . In order to match the application model in Section 3.1, we add a dummy exit task whose execution time is 0. We also connect the dummy exit task to the lastρ exit tasks, and we set their communication time is 0. For example, Figure 2b shows the changed FFT parallel application withρ=4which is added a dummy exit task.
Experiment 1. In order to observe the performance on different energy consumption constraints, this experiment is carried out to compare the final schedule length values of the FFT application for varying energy consumption constraints. We use the FFT application withρ=32, that is, the number of tasks is 233 (|N|=233). We set the energy consumption constraintsEgiven(G)=Emin(G)+M233(Emax(G)−Emin(G)), where1≤M≤232andMis an integer.
Table 4 shows the details of the final schedule lengths of FFT application withρ=32for varyingEgiven(G) by using all the algorithms, and a more intuitive feeling can be performed through Figure 3. It can be seen that our algorithm has the obvious advantage on the schedule lengthSL(G)compared to other algorithms. From the experimental results, we can get that our method outperforms MSLECC by about 28.13%~36.35%, and it outperforms the newest method ISAECC by about 3.65%~10.48%. The results of WALECC and EECC are similar to that of ISAECC.
Further, we can find that the gaps between our method and other algorithms increase whenEgiven(G)decreases. This is because all tasks can be assigned to a relatively large energy consumption constraint when the energy consumption constraint is large, so that the impact of task level is smaller. Moreover, at this time, most tasks belonging toNdashave reached the maximum frequency, and cannot continue to increase the frequency to shorten the schedule length. In addition, as we expected, the largerEgiven(G)is, the shorter schedule length we can obtain.
Experiment 2. In order to observe the algorithm performance under different number of tasks, an experiment is carried out to compare the final schedule length values of the FFT application for varying number of tasks. The parameterρis changed from 8 to 256. In order to get relatively obvious results, we set the energy consumption constraints to a relatively small value:Egiven(G)=Emin(G)+Emax(G)−Emin(G)8.
Table 5 shows the results of FFT applications for different number of tasks by using all the algorithms, and a more intuitive feeling can be performed through Figure 4. The results show that our method has better performance than other algorithms. Our method outperforms MSLECC by about 21.24%–31.10%, and outperforms ISAECC by about 4.80%–7.32%. From the results, we can find that the more tasks we have, the better our method performs. This is because that there will be more different hierarchies in FFT graphs as the number of tasks increases, so that our new energy pre-allocation strategy will become more advantageous.
5.2. Gaussian Elimination Application
Similarly, in Gaussian elimination (GE) application, we defineρas the size of the application, and the number of tasks can be calculated by|N|=ρ2+ρ−22 . Figure 5 shows the GE application model withρ=5.
Experiment 3. This experiment compares the final schedule length values of GE application for varying energy consumption constraints. We use the GE application withρ=21, that is, the number of tasks is 230 (|N|=230). We set the energy consumption constraintsEgiven(G)=Emin(G)+M230(Emax(G)−Emin(G)), where1≤M≤229andMis an integer.
Table 6 shows the results of the final schedule lengths of GE application withρ=21for varyingEgiven(G) by using all the algorithms, and a more intuitive feeling can be performed through Figure 6. We can see that our method still performs better than other algorithms, specifically, it outperforms MSLECC by about 14.69%–34.55%, and outperforms ISAECC by about 3.94%–9.60%, but the improvement is not obvious compared with FFT models. This is because that the level of GE application is not as strict as the FFT application, so that the effect of our method has diminished.
Experiment 4. This experiment compares the final schedule length values of GE application for varying number of tasks. We set thatEgiven(G)=Emin(G)+Emax(G)−Emin(G)4and change the number of tasks.
Table 7 shows the results of the final schedule lengths of GE application for varying number of tasks by using all the algorithms, and a more intuitive feeling can be performed through Figure 7. Our method still has a better effect on the schedule length than other algorithms which performs better 16.00%–28.85% than MSLECC and 3.36%–10.98% than ISAECC.
5.3. Analysis and Summary of Experimental Results
We can obtain that our method has better performance compared with the other algorithms from the experimental results. However, whenEgiven(G)becomes larger or the number of tasks is relatively small, the advantage of our method is not particularly obvious. The former is because that all tasks can be allocated to a relatively large energy consumption constraint whenEgiven(G)is large, so that the impact of task level will be smaller. The latter is because that the experimental results are more accidental and cannot reflect the general law when the number of tasks is small. We can also find that our method performs better on FFT models than on GE models with similarEgiven(G)and the number of tasks. This is because that the task level of GE application is not as strict as the FFT application, so that the effect of our method has diminished. In summary, our method is more applicable when the energy constraints are stringent, the number of tasks is large, or the task level of the application is strict.
6. Conclusions In this article, we propose a novel method to minimize the scheduling length of energy-constrained applications which run on a heterogeneous multi-core system. Our method mainly includes two parts: a novel task energy pre-allocation strategy and the schedule algorithm based on it; a re-adjustment mechanism of task execution frequency after preliminary scheduling. The core idea of our method is that the tasks in different hierarchies have different impacts on the whole application and the negative impact of local optimal scheduling should be reduced. Our method can be integrated into actual multi-core embedded systems, and it is particularly suitable for wearable devices, mobile robots and other products with high requirements for energy saving and performance. We carry out a considerable number of experiments with two practical parallel application models (FFT and GE). The results of experiments show that our method is generally superior to other existing algorithms. However, the experimental results also demonstrate the limitations of our method, which are that our method does not offer much of an advantage when the energy constraints are not stringent, the number of tasks is small or the task level of the application is not strict. In the future, we will improve and extend our method, and some further studies will be done. The points that can be further studied are as follows:
- Consider other factors that affect the length of application scheduling, such as the way to determine the priority of tasks.
- Explore ways to improve the limitations of our method and make it more universal.
- Integrate our method into an actual embedded multi-core system and test its performance.
- Extend our method to study other indicators in multi-core task scheduling, such as reliability.
Notation | Description |
---|---|
wi,k | Execution time of tasknion the processor coreukwith the maximum frequency |
ci,j | Communication time fromnitonj |
pred(ni) | The set of direct predecessor tasks of taskni |
succ(ni) | The set of direct successor tasks of taskni |
nentry | Entry task of an application |
nexit | Exit task of an application |
E(ni,uk,fk,h) | The energy consumption of the tasknion the processor coreukwith the frequencyfk,h |
EST(ni,uk) | The earliest start time of tasknirunning on processor coreuk |
EFT(ni,uk,fk,h) | The earliest finish time of tasknirunning on processor coreukwith frequencyfk,h |
AST(ni) | The actual start time of taskni |
AET(ni) | The actual execution time of taskni |
AFT(ni) | The actual finish time of taskni |
LFT(ni) | The latest finish time of taskni |
L(ni) | The level of taskni |
upr(i) | The processor core allocated to taskni |
fpr(i),hz(i) | The execution frequency allocated to tasknion processor coreupr(i) |
Egiven(ni) | The calculated energy consumption constraint of taskni |
Epre(ni) | The pre-allocated energy consumption of taskni |
Egiven(G) | The given energy consumption constraint of application G |
E(G) | The energy consumption of application G |
SL(G) | The schedule length of application G |
Task | n1 | n2 | n3 | n4 | n5 | n6 | n7 | n8 | n9 | n10 | |
---|---|---|---|---|---|---|---|---|---|---|---|
Core | |||||||||||
u1 | 14 | 13 | 11 | 13 | 12 | 13 | 7 | 5 | 18 | 21 | |
u2 | 16 | 19 | 13 | 8 | 13 | 16 | 15 | 11 | 12 | 7 | |
u3 | 9 | 18 | 19 | 17 | 10 | 9 | 11 | 14 | 20 | 16 |
Task | n1 | n2 | n3 | n4 | n5 | n6 | n7 | n8 | n9 | n10 |
---|---|---|---|---|---|---|---|---|---|---|
ranku | 108 | 77 | 80 | 80 | 69 | 63.33 | 42.67 | 35.67 | 44.33 | 14.67 |
Egiven(G) | SL(G) | ||||
---|---|---|---|---|---|
MSLECC | WALECC | EECC | ISAECC | Our method | |
2086 | 10,989 | 8023 | 7955 | 7814 | 6846 |
2714 | 5915 | 4491 | 4564 | 4417 | 3887 |
3342 | 4644 | 3302 | 3487 | 3390 | 3108 |
4048 | 4052 | 2918 | 2937 | 2966 | 2779 |
5226 | 3381 | 2546 | 2609 | 2484 | 2388 |
6168 | 2947 | 2253 | 2315 | 2273 | 2190 |
ρ | SL(G) | ||||
---|---|---|---|---|---|
MSLECC | WALECC | EECC | ISAECC | Our Method | |
8 | 805 | 654 | 679 | 666 | 634 |
16 | 1619 | 1331 | 1358 | 1294 | 1227 |
32 | 3664 | 2960 | 2777 | 2814 | 2641 |
64 | 8433 | 6340 | 6314 | 6326 | 5922 |
128 | 18,954 | 13,892 | 13,975 | 14,131 | 13,168 |
256 | 43,524 | 31,982 | 32,814 | 32,354 | 29,987 |
Egiven(G) | SL(G) | ||||
---|---|---|---|---|---|
MSLECC | WALECC | EECC | ISAECC | Our Method | |
2436 | 6826 | 6144 | 5912 | 5973 | 5692 |
2763 | 6155 | 5389 | 5603 | 5398 | 5128 |
3665 | 5519 | 4457 | 4521 | 4324 | 4119 |
6535 | 4721 | 3776 | 3502 | 3418 | 3090 |
7191 | 3864 | 3093 | 3214 | 3186 | 3043 |
8749 | 3397 | 3012 | 3129 | 3017 | 2898 |
ρ | SL(G) | ||||
---|---|---|---|---|---|
MSLECC | WALECC | EECC | ISAECC | Our Method | |
8 | 1056 | 897 | 911 | 893 | 847 |
13 | 2185 | 1794 | 1805 | 1737 | 1599 |
21 | 3698 | 3269 | 3082 | 3214 | 3106 |
41 | 18,845 | 14,504 | 14,729 | 14,563 | 13,534 |
83 | 39,584 | 31,375 | 31,551 | 31,422 | 30,032 |
100 | 59,951 | 46,210 | 48,238 | 47,916 | 42,657 |
Author Contributions
Conceptualization, M.J. and X.J.; methodology, K.H., M.J. and X.J.; software, M.J.; validation, K.H., M.J. and S.C.; formal analysis, M.J. and X.J.; investigation, M.J., S.C., X.L. and W.T.; resources, K.H., X.J. and Z.L.; data curation, M.J., X.L. and W.T.; writing-original draft preparation, M.J.; writing, review and editing, K.H., M.J., X.J. and D.X.; visualization, K.H.; supervision, K.H. and X.J.; project administration and funding acquisition, K.H. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the National Key R&D Program of China (2020YFB0906000, 2020YFB0906001).
Conflicts of Interest
The authors declare no conflict of interest.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
© 2020. This work is licensed under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Minimizing the schedule length of parallel applications, which run on a heterogeneous multi-core system and are subject to energy consumption constraints, has recently attracted much attention. The key point of this problem is the strategy to pre-allocate the energy consumption of unscheduled tasks. Previous articles used the minimum value, average value or a power consumption weight value as the pre-allocation energy consumption of tasks. However, they all ignored the different levels of tasks. The tasks in different task levels have different impact on the overall schedule length when they are allocated the same energy consumption. Considering the task levels, we designed a novel task energy consumption pre-allocation strategy that is conducive to minimizing the scheduling time and developed a novel task schedule algorithm based on it. After getting the preliminary scheduling results, we also proposed a task execution frequency re-adjustment mechanism that can re-adjust the execution frequency of tasks, to further reduce the overall schedule length. We carried out a considerable number of experiments with practical parallel application models. The results of the experiments show that our method can reach better performance compared with the existing algorithms.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer