Full text

Turn on search term navigation

1. Introduction

Computer systems nowadays must perform much better than ever before, ensuring the simultaneous running of many applications. By using heterogeneous multi-core processors and increasing the number of processor cores, it is possible to improve the performance while keeping energy consumption at the bay [1,2,3,4,5]. From small, embedded devices to large data centers, heterogeneous multi-core systems have been widely used. It is expected that in the near future, the number of heterogeneous processors and cores in these systems will increase dramatically [6,7,8]. On the other hand, although the performance of such systems has been greatly improved, the power consumption is also increasing. Huge energy consumption has caused various problems, such as economy, environment, technology and so on [9,10,11,12,13]. Therefore, energy consumption is one of the main design constraints for such heterogeneous multi-core systems. A well-known typical mechanism for reducing power consumption of computing systems is dynamic voltage and frequency scaling (DVFS), which is realized to achieve the balance between energy consumption and performance by reducing the power supply voltage and frequency of the processor at the same time, when the task is running [1,14,15,16,17,18,19]. Therefore, energy consumption constrained task scheduling on processors with adjustable voltage and frequency has attracted extensive research [20,21,22,23].

In the existing research, works related to the energy consumption of heterogeneous multi-core processors, some of the objectives are to minimize the energy consumption under certain performance requirements. Others are to minimize or maximize other indicators, such as performance and reliability, under certain energy consumption constraints [24,25,26,27,28,29,30,31]. Our studies fall into the second category. To be more precise, we study the following problem: minimizing the schedule length of energy consumption constrained parallel applications on heterogeneous multi-core systems. There have been some studies in recent years that are consistent with our goals. In order to determine the energy consumption constraint of each task, the common method in previous studies is to pre-allocate the energy consumption of the unscheduled tasks [20,21,22,23]. Therefore, these studies focus on how to allocate the overall energy consumption constraints to each task reasonably. Xiao et al. [20] came up with an algorithm called MSLECC (minimizing schedule length of energy consumption constrained). It takes the minimum energy consumption of a task as its pre-allocation energy consumption. The drawback of this method is that it is unfair to low priority tasks. High priority tasks take up too much energy consumption, so that low priority tasks can only be allocated to low-power processor cores and close to the lowest execution frequency due to the energy consumption constraint, which will lead to an increase in the overall schedule length. After that, some studies improved the energy pre-allocation method, so that the energy pre-allocation is fair to each task. One study pre-allocated the energy consumption according to the task execution time proportion [21], another study used the average energy consumption as the pre-allocation energy consumption of each task [22], and another study used a defined task energy consumption weight value as the pre-allocation energy consumption of tasks, whose algorithm is called ISAECC [23]. However, it is not the best practice to treat each task fairly and the task level is an important issue when pre-allocating energy consumption to the tasks. We pre-allocate more energy consumption to the tasks in the task levels with fewer tasks, because when their execution time becomes shorter, the overall scheduling length is more likely to decrease. In addition, previous articles ignored the negative impact of local optimal scheduling algorithm. Therefore, we develop a task execution frequency re-adjustment mechanism that also uses DVFS to further reduce the schedule length. To summarize, the main contributions of this article are as follows.

We design a novel energy pre-allocation strategy considering task level and prove its feasibility.
We develop a novel scheduling algorithm to minimize the schedule length under energy consumption constraints based on the new energy pre-allocation strategy.
We introduce a frequency re-adjustment mechanism after task scheduling to reduce the negative impact of local optimization.
We evaluate our algorithm based on real parallel applications. The experimental results consistently prove the superiority and competitiveness of our algorithm.

The structure of this article is as follows. Section 2 reviews some existing studies that are relevant to us. Section 3 gives some preliminaries related to the problem of minimizing the schedule length for energy consumption constrained parallel applications. In Section 4, we present our approach for this problem. In Section 5, we discuss and analyze the experimental results. Finally, we conclude the paper in Section 6.

2. Related Work

Energy saving design technology based on DVFS was first proposed by [1]. Nowadays, DVFS has been widely used in multi-core task scheduling problems related to energy consumption. Reference [2] studied the problem of minimizing the schedule length of independent sequential applications with energy consumption constraints. In Reference [29], the task scheduling problem with energy consumption constraints is considered as a combinatorial optimization problem. In Reference [31], the authors consider three constraints (energy consumption, deadline and reward). These studies are mainly focused on homogeneous systems, so they are different from our study.

In addition to the above research, many researchers have studied the task scheduling problem on heterogeneous multi-core systems. For example, reference [32] proposed an energy-saving workflow task scheduling algorithm based on DVFS. Huang et al. [33] proposed an enhanced energy-saving scheduling algorithm to minimize energy consumption under the condition of satisfying a certain performance level. Rusu et al. [31] added constraints and proposed an efficient algorithm for minimizing energy consumption under multiple constraints. The goal of these studies is generally contrary to ours. We study the minimization of the schedule length under energy consumption constraints, while those studies focus on minimizing energy consumption under other constraints.

There are also a lot of excellent studies that are closely related to our study. For example, a representative paper proposed the classical Heterogeneous Earliest Finish Time (HEFT) algorithm, which was developed to minimize the schedule length in heterogeneous multicore systems. The application model and energy consumption model they use are consistent with ours, but they do not consider the constraints of energy consumption. At present, the studies close to us should be the four articles mentioned in Section 1 [20,21,22,23]. They pre-allocate energy consumption to each task according to the minimum energy consumption of the task [20], the execution time ratio of the task [21], the overall average energy consumption [22], or the defined task energy consumption weight [23]. What is not considered in the above studies is that different task hierarchies have different impacts on the overall schedule length. Moreover, they ignored the negative impact of the local optimal characteristics of scheduling algorithm on the schedule length. Based on this, we propose a novel energy pre-allocation strategy considering task level, and, in order to reduce the negative impact of local optimal characteristics, we develop a scheduling task execution frequency re-adjustment mechanism. Finally, our method achieves better performance than previous studies.

3. Models and Preliminaries

In this section, we first introduce the application model (Section 3.1), then the energy consumption model (Section 3.2), and next we describe in detail the issues that need to be addressed (Section 3.3). Finally, we briefly introduce the current situation and reveal its limitations (Section 3.4). Table 1 shows the main notations we use.

3.1. Application Model

As in previous studies [20,21,22,23,24,34], we also use directed acyclic graph (DAG) to represent parallel application models. As for the processor cores, we defineU={_u1,_u2,…,_uk,…,_u|U|}to represent a collection of processor cores, where|U|is defined as the number of processor cores. Note that for any setX, we use|X|to denote its size. We define the DAG application model asG={N,W,C}.N={_n1,_n2,…,_ni,…,_n|N|}represents the set of nodes in the graph, that is, the set of tasks in the application. Due to the heterogeneous nature of the processor, the execution time of_ni∈Non different processor cores is different.Wrefers to a matrix with size|N|×|U|, where_wi,kdenotes the execution time for_nito run on_ukwith the maximum frequency.Cdenotes the weight of edges between connected nodes in DAG, that is, the communication time between tasks._ci,j∈Crepresents the communication time from_nito_nj. If_ci,j=0, it means there is no communication from_nito_nj. We definepred(_ni)andsucc(_ni)as the set of direct predecessor tasks and the set of direct successor tasks of task_ni. For example,pred(_n2)={_n1}andsucc(_n2)={_n8,_n9}. We define_nentryand_nexit as the entry task and the exit task of an application. In Figure 1,_nentryand_nexitare_n1and_n10.

Figure 1 shows an example of a parallel application based on DAG with ten tasks. Each node in Figure 1 represents a task, and the values on the edges between connecting nodes represent the communication time between the two nodes if they are not assigned to the same processor core. For example, the value 18 on the edge between_n1and_n2indicates that the communication time between_n1and_n2is 18.

Assuming that there are three heterogeneous processor cores{_u1,_u2,_u3} in the system, Table 2 shows the execution time of the tasks in Figure 1 running on each processor core with the maximum frequency. For example, the first number 14 in Table 2 indicates that the execution time of_n1running on_u1with the maximum frequency is 14.

3.2. Energy Model

In DVFS technology, the relationship between supply voltage and operating frequency is almost linear. Therefore, DVFS will also adjust the supply voltage when adjusting the clock frequency. Similar to [20,21,22,23], we use frequency regulation to indicate simultaneous regulation of supply voltage and frequency. In this article, we use the same energy model as the references [20,21,22,23]. Therefore, the calculation formula of system power consumption with respect to frequency is as follows:

P(f)=_Ps+h(_Pind+_Pd)=_Ps+h(_Pind+_Cef ^fm).

In the above equation,_Psdenotes static power and can only be removed when the system is completely powered down._Pindis a constant that represents the frequency-independent dynamic power, that is, it corresponds to power independent of CPU processing speed._Pddenotes frequency-dependent power, including the power primarily consumed by the CPU and any power that depends on the system processing frequencyf.hdenotes the system state, specifically,h=1means the system is active and the application is executing;h=0means the system is in the sleep mode or powered down._Cefdenotes the effective capacitance andmdenotes the dynamic power exponent and is no smaller than 2._Cefandmare constants related to the processor system.

Our study is in the active state of the system (h=1 ), so dynamic power consumption is the main part of the whole energy consumption. Considering the unmanageability of static power consumption, this article, like references [20,21,22,23], does not consider static power consumption. Therefore, the calculation formula of system power consumption in this article becomes the following equation:

P(f)=_Pind+_Cef ^fm.

Due to the heterogeneity of processors, each processor should have its own parameters. Assuming that the frequency range of the processor_ukis from the lowest frequency_fminto maximum frequency_fmaxtemporarily, we define the following sets of parameters:

The set of_Pind:{_P1,ind,_P2,ind,…,_P|U|,ind};
The set of_Pd:{_P1,d,_P2,d,…,_P|U|,d};
The set of_Cef: {_C1,ef,_C2,ef,…,_C|U|,ef};
The set ofm:{_m1,_m2,…,_m|U|};

The set of execution frequency:

{{_f1,min,_f1,a,…,_f1,max},{_f2,min,_f2,a,…,_f2,max},…,{_f|U|,min,_f|U|,a,…,_f|U|,max}}.

The execution time of the task_nion the processor core_ukwith the frequency_fk,hcan be obtained by the following equation:

_wi,k,h=_wi,k×_fk,max_fk,h.

Then the energy consumptionE(_ni, _uk, _fk,h)of the task_nion the processor core_ukwith the frequency_fk,hcan be obtained by the following equation:

E(_ni, _uk, _fk,h)=(_Pk,ind+_Ck,ef×^(_fk,h)_mk)×_wi,k×_fk,max_fk,h.

Therefore, the energy consumption of application G will be

E(G)=_∑i=1|N|E(_ni,_upr(i),_fpr(i),hz(i)).

As a result of the_Pind,Eis not monotonic withfand lessf does not always result less energy. Therefore, we can get the minimum value of energy-effective frequency by finding the minimum value of Equation (4). Similar to [20,21,22,23], we define the minimum value of energy-effective frequency as_fee. After calculation, we can get that

_fee=_Pind(m−1)_Cefm.

When the execution frequency is less than_fee, it is meaningless to continue to reduce the frequency, because this will increase energy consumption. Therefore, the range of execution frequency variation is[_flow,_fmax], where_flow=max(_fmin,_fee). The new set of execution frequency becomes as follows:

{{_f1,low,_f1,a,…,_f1,max},{_f2,low,_f2,a,…,_f2,max},…,{_f|U|,mi,_f|U|,a,…,_f|U|,max}}.

3.3. Preliminaries

3.3.1. Problem Description

The problem to be solved in this study is to assign a suitable frequency and processor core to each task, and minimize the schedule length of the application under the condition that the energy consumption of the application does not exceed the energy consumption constraint [35,36,37,38].

First, we define the earliest start time (EST) and the earliest finish time (EFT) of tasks. Given a task_niexecuted on processor_uk, its earliest start time (EST) is denoted asEST(_ni,_uk), which is computed as

EST(_ni,_uk)=max(avail[k],max_nj∈pred(_ni){AFT(_nj)+_ci,j′}).

whereavail[k]is the earliest available time of processor core_uk, that is, all tasks executed on processor core_ukhave been completed, and processor core_ukis ready to execute new tasks.AFT(_nj)is the actual finish time of task_nj._ci,j′is the actual communication time between task_niand_nj. If_niand_njare assigned to the same processor core,_ci,j′=0; otherwise,_ci,j′=_ci,j.

The earliest finish time (EFT) of task_niexecuted on processor_ukwith frequency_fk,his the earliest start time plus the execution time of task_ni, which is computed as

EFT(_ni,_uk,_fk,h)=EST(_ni,_uk,_fk,h)+_wi,k×_fk,max _fk,h.

We define SL(G)as the schedule length of application, where

SL(G)=AFT(_nexit).

We define_Egiven(G)as the given energy consumption constraint of application G. Therefore, the problem to solve can be expressed as minimizingSL(G)while

E(G)=_∑i=1|N|E(_ni,_upr(i),_fpr(i),hz(i))≤_Egiven(G).

where_upr(i)denotes the processor core assigned to the task_ni, and_fpr(i),hz(i)denotes the execution frequency assigned to the task_ni.

3.3.2. Effective Range of Energy Consumption Constraint

Since the execution time of each task on each processor core is known, we can obtain the minimum and maximum energy consumption of_nirepresented by_Emin(_ni)and_Emax(_ni)respectively by traversing all processors._Emin(_ni)and_Emax(_ni)perform task_niat minimum and maximum frequencies, respectively. The equations are as follows:

_Emin(_ni)=min_uk∈UE(_ni,_uk,_fk,min),

_Emax(_ni)=max_uk∈UE(_ni,_uk,_fk,max).

Therefore, the minimum and maximum energy consumption of applicationGcan be computed as follows:

_Emin(G)=_∑i=1|N| _Emin(_ni),

_Emax(G)=_∑i=1|N| _Emax(_ni).

It should be noted that the given energy consumption constraint has a reasonable range. If_Egiven(G)<_Emin(G), the energy consumption constraint can never be satisfied; if_Egiven(G)>_Emax(G), the energy constraint can always be met. Both of the above situations are unreasonable, so the reasonable range of the given energy consumption constraint is_Emin(G)≤_Egiven(G)≤_Emax(G).

3.3.3. Task Priority Determination

Before scheduling, we need to determine the priority of tasks. Similar to [20,21,22,23], we use the upward rank value (ran_ku) as the criterion to determine the priority of tasks.ran_kuis defined as follows:

ran_ku(_ni)=_∑k=1|U| _wi,k|U|+max_ni∈succ(_ni){_ci,j+ran_ku(_nj)}.

The priority of tasks is sorted in descending order ofran_ku, that is, the higherran_ku of a task, the higher the priority of it. Table 3 shows the upward rank values of all the tasks in Figure 1. Therefore, the task priority list of the application in Figure 1 will be{_n1,_n3,_n4,_n2,_n5,_n6,_n9,_n7,_n8,_n10}.

3.4. The ISAECC Method In this subsection, we review the existing method that is closest to us and reveal its limitations.

3.4.1. Method Description

The ISAECC method is proposed in [23], and it consists of several major steps:

It prioritizes tasks by using the upward rank value which is defined in Section 3.3.3;
It uses a self-defined energy consumption weight value to give each unscheduled task a pre-allocated energy consumption;
It calculates the energy consumption constraint of each task according to the given energy consumption constraint of the whole application and the pre-allocated energy consumption value of each task;
According to the order of the task priority list, it assigns each task the processor core and frequency that can minimize its EST time by traversing each processor core and optional execution frequency.

For the sake of generality, we use{_no(1),_no(2),…,_no(|N|)}to represent the task priority order. Assuming that the currently scheduled task is_no(j), then the scheduled task set is{_no(1),_no(2),…,_no(j−1)}and the unscheduled task set is{_no(j+1),_no(j+2),…,_no(|N|)}. Therefore, when scheduling the task_no(j), the overall energy consumption of application G can be expressed as

_Eo(j)(G)=_∑x=1j−1E(_no(x),_upr(o(x)),_{fpr(o(x)),hz(o(x))})+E(_no(j))+_∑y=j+1|N| _Epre(_no(y)),

where_Epre(_no(y))denotes the pre-allocation energy consumption of task_no(y).

According to the energy consumption constraints shown in Equation (8), we can get

_Eo(j)(G)≤_Egiven(G).

Therefore, we can get

E(_no(j))≤_Egiven(G)−_∑x=1j−1E(_no(x),_upr(o(x)),_{fpr(o(x)),hz(o(x))})−_∑y=j+1|N| _Epre(_no(y)).

Let the energy consumption constraint of task_no(j)be

_Egiven(_no(j))=_Egiven(G) −_∑x=1j−1E(_no(x),_upr(o(x)),_{fpr(o(x)),hz(o(x))})−_∑y=j+1|N| _Epre(_no(y)).

With Equation (17), we only need to consider the energy consumption constraint of each task which is shown as follows:

E(_no(j))≤min{_Egiven(_no(j)),_Emax(_no(j))}.

Therefore, the key problem is how to determine the pre-allocation energy consumption (_Epre(_no(j))) of each task. The central idea of the method ISAECC used is to pre-allocate the energy consumption for unscheduled tasks by a weight mechanism. First, they define the improvable energy of application G called_Eie(G), which is computed as

_Eie(G)=_Egiven(G)−_Emin(G).

Then, they define_Eave(_ni)and_Eave(G)as the energy consumption level of task_niand the energy consumption level of application G._Eave(_ni)and_Eave(G)are computed as

_Eave(_ni)=_Emin(_ni)+_Emax(_ni)2,

_Eave(G)=_Emin(G)+_Emax(G)2.

Next, they defineel(_ni)as the weight of energy consumption level of task_ni, which is computed as

el(_ni)=_Eave(_ni)_Eave(G).

After that, they calculated the pre-allocated energy consumption for task_nias follows:

_Epre(_ni)=min{_Ewa(_ni),_Emax(_ni)}

where_Ewa(_ni)is computed as

_Ewa(_ni)=_Eie(G)×el(_ni)+_Emin(_ni).

After determining the pre-allocated energy consumption of each task, the task scheduling can be completed according to steps 3 and 4, described at the beginning of this section (Section 3.4.1).

3.4.2. Limitations of ISAECC

ISAECC solves the problem of increasing schedule length caused by unfair energy constraint allocation of low priority tasks by MSLECC in [20]. However, we find that it is not the best practice to treat each task fairly, because tasks in different levels have different impacts on the whole application. For example, in Figure 1, the task_n1and task_n10should be assigned more energy than other tasks, because their execution time can affect the overall schedule length of the application G directly. It is obvious that if the execution time of task_n1or task_n10is shortened or lengthened under other same conditions, the overall schedule length will shorten or lengthen the same amount accordingly. Other tasks like_n2cannot be compared to_n1and_n10. If the execution time of task_n2is shortened or lengthened under other same conditions, the schedule length of application G may not change obviously, or even not change at all. Therefore, we should consider the levels of the tasks and the number of tasks in each level when pre-allocating the energy consumption of tasks. In addition, previous studies ignored the negative impact of the local optimal characteristics of scheduling algorithm on the schedule length. In our design, these problems have been greatly improved.

4. Our Solution

In this section, we introduce our new strategy in detail. First, we introduce a new task energy pre-allocation method considering task level (Section 4.1). Then, we give a new task scheduling algorithm to minimize the schedule length under the constraint of energy consumption (Section 4.2). Finally, we describe the task execution frequency re-adjustment mechanism we added after getting the preliminary scheduling results (Section 4.3).

4.1. The New Task Energy Pre-Allocation Method

The central idea of our pre-allocation method is to consider the levels of tasks. The impact of tasks in different hierarchies on the overall schedule length of an application is different. For example, in Figure 1, if we shorten the execution time of task_n1by increasing its execution frequency, the schedule length of application G will certainly shorten the corresponding time; but if we shorten the execution time of task_n2, the schedule length of application G is unlikely to change much, because there are still many tasks in a similar position to it. Therefore, a more reasonable approach is to appropriately pre-allocate more energy consumption to task_n1 in the case of Figure 1. Based on this idea, we put forward a new energy pre-allocation method, based on the weight value of energy consumption and the levels of tasks.

4.1.1. Method Description

We define the level of tasks as follows:

{L(_nentry)=0L(_ni)=max_nx∈pred(_ni){L(_nx)+1}.

We define_Nl={_ni,_nj,…}as the set of tasks contained in level l, whereL(_ni)=L(_nj)=L(…)=l. The number of tasks in level l is|_Nl|. We can have that the maximum of the level of tasks isL(_nexit).

We define the improvable energy of application G (_Eie(G)) and the energy consumption level of task_ni(_Eave(_ni)) the same as ISAECC. We define that the energy consumption level of the task level l is the sum of energy consumption level in level l, which is computed as

_Eave(_Nl)= _{∑_ni∈_Nl} _Eave(_ni).

We defineel(_ni,l)as the energy consumption weight of task_niin its level l, which is computed as

el(_ni,l)=_Eave(_ni)_Eave(_Nl).

We define_Evar(_Nl)as the variation energy consumption of_Nl, which is computed as

_Evar(_Nl)=_Eave(_Nl)K, K={|_Nl|, |_Nl|<|U||U|, |_Nl|≥|U|.

Correspondingly, the variation energy consumption of application G is computed as

_Evar(G)=∑l_Evar(_Nl).

The energy consumption weight of_Nlin application G can be defined as

el(_Nl)=_Evar(_Nl)_Evar(G).

We define_Eie(_Nl)as the improvable energy of_Nl, which is computed as

_Eie(_Nl)= _Eie(G)×el(_Nl).

Therefore, we can get the new energy pre-allocation formula of_nias follows:

_Epre(_ni)=min{_Ewa(_ni),_Emax(_ni)},

where

_Ewa(_ni)=_Eie(_Nl)×el(_ni,l)+_Emin(_ni).

4.1.2. Feasibility of the Task Energy Pre-Allocation Mechanism

In order to prove the feasibility of our method, we need to prove the following theorem: Given an application G, if the unscheduled tasks are pre-allocated energy consumption according to our method, then each task_njcan satisfy Equation (15).

We use mathematical induction to prove the above theorem. First, we need to prove task_no(1)can satisfy Equation (15), and the other tasks are all unscheduled. By Equations (14), (22)–(29), we can have

_Eo(1)(G)=E(_no(1))+_∑y=2|N| _Epre(_no(y))≤E(_no(1))+_∑y=2|N| _Ewa(_no(y))= E(_no(1))+_∑y=1|N| _Ewa(_no(y))−_Ewa(_no(1))= E(_no(1))+_Egiven(G)−_Ewa(_no(1)).

We know that

_Ewa(_no(1))=_Eie(_Nl)×el(_ni,l)+_Emin(_ni)≥_Emin(_no(1)).

We can at least find a situation in which

E(_no(1))=_Emin(_no(1)).

In other words, at least whenE(_no(1))=_Emin(_no(1)), we can have

_Eo(1)(G)≤E(_no(1))+_Egiven(G)−_Ewa(_no(1))≤_Egiven(G).

From the above derivation, we prove that task_no(1)can satisfy Equation (15).

Then, we assume that task_no(j)can satisfy Equation (15). That is

_Eo(j)(G)=_∑x=1j−1E(_no(x),_upr(o(x)),_{fpr(o(x)),hz(o(x))})+E(_no(j))+_∑y=j+1|N| _Epre(_no(y))=_∑x=1jE(_no(x),_upr(o(x)),_{fpr(o(x)),hz(o(x))})+_∑y=j+1|N| _Epre(_no(y))≤_Egiven(G).

The above formulation can be written as

_∑x=1jE(_no(x),_upr(o(x)),_{fpr(o(x)),hz(o(x))})≤_Egiven(G)−_∑y=j+1|N| _Epre(_no(y)).

Next, we prove task_no(j+1)can satisfy Equation (15). By Equation (30), we can have

_Eo(j+1)(G)=_∑x=1jE(_no(x),_upr(o(x)),_{fpr(o(x)),hz(o(x))})+E(_no(j+1))+_∑y=j+2|N| _Epre(_no(y))≤_Egiven(G)−_∑y=j+1|N| _Epre(_no(y))+E(_no(j+1))+_∑y=j+2|N| _Epre(_no(y))=_Egiven(G)+E(_no(j+1))−_Epre(_no(j+1)).

From Equations (28) and (30), we can have

_Epre(_no(j+1))=min{_Ewa(_no(j+1)),_Emax(_no(j+1))}≥_Emin(_no(j+1)).

Therefore, at least whenE(_no(j+1))=_Emin(_no(j+1)), we can have

_Eo(j+1)(G)≤_Egiven(G)+E(_no(j+1))−_Epre(_no(j+1))≤_Egiven(G).

In summary, given an application G, if the unscheduled tasks are pre-allocated energy consumption according to our method, then each task_njcan satisfy Equation (15). The feasibility of our method has been proved.

4.2. The Proposed Algorithm for Minimizing Schedule Length In this section, we show our new task scheduling algorithm in Algorithm 1. In the algorithm, Line 1 is to prioritize tasks in the input application; Lines 2–10 calculate some required values for each task, each level and the application G; Lines 11 and 12 calculate the pre-allocation energy consumption of each task; Lines 13–26 are to select processor and frequency for each task; Lines 27 and 28 are to calculate the actual energy consumption E(G) and the final schedule length SL(G).

Algorithm 1. A new scheduling algorithm for minimizing schedule length.

Input: G = {N, W, C}, U,_Egiven(G)Output: SL(G), E(G) 1: Sort tasks in a listtlby descending order ofran_ku; 2: for (∀i, _ni∈N)do 3: Compute_Emin(_ni)and_Emax(_ni); //By (10) (11) 4: Compute_Eave(_ni); //By (21) 5: for (∀l, 1≤l≤L(_nexit))do 6: Compute_Eave(_Nl); //By (23) 7: Compute_Evar(_Nl); //By (25) 8: Compute_Emin(G)and_Emax(G); //By (12) (13) 9: Compute_Eave(G); //By (22) 10: Compute_Evar(G); //By (26) 11: for (∀i, _ni∈N)do 12: Compute_Epre(_ni); //By (29) 13: while(tl≠∅)do 14: _ni=tl.out(); 15: AFT(_ni)=∞; 16: Compute_Egiven(_ni); //By (18) 17: for (∀k, _uk∈U)do 18: for (∀_fk,h, _fk,h∈[_fk,low,_fk,max])do 19: ComputeE(_ni, _uk, _fk,h); //By (4) 20: if (E(_ni, _uk, _fk,h)> _Egiven(_ni)) then 21: continue; 22: ComputeEFT(_ni, _uk, _fk,h); //By (8) 23: if (EFT(_ni, _uk, _fk,h)< AFT(_ni)) then 24: Let_upr(i)=_ukand_fpr(i),hz(i)=_fk,h; 25: E(_ni, _upr(i), _fpr(i),hz(i))= E(_ni, _uk, _fk,h); 26: AFT(_ni)= EFT(_ni, _upr(i), _fpr(i),hz(i)); 27: Compute actual energy consumptionE(G); 28: Compute the schedule lengthSL(G); 29: return SL(G), E(G).

For each task, selecting the processor with the minimum EFT has complexityO(|N|×|U|×|F|), where|F|represents the maximum number of discrete frequencies from_fk,lowto_fk,max. Therefore, the complexity of Algorithm 1 isO(^|N|2×|U|×|F|) the same as ISAECC in [23].

4.3. The Task Execution Frequency Re-Adjustment Mechanism

Through the new task energy pre-allocation method in Section 4.1 and the new task scheduling algorithm in Section 4.2, we can get the preliminary scheduling results. Other methods generally end here such as those in [20,21,22,23], but they do not realize that the scheduling results can be optimized to further shorten the schedule length. The same as the scheduling algorithms in [20,21,22,23], the algorithm in Section 4.2 makes tasks finish as soon as possible when scheduling them, which is not entirely reasonable. Premature completion of some tasks cannot shorten the overall schedule length, but will take up more energy consumption. Therefore, we introduce the concept of the latest finish time of tasks [28,39,40], which is defined as follows:

{LFT(_nexit)=AFT(_nexit)LFT(_ni)=min{min_nj∈succ(_ni){AST(_nj)−_ci,j′},AST(_ndn(i))},

where_ndn(i)represents the downward neighbor task of_ni, that is,_ndn(i)is on the same processer core as_niand it is the first case after_ni.

Through Equation (32), we can replace AFT of tasks with LFT to delay the finish time of some tasks without increasing the schedule length. As the execution time of tasks is prolonged, their running frequency will be reduced accordingly, which can save some energy consumption. Therefore, the new execution time of task_niis changed fromAFT(_ni)−AST(_ni)toLFT(_ni)−AST(_ni), and the new frequency of task_nican be changed as follows:

_{fpr(i),nhz(i)}=_fpr(i),hz(i)×AFT(_ni)−AST(_ni)LFT(_ni)−AST(_ni).

The frequency range of task_niis[_fpr(i),low,_fpr(i),max], so_{fpr(i),nhz(i)}should be

_{fpr(i),nhz(i)}=max{_fpr(i),hz(i)×AFT(_ni)−AST(_ni)LFT(_ni)−AST(_ni),_fpr(i),low}.

Therefore, the actual execution time (AET) of task_niwill be

AET(_ni)=_wi,pr(i)×_fpr(i),max_{fpr(i),nhz(i)}.

Finally, the newAST(_ni)should be updated as

ST(_ni)=LFT(_ni)−AET(_ni)=LFT(_ni)−_wi,pr(i)×_fpr(i),max_{fpr(i),nhz(i)}.

On the basis of above equations, the algorithm to save energy after preliminary scheduling is shown in Algorithm 2. In Line 2, we reorder the tasks in descending order of the actual finish time (AFT) of tasks according to the scheduling results of Algorithm 1. In Lines 4–10,AFT(_ni),AST(_ni)and_fpr(i),hz(i)are updated. In Line 11, we compute the newE(_ni)using the above updated values. In Lines 12 and 13, we compute the saved energy_Esave(G).

Algorithm 2. The method to save energy.

Input: G = {N, W, C}, U,_Egiven(G)Output:_Esave(G) 1: Call Algorithm 1 to get the preliminary scheduling results; 2: Sort the tasks in the listalaccording to the descendingAFT(_ni)order of values; 3: while(al≠∅)do 4: _ni=tl.out(); 5: ComputeLFT(_ni); //By (33) 6: Compute the new frequency_{fpr(i),nhz(i)}; //By (34) 7: Compute the newAET(_ni); //By (35) 8: UpdateAFT(_ni)←LFT(_ni); 9: UpdateAST(_ni)←LFT(_ni)−AET(_ni); //By (36) 10: Update_fpr(i),hz(i)←_{fpr(i),nhz(i)}; 11: Compute the newE(_ni); //By (4) 12: Compute the new energy consumption of G_Enew(G); //By (5) 13: Compute the saved energy_Esave(G)=E(G)−_Enew(G); 14: return_Esave(G).

Generally speaking, after the task scheduling is completed, there will be remaining energy consumption that is not used up as follows:

_Erem(G)=_Egiven(G)−E(G).

Therefore, the energy consumption that can be reused is

_Ereu(G)=_Erem(G)+_Esave(G).

After calculating the energy consumption that can be reused, we need to re-allocate the energy consumption to the tasks that can directly affect the overall schedule length. Directly affecting the overall schedule length means that according to how much the task running time changes, the overall schedule length will also change. For example, task_n1 in Figure 1, the total schedule length will be shortened or extended as much as its execution time is shortened or extended. However, due to the diversity of task models, it is very difficult to find out which tasks can directly affect the overall schedule length. Therefore, we adopt a more direct way: If the application model level is strict (the tasks in level l only communicate with the tasks in their adjacent levels which are level l+1 and level l−1), we re-allocate the reused energy consumption (_Ereu(G)) to the tasks that have not reached the highest execution frequency in the levels whose|N|=1; otherwise, we only re-allocate the reused energy consumption to_nentryand_nexit. For ease of description, we define_Ndasas the set of tasks that can directly affect the overall schedule length.

After the assignment object of_Ereu(G)is determined, we need to determine the allocation proportion. We define the maximum energy consumption of_ni∈_Ndason processer core_upr(i)is

_Emax(_ni,_upr(i))=(_Ppr(i),ind+_Cpr(i),ef×^{(_fpr(i),max)_mpr(i)})×_wi,pr(i).

Therefore, the growable energy consumption of_ni∈_Ndason processer core_upr(i)will be

_Egro(_ni)=_Emax(_ni,pr(i))−E(_ni).

We take the values of_Egro(_ni)as the allocation proportion of_Ereu(G). Therefore, the reused energy consumption of_ni∈_Ndaswill be

_Ereu(_ni)=_Ereu(G)×_Egro(_ni)_∑i=1|N| _Egro(_ni).

Adding_Ereu(_ni)andE(_ni)which is computed in Algorithm 2, we can get the energy consumption that_ni∈_Ndascan use will be

_Euse(_ni)=_Ereu(_ni)+E(_ni).

According to_Euse(_ni), we can find the new frequency_fpr(i),nhof_ni∈_Ndasby traversing the execution frequency on processor core_upr(i). Then we can compute the shortened actual execution time of_ni∈_Ndasas follows:

AE_Tshor(_ni)=_wi,pr(i)×_fpr(i),max×(1_fpr(i),h−1_fpr(i),nh).

Therefore, the new length of the application G will be

S_Lnew(G)=SL(G)−_{∑_ni∈_Ndas}AE_Tshor(_ni).

Combined with the Algorithms 1 and 2, the task execution frequency re-adjustment mechanism is shown in Algorithm 3. Lines 1 and 2 call Algorithms 1 and 2 to get the preliminary scheduling results and_Esave(G). Line 3 compute the reused energy consumption_Ereu(G). Lines 6 and 7 calculate some required values for each task belonging to_Ndas. Lines 8–14 are to select the new frequency for each task belonging to_Ndas. Finally, we compute the new schedule length of the application GS_Lnew(G)after the task execution frequency re-adjustment mechanism in Line 15. The value ofS_Lnew(G)is the final minimum schedule length we get.

In general, our method includes Algorithms 1–3. Algorithm 1 is a task energy pre-allocation strategy, and Algorithms 2 and 3 constitute the task execution frequency re-adjustment mechanism. For a given application, we operate in the order of Algorithms 1, Algorithm 2 and Algorithm 3, then we can get the scheduling result of minimizing the schedule length of the application.

Algorithm 3. The task execution frequency re-adjustment mechanism.

Input: G = {N, W, C}, U,_Egiven(G)Output: SL(G) 1: Call Algorithm 1 to get the preliminary scheduling results; 2: Call Algorithm 2 to get_Esave(G); 3: Compute_Ereu(G); //By (38) 4: for (∀_ni,_ni∈_Ndas) do 5: E(_ni)=0; 6: Compute_Egro(_ni); //By (40) 7: Compute_Euse(_ni); //By (42) 8: for (∀_fpr(i),h, _fpr(i),h∈[_fpr(i),low,_fpr(i),max])do 9: ComputeE(_ni, _upr(i), _fpr(i),h); //By (4) 10: if (E(_ni, _upr(i), _fpr(i),h)> _Euse(_ni)) then 11: continue; 12: if (E(_ni, _upr(i), _fpr(i),h)>E(_ni))then 13: LetE(_ni)= E(_ni, _upr(i), _fpr(i),h); 14: Let _fpr(i),nh= _fpr(i),h; 15: ComputeS_Lnew(G); //By (44) 16: UpdateSL(G)←S_Lnew(G); 17: returnSL(G).

5. Experiments

In this section, we use four algorithms, MSLECC [20], WALECC [21], EECC [22] and ISAECC [23], which are the same as the goal of this article to compare with our proposed method. The configuration of the experimental platform is AMD Ryzen 5 2500U CPU @ 2.00 GHz, 8 GB RAM (Santa Clara, CA, USA), 64-bit Windows 10 Home Edition. The whole set of codes is mainly implemented by C and scripts. The final schedule length SL(G) is the only evaluation standard of these algorithms.

The parameters of the processors and applications as follows:10 ms≤_wi,k≤100 ms,10 ms≤_ci,j≤100 ms,0.03≤_Pk,ind≤0.07,0.8≤_Ck,ef≤1.2,2.5≤_mk≤3.0, and_fk,max=1GHz. The execution frequency is discrete, and the precision is 0.01 GHz. The simulated heterogeneous platform for testing the problem of minimizing the schedule length uses four processor cores.

We chose two DAG models to evaluate our algorithm, which are two real-world applications (Fast Fourier transform and Gaussian elimination). 5.1. Fast Fourier Transform Application

We first consider the fast Fourier transform (FFT), Figure 2a shows an example of the FFT parallel application withρ=4. The parameterρcan represent the size of application models. For FFT application, the number of tasks is|N|=(2×ρ−1)+ρ×_log2ρ, andρ=^2xwherex is an integer. We can see that there are four exit tasks (task 12, 13, 14, 15) in the FFT graph in Figure 2a, and there will beρexit tasks in the FFT graph with parameterρ . In order to match the application model in Section 3.1, we add a dummy exit task whose execution time is 0. We also connect the dummy exit task to the lastρ exit tasks, and we set their communication time is 0. For example, Figure 2b shows the changed FFT parallel application withρ=4which is added a dummy exit task.

Experiment 1. In order to observe the performance on different energy consumption constraints, this experiment is carried out to compare the final schedule length values of the FFT application for varying energy consumption constraints. We use the FFT application withρ=32, that is, the number of tasks is 233 (|N|=233). We set the energy consumption constraints_Egiven(G)=_Emin(G)+M233(_Emax(G)−_Emin(G)), where1≤M≤232andMis an integer.

Table 4 shows the details of the final schedule lengths of FFT application withρ=32for varying_Egiven(G) by using all the algorithms, and a more intuitive feeling can be performed through Figure 3. It can be seen that our algorithm has the obvious advantage on the schedule lengthSL(G)compared to other algorithms. From the experimental results, we can get that our method outperforms MSLECC by about 28.13%~36.35%, and it outperforms the newest method ISAECC by about 3.65%~10.48%. The results of WALECC and EECC are similar to that of ISAECC.

Further, we can find that the gaps between our method and other algorithms increase when_Egiven(G)decreases. This is because all tasks can be assigned to a relatively large energy consumption constraint when the energy consumption constraint is large, so that the impact of task level is smaller. Moreover, at this time, most tasks belonging to_Ndashave reached the maximum frequency, and cannot continue to increase the frequency to shorten the schedule length. In addition, as we expected, the larger_Egiven(G)is, the shorter schedule length we can obtain.

Experiment 2. In order to observe the algorithm performance under different number of tasks, an experiment is carried out to compare the final schedule length values of the FFT application for varying number of tasks. The parameterρis changed from 8 to 256. In order to get relatively obvious results, we set the energy consumption constraints to a relatively small value:_Egiven(G)=_Emin(G)+_Emax(G)−_Emin(G)8.

Table 5 shows the results of FFT applications for different number of tasks by using all the algorithms, and a more intuitive feeling can be performed through Figure 4. The results show that our method has better performance than other algorithms. Our method outperforms MSLECC by about 21.24%–31.10%, and outperforms ISAECC by about 4.80%–7.32%. From the results, we can find that the more tasks we have, the better our method performs. This is because that there will be more different hierarchies in FFT graphs as the number of tasks increases, so that our new energy pre-allocation strategy will become more advantageous.

5.2. Gaussian Elimination Application

Similarly, in Gaussian elimination (GE) application, we defineρas the size of the application, and the number of tasks can be calculated by|N|=^ρ2+ρ−22 . Figure 5 shows the GE application model withρ=5.

Experiment 3. This experiment compares the final schedule length values of GE application for varying energy consumption constraints. We use the GE application withρ=21, that is, the number of tasks is 230 (|N|=230). We set the energy consumption constraints_Egiven(G)=_Emin(G)+M230(_Emax(G)−_Emin(G)), where1≤M≤229andMis an integer.

Table 6 shows the results of the final schedule lengths of GE application withρ=21for varying_Egiven(G) by using all the algorithms, and a more intuitive feeling can be performed through Figure 6. We can see that our method still performs better than other algorithms, specifically, it outperforms MSLECC by about 14.69%–34.55%, and outperforms ISAECC by about 3.94%–9.60%, but the improvement is not obvious compared with FFT models. This is because that the level of GE application is not as strict as the FFT application, so that the effect of our method has diminished.

Experiment 4. This experiment compares the final schedule length values of GE application for varying number of tasks. We set that_Egiven(G)=_Emin(G)+_Emax(G)−_Emin(G)4and change the number of tasks.

Table 7 shows the results of the final schedule lengths of GE application for varying number of tasks by using all the algorithms, and a more intuitive feeling can be performed through Figure 7. Our method still has a better effect on the schedule length than other algorithms which performs better 16.00%–28.85% than MSLECC and 3.36%–10.98% than ISAECC.

5.3. Analysis and Summary of Experimental Results

We can obtain that our method has better performance compared with the other algorithms from the experimental results. However, when_Egiven(G)becomes larger or the number of tasks is relatively small, the advantage of our method is not particularly obvious. The former is because that all tasks can be allocated to a relatively large energy consumption constraint when_Egiven(G)is large, so that the impact of task level will be smaller. The latter is because that the experimental results are more accidental and cannot reflect the general law when the number of tasks is small. We can also find that our method performs better on FFT models than on GE models with similar_Egiven(G)and the number of tasks. This is because that the task level of GE application is not as strict as the FFT application, so that the effect of our method has diminished. In summary, our method is more applicable when the energy constraints are stringent, the number of tasks is large, or the task level of the application is strict.

6. Conclusions In this article, we propose a novel method to minimize the scheduling length of energy-constrained applications which run on a heterogeneous multi-core system. Our method mainly includes two parts: a novel task energy pre-allocation strategy and the schedule algorithm based on it; a re-adjustment mechanism of task execution frequency after preliminary scheduling. The core idea of our method is that the tasks in different hierarchies have different impacts on the whole application and the negative impact of local optimal scheduling should be reduced. Our method can be integrated into actual multi-core embedded systems, and it is particularly suitable for wearable devices, mobile robots and other products with high requirements for energy saving and performance. We carry out a considerable number of experiments with two practical parallel application models (FFT and GE). The results of experiments show that our method is generally superior to other existing algorithms. However, the experimental results also demonstrate the limitations of our method, which are that our method does not offer much of an advantage when the energy constraints are not stringent, the number of tasks is small or the task level of the application is not strict. In the future, we will improve and extend our method, and some further studies will be done. The points that can be further studied are as follows:

Consider other factors that affect the length of application scheduling, such as the way to determine the priority of tasks.
Explore ways to improve the limitations of our method and make it more universal.
Integrate our method into an actual embedded multi-core system and test its performance.
Extend our method to study other indicators in multi-core task scheduling, such as reliability.

Notation	Description
_wi,k	Execution time of task_nion the processor core_ukwith the maximum frequency
_ci,j	Communication time from_nito_nj
pred(_ni)	The set of direct predecessor tasks of task_ni
succ(_ni)	The set of direct successor tasks of task_ni
_nentry	Entry task of an application
_nexit	Exit task of an application
E(_ni,_uk,_fk,h)	The energy consumption of the task_nion the processor core_ukwith the frequency_fk,h
EST(_ni,_uk)	The earliest start time of task_nirunning on processor core_uk
EFT(_ni,_uk,_fk,h)	The earliest finish time of task_nirunning on processor core_ukwith frequency_fk,h
AST(_ni)	The actual start time of task_ni
AET(_ni)	The actual execution time of task_ni
AFT(_ni)	The actual finish time of task_ni
LFT(_ni)	The latest finish time of task_ni
L(_ni)	The level of task_ni
_upr(i)	The processor core allocated to task_ni
_fpr(i),hz(i)	The execution frequency allocated to task_nion processor core_upr(i)
_Egiven(_ni)	The calculated energy consumption constraint of task_ni
_Epre(_ni)	The pre-allocated energy consumption of task_ni
_Egiven(G)	The given energy consumption constraint of application G
E(G)	The energy consumption of application G
SL(G)	The schedule length of application G

	_n1	_n2	_n3	_n4	_n5	_n6	_n7	_n8	_n9	_n10
Core	_n1	_n2	_n3	_n4	_n5	_n6	_n7	_n8	_n9	_n10
_u1	14	13	11	13	12	13	7	5	18	21
_u2	16	19	13	8	13	16	15	11	12	7
_u3	9	18	19	17	10	9	11	14	20	16

Task	_n1	_n2	_n3	_n4	_n5	_n6	_n7	_n8	_n9	_n10
ran_ku	108	77	80	80	69	63.33	42.67	35.67	44.33	14.67

_Egiven(G)	SL(G)
_Egiven(G)	MSLECC	WALECC	EECC	ISAECC	Our method
2086	10,989	8023	7955	7814	6846
2714	5915	4491	4564	4417	3887
3342	4644	3302	3487	3390	3108
4048	4052	2918	2937	2966	2779
5226	3381	2546	2609	2484	2388
6168	2947	2253	2315	2273	2190

ρ	SL(G)
ρ	MSLECC	WALECC	EECC	ISAECC	Our Method
8	805	654	679	666	634
16	1619	1331	1358	1294	1227
32	3664	2960	2777	2814	2641
64	8433	6340	6314	6326	5922
128	18,954	13,892	13,975	14,131	13,168
256	43,524	31,982	32,814	32,354	29,987

_Egiven(G)	SL(G)
_Egiven(G)	MSLECC	WALECC	EECC	ISAECC	Our Method
2436	6826	6144	5912	5973	5692
2763	6155	5389	5603	5398	5128
3665	5519	4457	4521	4324	4119
6535	4721	3776	3502	3418	3090
7191	3864	3093	3214	3186	3043
8749	3397	3012	3129	3017	2898

ρ	SL(G)
ρ	MSLECC	WALECC	EECC	ISAECC	Our Method
8	1056	897	911	893	847
13	2185	1794	1805	1737	1599
21	3698	3269	3082	3214	3106
41	18,845	14,504	14,729	14,563	13,534
83	39,584	31,375	31,551	31,422	30,032
100	59,951	46,210	48,238	47,916	42,657

Author Contributions

Conceptualization, M.J. and X.J.; methodology, K.H., M.J. and X.J.; software, M.J.; validation, K.H., M.J. and S.C.; formal analysis, M.J. and X.J.; investigation, M.J., S.C., X.L. and W.T.; resources, K.H., X.J. and Z.L.; data curation, M.J., X.L. and W.T.; writing-original draft preparation, M.J.; writing, review and editing, K.H., M.J., X.J. and D.X.; visualization, K.H.; supervision, K.H. and X.J.; project administration and funding acquisition, K.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key R&D Program of China (2020YFB0906000, 2020YFB0906001).

Conflicts of Interest

The authors declare no conflict of interest.

Word count: 7813

Show less

© 2020. This work is licensed under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Translate

Minimizing the schedule length of parallel applications, which run on a heterogeneous multi-core system and are subject to energy consumption constraints, has recently attracted much attention. The key point of this problem is the strategy to pre-allocate the energy consumption of unscheduled tasks. Previous articles used the minimum value, average value or a power consumption weight value as the pre-allocation energy consumption of tasks. However, they all ignored the different levels of tasks. The tasks in different task levels have different impact on the overall schedule length when they are allocated the same energy consumption. Considering the task levels, we designed a novel task energy consumption pre-allocation strategy that is conducive to minimizing the scheduling time and developed a novel task schedule algorithm based on it. After getting the preliminary scheduling results, we also proposed a task execution frequency re-adjustment mechanism that can re-adjust the execution frequency of tasks, to further reduce the overall schedule length. We carried out a considerable number of experiments with practical parallel application models. The results of the experiments show that our method can reach better performance compared with the existing algorithms.

Details

Title

Task-Level Aware Scheduling of Energy-Constrained Applications on Heterogeneous Multi-Core System

First page

2077

Publication year

2020

Publication date

2020

Publisher

MDPI AG

e-ISSN

20799292

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.3390/electronics9122077

ProQuest document ID

2468802671

Task-Level Aware Scheduling of Energy-Constrained Applications on Heterogeneous Multi-Core System

Jump to:

Full text

Abstract

Details

Suggested sources