Content area
A new dynamic bin packing problem relevant to cloud computing is considered. The creation time, deletion time, and required resources are known for each item (virtual machine). The containers (servers) have a NUMA architecture and specific rules for placing the machines. The servers are grouped into racks, and some machines form groups. Each group is divided into partitions. Machines from different partitions cannot be placed on the same rack to ensure system fault tolerance. The objective is to pack all the machines into the minimum number of racks over a given planning horizon. A two-stage algorithm is developed to solve the problem: an initial solution is constructed, where some constraints may be violated, followed by iterative improvement using local search aimed at eliminating the violations. Using the proposed approach, an average deviation of 3.8% from the lower bound was achieved on open test cases.
INTRODUCTION
The development of cloud computing in recent decades has given rise to many pressing issues in the field of information technology. One of the key areas is the optimization of server resource use in order to improve energy efficiency, which includes solving the problem of dynamic bin packing. In this packing problem, a certain planning horizon is fixed during which it is necessary to optimally distribute virtual machines (VMs) across servers. All servers are identical and have NUMA-architecture [1, 2], i.e., they consist of several NUMA-nodes, each of which is characterized by a certain amount of and the number of processor cores ( ). Each virtual machine requires server resources, the amount of which is determined by its type. It is important to clarify that the number of VM types is usually considerably less than the total number of virtual machines. The VM type also determines its size—small or large. A small VM occupies the resources of only one NUMA-node, while a large VM is evenly distributed between two different NUMA-nodes. The time interval for using the resources of each VM is specified in the input data of the task. In addition, the task has racks and virtual machine placement groups with partitions [3], which is the main difference from the tasks considered in [4, 5]. A rack is a limited set of servers, and each server belongs to only one rack. Some virtual machines are combined into groups divided into nonoverlapping partitions (subsets of VMs) that are in conflict with each other. Virtual machines from different partitions of one group cannot be located on one rack at the same time, and a conflict between two such VMs occurs only when their existence intervals intersect.
The described problem is NP-hard in the strong sense, since it is a generalization of the bin packing problem. However, if we consider the problem of conflict resolution in groups of VMs with partitions in isolation, then the problem of coloring the conflict graph becomes polynomially solvable in this case. This is due to the possibility of coloring each partition in a separate color, which leads to an optimal solution. This feature significantly distinguishes the problem under consideration from the scenario where any two VMs can conflict, as well as from the problem with conflicts between VM types, studied in [5]. Thus, the presence of partitions introduces specific structural features that can be taken into account when developing algorithmic approaches to solving this problem. A similar problem is considered in the article [6]. In contrast to the formulation presented below, which considers conflicts, in [6] the emphasis is on constraints concerning the colocation of virtual machines from the same group. The authors present an MIP model and formulate several theorems concerning the complexity of this problem. It turns out that even under strong assumptions, such as the existence of only one point in time or one group, the problem remains NP-hard.
The resource optimization problem remains relevant, and its new variations continue to emerge. For example, in [7] a wide range of scenarios and problems is presented that arise due to the need to develop optimal schedules for virtual machines taking into account the technical features of modern computing clusters. The authors of [8] pay attention to the importance of ensuring fault tolerance of servers in data centers in order to reduce performance losses and ensure continuity of system operation in the event of failures. In parallel with the development of problem statements, the evolution of methods for their solution occurs. An analysis of classical algorithms such as First Fit, Best Fit, and Next Fit is carried out in [9, 10]. In addition, the authors propose new Hybrid First Fit and online Move-to-front algorithms, built on the basis of the above-mentioned classical approaches. The paper [11] is devoted to the study of a multicriteria formulation of the resource optimization problem. The authors propose a metaheuristic approach that integrates greedy heuristics into an adaptive evolutionary scheme. The proposed method demonstrates high efficiency when tested on various benchmarks, showing results comparable to other modern metaheuristic algorithms.
Another important area of research is the problem of optimizing energy rather than computing resources. For example, in [12] the authors analyze in detail the problems of efficient use of energy resources in data centers and propose new dynamic packing algorithms: Dynamic Energy Efficient Best-Fit Decreasing and Energy Mitigation Switcher. The first of these is aimed at optimizing the utilization of cloud system resources, and the second acts as a switch between different approaches. The conducted modeling shows that these algorithms can substantially reduce overall energy consumption and reduce task execution time. In addition, in [12] the use of virtualization technologies is proposed as an effective tool for optimizing resource allocation. In [13] the problem of reducing energy consumption is also studied, but the main focus is on the possibility of dynamic redistribution of virtual machines, which can lead to considerable savings. Similarly, in [14] the authors consider a variation of this problem with the possibility of repacking virtual machines and propose an algorithm whose distinctive feature is that the number of repackings is independent of the total number of virtual machines. Potentially, this approach can improve scalability and resource management efficiency in large data centers.
In addition, in recent years, approaches based on machine learning methods have become popular, including in the development of software libraries (see, for example, [15, 16]). The papers [17, 18] study approaches based on the reinforcement learning method. The authors conduct a comparative analysis of the effectiveness of machine learning algorithms and traditional methods, demonstrating the increasing competitiveness of the former, which emphasizes their potential in solving practical optimization problems. In [19], an overview of the limitations of traditional machine learning methods in solving the bin packing problem is presented. The authors show how their approach can overcome these limitations by using a large training data set and precise labeling. In [20], a method for solving the knapsack problem based on the use of neural networks is proposed. A detailed description of the network architecture is given, which takes all available parameters as input and outputs an optimal set of items. The results demonstrate the potential of this approach in the context of solving real-world problems. In [21], the issue of scheduling in the field of distributed computing is raised. The authors present a prototype of an online scheduler based on a hybrid approach that includes machine learning and dynamic programming methods.
A two-stage approach based on iterative solution correction is proposed to solve the problem under consideration. At the first step, some initial solution is constructed using a simple algorithm and then broken in such a way that some of the constraints associated with conflicts are violated. At the second step, all conflicts that were violated are determined for each group, and the procedure for correcting them is launched. Each virtual machine with an invalid placement is swapped with some other virtual machine in such a way as to reduce the total number of violated constraints in the current group under consideration. If it is not possible to get rid of all conflicts in a group, then some of this group is replaced. The main scientific contribution of this work is the study of a new NP-hard problem. A mathematical model has been developed that takes into account various aspects and allows for a formal description of this problem. In addition, algorithms for obtaining upper bounds are presented. The computational experiments conducted on open data demonstrated the high efficiency of the proposed methods, which confirms their practical applicability for solving the problem under consideration. A similar problem is considered in [22], where the authors proposed a stochastic greedy algorithm. A comparison with this approach is made in the section with computational experiments.
Further, Sec. 1 gives the necessary notation, detailed problem statement, and mathematical model. Section 2 contains algorithms for constructing upper bounds. Section 3 describes the data used, as well as the results of numerical experiments. The conclusions section briefly summarizes the results of the study.
1. MATHEMATICAL MODEL
To formulate the problem and construct its mathematical model, we introduce the following notation:
The set of points in time or the planning horizon.
The set of resource types.
The set of racks.
The set of servers velonging ot rack .
The set of all servers.
The set of NUMA-nodes.
Racks are collections of servers, and the capacity of each rack is limited and, as a rule, does not exceed 10–20 units. Each server must be associated with exactly one rack, in other words, belong to it. Although the number of servers in racks can vary, the main goal of the problem is to minimize the total number of racks used. Therefore, within the framework of this problem statement, we can always use the maximum filling of each rack with servers. All servers are identical, but it is important to note that different NUMA-nodes of the same server may differ in the number of available resources.
To work with groups and virtual machines, we introduce the following notation:
The set of small virtual machines.
The set of large virtual machines.
The set of all virtual machines.
The set of groups of virtual machines.
The set of virtual machines from group .
The set OF PARTITIONS IN group .
The set of VMs from partition of group .
The set of types of virtual machines.
[See PDF for image]
Fig. 1.
Example of packing a group with three partitions.
A group is a set of virtual machines: for example, these can be virtual machines of a single client. Each group is further divided into subsets, called partitions. The number of partitions in a group is usually small and does not exceed 10. The key limitation is that any two virtual machines from different partitions of the same group cannot be placed on the same rack (Fig. 1). Thus, partitions can be considered as a special case of conflicts between VMs. However, the structure of the graph of such conflicts (where vertices are virtual machines, and edges are conflicts between them) is quite specific. The graph is divided into several connectivity components corresponding to groups. Each connectivity component can be effectively colored in a minimum number of colors by simply assigning a separate color to each partition. The type defines the amount of and the number of cores that the VM occupies on the server. The sets of VM types at most cloud computing companies have substantial overlap. An example of such a set of types can be found in [23]. It should also be noted that the power of the set of VM types is usually much smaller than the power of the set of virtual machines.
The following notation is introduced to describe the characteristics of virtual machines and servers:
The amount of resource on NUMA-node .
The amount of resource necessary for a virtual machine for NUMA-node.
The amount of resource necessary for type for one NUMA-node.
The time of creating machine .
The time of deleting machine .
Each virtual machine ( ) takes ( ) of resource on the server at all times . For any resource , the equality holds if the virtual machine is assigned the type . The input data guarantees that for any , , and . It is important to clarify that if there is a conflict between two VMs and but they do not intersect in time, then this conflict can be ignored.
Taking into account all of the above, the set of conflicts can be defined as follows. A pair of VMs denoting a conflict belongs to if and , where and are different partitions ( ) of some group , and there exists a time instant such that and .
Let us also define the following sets of VMs operating at a given moment of time :
The set of small VMs.
The set of large VMs.
The set of all VMs.
To describe the mathematical model, we need the following Boolean variables:
is equal to 1 if a small virtual machine (or the first part of a large VM) is located on node of server , .
is equal to 1 if the second part of a large VM is located on node of server , .
is equal to 1 if rack was active at least at one point in time .
Now the mathematical model of the problem under consideration can be written as follows:
1
2
3
4
5
6
7
8
The objective function (1) expresses the minimization of the number of involved racks. The constraints (2) ensure that all virtual machines are packed, and the constraints (3)–(4) are responsible for the correct packing of large VMs. Inequalities (5) limit the amount of resources on each server. The constraints (6) are needed to calculate the number of racks. We consider a rack to be active if at least one virtual machine was served on it. The set of inequalities (7) defines conflicts between virtual machines.
The need to solve the problem for fairly large examples (more than virtual machines) does not allow using the mathematical model (1)–(8), since it requires considerable computing resources and time. As an alternative, it is proposed to use heuristic approaches that have proven themselves well [5, 24]. The description of the algorithm under consideration is given in Sec. 2.
2. UPPER BOUNDS
When solving a problem in an online setting, there is a need to pack virtual machines in the order they were created, taking into account conflicts directly at the placement stage. However, a dynamic setting has greater freedom of action; this allows for the development of more flexible and efficient heuristic methods. The proposed approach is based on a two-stage optimization process. At the first stage, an initial solution to the problem is formed that ignores some of the constraints associated with conflicts. At the second stage, a correction procedure is applied, aimed at eliminating constraint violations. Dividing the process into the formation of an initial solution and subsequent correction allowed for greater flexibility and quality of the final placement.
At the first stage, a randomized version of the First Fit algorithm is used to place virtual machine groups (RFFG). The algorithm sequentially packs VM groups in a random order one after another and does not start packing the next group until it finishes with the current one. Each group is packed in such a way that there are no conflicts between partitions of the same group. If there are VMs in the example that do not belong to any group, they are packed last. Randomizing the order of groups allows achieving significantly better results while maintaining the performance of the algorithm. Using this approach, it is possible to generate several potential solutions at once, from which the best one is subsequently selected. It is important to note that the order of VMs within each group is also randomized; this further increases the variability of solutions. Next, to create the possibility of further improvement, a certain percentage of racks are unpacked, after which unallocated VMs are repacked using a randomized First Fit (RFF) algorithm, ignoring conflict restrictions.
We define the similarity of virtual machines using the Jaccard coefficients [26], in particular, its adaptation for IoU (Intersection over Union) rectangles. In our case, the width of the rectangle (virtual machine ) is , and the height is determined by the relative value of the required resource ( in the case of a large VM). Thus, the coefficient depends not only on the selected virtual machines and , but also on the resource under consideration. In order to get rid of this ambiguity, we will consider the adaptation of the coefficient
In the second step, the existing solution is iteratively rebuilt to successively eliminate constraint violations. Groups of virtual machines are scanned one by one in random order. For each VM , the possibility of permutation with other VMs is analyzed, excluding from consideration those that satisfy certain rules: they are on the same rack as , they belong to already scanned groups, they are part of the same group and partition as , or they have a low degree of similarity with ( ). These screening criteria allow one to focus on the most promising permutations that can reduce the number of constraint violations without the risk of introducing new conflicts. Swapping two VMs does not increase the objective function, but this approach cannot always reduce the number of constraint violations to zero. This leads to the need for an additional step. If at some iteration (a full scan of all VMs in a group) it was not possible to completely get rid of constraint violations, then part of this group is completely repacked. In this case, only the partition from the group that has the largest number of VMs remains on each rack, and all other VMs are deleted and repacked using RFF. However, this time the algorithm packs in such a way as not to violate any constraints, i.e., for each VM a certain set of forbidden racks is determined, where packing is not performed. If necessary, new racks and servers are created, so that the algorithm always returns a feasible solution. The algorithm diagram can be represented as follows.
[See PDF for image]
Fig. 2.
Dependence of the objective function and the number of violated constraints on the iteration number.
RALR algorithm.
Using the RFFG algorithm, an initial feasible solution is constructed that does not contain violations of the constraints (2)–(8).
A random set of racks are unpacked, and the freed VMs are repacked using RFF, possibly violating partition conflict restrictions.
For the next group and for each VM in this group, a local search is launched in the space of permutations with other VMs in such a way as to reduce the number of violated constraints. Each group is viewed completely several times until violations can be eliminated. If all violations can be eliminated, then we move on to the next group, otherwise we go to step 4.
If it is not possible to fix all violations for the group in question, then it is partially repacked using the RFF algorithm, which corrects all restrictions, creating new servers and racks if necessary.
An example of the behavior of the objective function and the number of violated constraints during steps 3–4 can be found in Fig. 2. Here, both curves are normalized to the interval . The curve of the deviation of the objective function shows how its value changed relative to the solution obtained at the second step. The sharp jumps on both curves correspond to repacking of groups, which, as can be seen, does not always lead to a deterioration in the solution.
It should be noted that steps 2–4 can sometimes worsen the initial solution. In the case where such worsening is actually observed, the solution obtained by the RFFG method in the first step is returned as the final one. However, the results of numerical experiments show that such a situation occurs quite rarely.
The complexity of steps 1, 2, and 4 is of the order of ; this is comparable to the complexity of the RFF algorithm, which takes into account the dynamics and NUMA architecture. The parameter limits the number of servers from above: for example, one can use . Step 3 is the most labor-intensive in the RALR algorithm. For each pair of virtual machines, it is necessary to check the admissibility of exchanging their places, which requires operations. The number of full scans of one group is limited by the number of possible conflicts, which is . In practice, the number of iterations is limited from above; in the computational experiments conducted, the bound is 20, although the algorithm usually requires no more than 10 iterations to reach a local minimum. Despite its complexity, thanks to various optimizations and restrictions on the size of the local search neighborhoods, step 3 is performed quickly enough to obtain a feasible solution in a reasonable time. For example, the search for a solution is noticeably accelerated by the value of the similarity threshold of two virtual machines . Many pairs of virtual machines have a fairly low coefficient , for example, due to a large spread in time. An attempt to swap them will most likely end in failure. Even a small value of allows one to effectively exclude from consideration most of the unsuccessful pairs and considerably reduce the size of the neighborhood.
3. NUMERICAL EXPERIMENTYS
A random generator provided by a cloud computing company was used to conduct the numerical experiments. The generator simulates the behavior of a real load on servers. At each step, it creates requests for placement of virtual machines sequentially, trying to keep the cluster load (a limited set of racks) within a certain interval. New placement groups and types of virtual machines are generated in accordance with frequencies obtained as a result of analyzing the operation history of one of the clusters of the same company. Examples of the total load by number of racks can be seen in Fig. 3. The abscissa axis shows time, and the ordinate axis shows the minimum required number of racks, calculated using the formula All instances considered are in the public domain [27].
[See PDF for image]
Fig. 3.
Lower bound by time points for various examples.
Analysis of instances and various statistics are given in Table 1. All instances are divided into several families. The LPO, DMP, and MP notation in the family name indicates the type of instances, where LPO are instances with only large partition groups, MP are instances with different (large and small) partition groups, and DMP is a generalization of MP instances with the addition of virtual machines that do not belong to any group, i.e., do not have conflicts with any other VMs. The letter notations “l”, “m”, and “s” correspond to the size of the instance, i.e., the number of virtual machines and the number of time points, and the notation “2” and “4” shows the number of NUMA nodes on the servers. The maximum rack size in all instances is 10 servers. On average, all instances have 5 partitions per group. Each family consists of 25 instances, and in total 450 instances were considered. Table 1 shows the average values over all instances of each family.
Table 1.. Analysis of instances
Instance family | Average number of VMs | Average number of time points | Average number of groups with partitions | Average number of VMs in a group | Average number of VMs w/o conflicts |
|---|---|---|---|---|---|
LPO_l_2 | 64 703.12 | 338.28 | 605.92 | 106.85 | 0.00 |
LPO_l_4 | 68 349.76 | 280.40 | 671.84 | 101.80 | 0.00 |
LPO_m_2 | 28 825.24 | 153.72 | 272.20 | 105.89 | 0.00 |
LPO_m_4 | 27 328.32 | 115.04 | 271.76 | 100.59 | 0.00 |
LPO_s_2 | 13 090.92 | 70.28 | 124.56 | 105.20 | 0.00 |
LPO_s_4 | 12 781.92 | 55.24 | 124.92 | 102.20 | 0.00 |
DMP_l_2 | 38 074.36 | 379.68 | 354.36 | 69.60 | 13 452.12 |
DMP_l_4 | 42 577.24 | 311.80 | 425.72 | 63.27 | 15 668.76 |
DMP_m_2 | 32 460.80 | 343.88 | 273.60 | 69.51 | 13 452.12 |
DMP_m_4 | 32 767.40 | 276.00 | 271.36 | 62.99 | 15 668.76 |
DMP_s_2 | 22 066.76 | 274.48 | 124.88 | 68.95 | 13 452.12 |
DMP_s_4 | 23 220.84 | 235.72 | 121.88 | 62.06 | 15 668.76 |
MP_l_2 | 44 820.88 | 308.88 | 641.40 | 69.96 | 0.00 |
MP_l_4 | 49 286.20 | 217.60 | 737.04 | 66.91 | 0.00 |
MP_m_2 | 19 213.04 | 136.16 | 273.84 | 70.08 | 0.00 |
MP_m_4 | 18 679.64 | 84.28 | 278.64 | 66.98 | 0.00 |
MP_s_2 | 8838.44 | 63.32 | 126.16 | 69.98 | 0.00 |
MP_s_4 | 8213.20 | 37.24 | 121.92 | 67.46 | 0.00 |
The following algorithms were selected for comparison:
RALR The algorithm described in Sec. 2 with parameters and .
INIT + NMSLI The stochastic algorithm given in [22]; this approach is based on the First Fit algorithm and the bisection method.
RFFG The first step in the RALR algorithm; the method is a variation of the First Fit algorithm in which groups are packed one after another in random order in such a way that the conflict constraints are not violated, and virtual machines outside groups are also packed in random order at the end of the algorithm.
RAR A simplified version of the RALR algorithm that lacks the most labor-intensive step 3.
RFF A randomized First Fit algorithm that packs virtual machines in a random order, does not take into account conflicts, and is needed to estimate the additional complexity of instances caused by the presence of groups.
Table 2.. Comparison of algorithms
Instance family | RALR | INIT + MSLI | RFFG | RAR | ||||
|---|---|---|---|---|---|---|---|---|
avg. | std. | avg. | std. | avg. | std. | avg. | std. | |
LPO_l_2 | 5.14 | 0.87 | 4.36 | 0.89 | 12.39 | 0.68 | 13.64 | 1.09 |
LPO_l_4 | 6.12 | 1.33 | 7.57 | 0.56 | 9.93 | 0.49 | 13.49 | 0.80 |
LPO_m_2 | 1.49 | 2.21 | 1.55 | 1.51 | 6.85 | 1.91 | 9.98 | 1.85 |
LPO_m_4 | 4.52 | 2.37 | 8.25 | 1.89 | 8.04 | 1.36 | 12.68 | 1.72 |
LPO_s_2 | 3.27 | 3.09 | 3.90 | 2.85 | 6.24 | 3.22 | 9.63 | 2.69 |
LPO_s_4 | 6.27 | 2.72 | 11.95 | 3.34 | 9.32 | 2.75 | 14.52 | 2.73 |
DMP_l_2 | 3.02 | 1.10 | 2.11 | 0.87 | 7.34 | 1.48 | 8.26 | 1.41 |
DMP_l_4 | 3.06 | 0.46 | 3.31 | 0.61 | 4.36 | 1.00 | 4.56 | 1.94 |
DMP_m_2 | 1.94 | 1.21 | 1.42 | 1.03 | 6.42 | 1.82 | 7.33 | 1.93 |
DMP_m_4 | 2.85 | 0.87 | 3.01 | 0.91 | 4.35 | 1.11 | 4.40 | 3.33 |
DMP_s_2 | 1.43 | 1.13 | 1.19 | 1.16 | 5.21 | 2.08 | 6.03 | 1.93 |
DMP_s_4 | 3.10 | 1.19 | 3.11 | 1.24 | 4.40 | 1.54 | 4.95 | 1.34 |
MP_l_2 | 3.96 | 0.78 | 3.13 | 0.63 | 9.79 | 0.73 | 11.16 | 0.80 |
MP_l_4 | 5.81 | 1.00 | 6.58 | 0.64 | 7.64 | 0.57 | 10.64 | 0.70 |
MP_m_2 | 2.20 | 1.21 | 2.02 | 1.49 | 7.00 | 2.22 | 9.93 | 2.67 |
MP_m_4 | 5.45 | 1.74 | 7.91 | 1.26 | 7.10 | 1.58 | 12.22 | 3.93 |
MP_s_2 | 3.05 | 2.48 | 3.85 | 3.60 | 5.94 | 3.79 | 11.05 | 3.88 |
MP_s_4 | 6.41 | 3.65 | 11.26 | 3.87 | 7.58 | 2.69 | 12.23 | 2.64 |
Average | 3.84 | 1.63 | 4.80 | 1.57 | 7.22 | 1.72 | 9.81 | 2.08 |
All algorithms are implemented in the Python programming language; calculations
were performed on a computer with an AMD EPYC 7502 32-core processor. In each example, the
relative deviation was calculated using the formula
, where
The analysis of the results shows that the RALR algorithm demonstrates
considerably better results compared to RFFG and RAR on all presented data sets, even though
RFFG and RAR are more efficient and managed to process more runs in the allotted time. The
observed difference in deviations emphasizes the importance of having all the stages implemented
in RALR. Nevertheless, the RFF algorithm was able to demonstrate an average deviation of
, which is noticeably lower than that of the other approaches, despite its
simplicity. It should be noted that RFF does not take into account the presence of conflicts, and
the observed difference is probably due to the additional complexity of the instances associated
with the presence of groups and partitions. In addition, when analyzing the results of RFF, it was
noted that on instances with four NUMA-nodes, it has higher
Table 3..
Influence of parameter
on deviation
Instance family | Value of , % | ||||||
|---|---|---|---|---|---|---|---|
Value of | |||||||
DMP_l_2 | 6.62 | 5.30 | 4.16 | 3.46 | 3.02 | 4.19 | 4.62 |
DMP_l_4 | 4.29 | 3.65 | 3.62 | 3.32 | 3.06 | 3.69 | 3.73 |
DMP_m_2 | 4.34 | 3.47 | 2.82 | 2.39 | 1.94 | 2.52 | 2.80 |
DMP_m_4 | 4.33 | 3.47 | 3.29 | 3.12 | 2.85 | 3.59 | 3.68 |
DMP_s_2 | 3.71 | 2.97 | 2.41 | 2.23 | 1.43 | 1.80 | 1.89 |
DMP_s_4 | 4.25 | 3.82 | 3.57 | 3.40 | 3.10 | 3.42 | 3.48 |
Average | 4.59 | 3.78 | 3.31 | 3.02 | 2.57 | 3.20 | 3.37 |
[See PDF for image]
Fig. 4.
Dependence of
deviation
An analysis of the influence of the parameter on the efficiency of the RALR algorithm was also of interest during the study. The results of this analysis are presented in Table 3 and Fig. 4. For relatively small values of parameter , the RALR results are similar to the RFFG results. This means that with such a small repacking, RALR fails to effectively correct violations, and it often returns the solution obtained in the first step. When the value of parameter is increased to too large values, the quality of the RALR algorithm also remains low. In this case, too much of the solution is destroyed. Packing with RFF for a significant part of VMs, apparently, is not good enough to be effectively corrected. This results in too frequent use of the group repacking procedure from step 4, which further reduces the quality.
CONCLUSIONS
In this paper, we consider a dynamic bin packing problem taking into account placement groups that impose restrictions on the location of virtual machines. A mathematical model of the problem is presented, the operation of several heuristic methods is demonstrated, and their comparison is carried out on a large volume of open instances. In addition, an analysis of one of the algorithms is performed for a deeper understanding of its behavior.
The proposed method can be further generalized to other modifications of the problem, taking into account the development of cloud services and the emergence of new opportunities for clients. For example, additional constarints on placement groups can be investigated such as conflicts at the server level instead of racks or requirements for the proximity of placement of virtual machines within a group [6]. Also, a promising direction is to improve the lower bounds, since the current approach does not take into account the influence of placement groups.
This research is carried out within the framework of the state contract of the Sobolev Institute of Mathematics, project no. FWNF–2022–0019.
CONFLICT OF INTEREST
The authors of this work declare that they have no conflicts of interest.
Publisher’s Note. Pleiades Publishing remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. AI tools may have been used in the translation or editing of this article.
REFERENCES
1 Lameter, C. NUMA (Non-Uniform Memory Access): An Overview. Queue; 2013; 11, pp. 40-51. [DOI: https://dx.doi.org/10.1145/2508834.2513149]
2 P. Li, X. Wu, Y. Luo, L. Wang, N. Yadav, M. Pepin, and J. Morgan, “NUMAware: Accelerate VM-to-VM I/O Performance in NUMA Servers for NFV Applications,” Proc. IEEE NFV-SDN (2016).
3 Placement groups—amazon elastic compute cloud. Available at https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html.
4 A. Ratushnyi and Y. Kochetov, “A column generation based heuristic for a temporal bin packing problem,” Mathematical Optimization Theory and Operations Research: 20th Int. Conf. MOTOR 2021 (2021), pp. 96–110. https://doi.org/10.1007/978-3-030-77876-7_7.
5 A. Ratushnyi, “A pattern-based heuristic for a temporal bin packing problem with conflicts,” Mathematical Optimization Theory and Operations Research: 22nd Int. Conf. MOTOR 2023, pp. 161–175. https://doi.org/10.1007/978-3-031-43257-6_13.
6 P. Borisovsky, A. Eremeev, A. Panin, and M. Sakhno, “Temporal bin packing problems with placement constraints: MIP-Models and complexity,” Mathematical Optimization Theory and Operations Research: 23rd Int. Conf. MOTOR 2024, pp. 157–169. https://doi.org/10.1007/978-3-031-62792-7_11.
7 Grushin, D.A.; Kuzyurin, N.N. On effective scheduling in computing clusters. Program. Comput. Software; 2019; 45, pp. 398-404. [DOI: https://dx.doi.org/10.1134/S0361768819070077]
8 K. Daudjee, S. Kamali, and A. López-Ortiz, “On the online fault-tolerant server consolidation problem,” SPAA ’14: Proc. 26th ACM Symp. Parallelism Algorithms Archit. (2014), pp. 12–21. https://doi.org/10.1145/2612669.2612686
9 Li, Y.; Tang, X.; Cai, W. Dynamic bin packing for on-demand cloud resource allocation. IEEE Trans. Parallel Distrib. Syst.; 2016; 27,
10 A. Murhekar, D. Arbour, T. Mai, and A. Rao, “Dynamic vector bin packing for online resource allocation in the cloud,” SPAA ’23: Proc. 35th ACM Symp. Parallelism Algorithms Archit. (2023), pp. 307–310. https://doi.org/10.1145/3558481.3591314
11 Spencer, K.Y.; Tsvetkov, P.V.; Jarrell, J.J. A greedy memetic algorithm for a multiobjective dynamic bin packing problem for storing cooling objects. J. Heuristics; 2019; 25, pp. 1-45. [DOI: https://dx.doi.org/10.1007/s10732-018-9382-0]
12 Gupta, N.; Gupta, K.; Deepali, G.; Juneja, S.; Turabieh, H.; Dhiman, G.; Kautish, S.; Viriyasitavat, W. Enhanced virtualization-based dynamic bin-packing optimized energy management solution for heterogeneous clouds. Math. Probl. Eng.; 2022; 2022, pp. 1-11. [DOI: https://dx.doi.org/10.1155/2022/8734198]
13 A. Beloglazov and R. Buyya, “Energy efficient allocation of virtual machines in cloud data centers,” 10th IEEE/ACM Int. Conf. Cluster Cloud Grid Comput. (2010), pp. 577–578. https://doi.org/10.1109/CCGRID.2010.45
14 Berndt, S.; Jansen, K.; Klein, K.M. Fully dynamic bin packing revisited. Math. Program.; 2020; 179, pp. 109-155.4050928 [DOI: https://dx.doi.org/10.1007/s10107-018-1325-x]
15 C. Hubbs, H. Perez, O. Sarwar, N. Sahinidis, I. Grossmann, and J. Wassick, “OR-Gym: A reinforcement learning library for operations research problem,” (2020).
16 C. Wan, T. Li, and J. Wang, “RLOR: A flexible framework of deep reinforcement learning for operation research,” (2023).
17 B. Balaji, J. Bell-Masterson, E. Bilgin, A. Damianou, P. Garcia, A. Jain, R. Luo, A. Maggiar, B. Narayanaswamy, and C. Ye, “ORL: Reinforcement learning benchmarks for online stochastic optimization problems,” (2019).
18 Yang, T.; Luo, F.; Fuentes, J.; Ding, W.; Gu, C. A flexible reinforced bin packing framework with automatic slack selection. Math. Probl. Eng.; 2021; 2021, pp. 1-15. [DOI: https://dx.doi.org/10.1155/2021/6653586]
19 F. Mao, E. Blanco, M. Fu, R. Jain, A. Gupta, S. Mancel, R. Yuan, S. Guo, S. Kumar, and Y. Tian, “Small boxes big data: A deep learning approach to optimize variable sized bin packing,” IEEE Third Int. Conf. Big Data Comput. Serv. Appl. (2017), pp. 80–89. https://doi.org/10.1109/BigDataService.2017.18
20 Nomer, H.A.A.; Alnowibet, K.A.; Elsayed, A.; Mohamed, A.W. Neural knapsack: A neural network based solver for the knapsack problem. IEEE Access; 2020; 8, pp. 224200-224210. [DOI: https://dx.doi.org/10.1109/ACCESS.2020.3044005]
21 V. Toporkov, D. Yemelyanov, and A. Bulkhak, “Machine learning-based online scheduling in distributed computing,” Parallel Process. Appl. Math. , 248–259 (2023). https://doi.org/10.1007/978-3-031-30445-3_21
22 A. Turnaev and A. Panin, “Stochastic greedy algorithms for a temporal bin packing problem with placement groups,” Mathematical Optimization Theory and Operations Research: 23rd Int. Conf. MOTOR (2024), pp. 199–211. https://doi.org/10.1007/978-3-031-62792-7_14
23 Amazon EC2 Instance types. Available at https://aws.amazon.com/ec2/instance-types/.
24 Munien, C.; Ezugwu, A. Metaheuristic algorithms for one-dimensional bin-packing problems: A survey of recent advances and applications. J. Intell. Syst.; 2021; 30, pp. 636-663. [DOI: https://dx.doi.org/10.1515/jisys-2020-0117]
25 Johnson, D.S. Fast algorithms for bin packing. J. Comput. Syst. Sci.; 1974; 8, pp. 272-314.373370 [DOI: https://dx.doi.org/10.1016/S0022-0000(74)80026-7]
26 Kosub, S. A note on the triangle inequality for the Jaccard distance. Pattern Recognit. Lett.; 2019; 120, pp. 36-38. [DOI: https://dx.doi.org/10.1016/j.patrec.2018.12.007]
27 Temporal Bin Packing Problem with Placement Groups. Available at http://old.math.nsc.ru/AP/benchmarks/Temporal%20Bin%20Packing/binpack.html.
© Pleiades Publishing, Ltd. 2025.