1. Introduction
Virtualization [1,2], as an effective technology for resource sharing, enables resource multiplexing of the underlying physical machines. The most common examples are hypervisor-based virtualization [3] and container-based virtualization [4]. In hypervisor-based virtualization, a layer called a hypervisor runs on top of the physical host to create and manage virtual machines (VMs), each of which bundles its own guest operating system. In container-based virtualization [5], containers share the host operating system kernel, which makes them lighter weight and faster to start than VMs.
There is no doubt that Docker [6] is currently the most popular open-source application container engine. Compared with VMs, Docker comes with many advantages, such as better system resource management, easier administrative operations, and others. With the popularity of container-based virtualization (i.e., containerization), the use of containers has grown exponentially, and managing large numbers of containers one by one manually has become difficult and tedious for developers and maintainers. This demand gave rise to container orchestration systems, which manage containers automatically. Notably, various orchestration frameworks are available from the community, such as Kubernetes (K8s) [7], Mesos [8], Docker Swarm [9], Nomad [10], SaltStack [11], Amazon Elastic Container Service (Amazon ECS) [12], OpenShift [13,14], and many others. Among them, Kubernetes is the most popular and most commonly used container orchestration framework [7,15,16,17,18], so in this paper we narrow down our study to Kubernetes (a comparative analysis is presented in Section 3.1); its architecture and components are presented in Section 2.
With the development of cloud computing technology, the container management architecture led by Kubernetes [19,20] has been adopted and promoted by more and more enterprises. One of its functions, called autoscaling, automatically adjusts the number of running instances of an application according to the current workload, so that the service can cope with fluctuating demand without manual intervention.
To find a better scaling scheme on K8s that implements proactive approaches based on traffic prediction, we summarize the prominently used time series prediction models, i.e., load prediction methodologies (in Section 3.2), and the latest K8s autoscaling schemes (in Section 3.3) with obvious characteristics and excellent results. Among them, the time series prediction models are mainly based on deep learning, such as LSTM, BiLSTM, and GRU, alongside the traditional statistical model ARIMA.
Next, we dive into analyzing the load prediction models, including the traditional forecasting model (ARIMA) and the deep learning models (LSTM, BiLSTM, GRU), to derive the best one. In the process, we use Google-cluster-data-2011-2 [30] as the dataset. To this end, we first analyze the dataset to confirm the prediction target, then train and evaluate the aforementioned models to obtain the best one and apply it to our customized scaling scheme. After obtaining the best load prediction model, we encapsulate the model within our proposed custom pod autoscaling scheme and build a Docker image of it so that it can be deployed in a Kubernetes cluster as a component of its own. Thereby, we deploy our proposed custom pod autoscaler in our Kubernetes cluster and evaluate its performance compared with the native autoscaler, HPA.
In essence, we make the following contributions in this article:
Proposing an autoscaling scheme that combines proactive and reactive methods based on the latest research outcomes. In the paper, we demonstrate in detail how the scheme proceeds, from the selection of a load prediction model for the autoscaler to the deployment of the proposed autoscaler and its experimental analysis.
Exploring K8s and the third-party custom autoscaling framework, the Custom Pod Autoscaler (CPA) framework [31], integrating the CPA framework and our proposed proactive autoscaling scheme to build our custom pod autoscaler, and deploying it to the K8s cluster for experimental analysis toward validating its effectiveness.
The rest of the paper is organized as follows. Section 2 presents the architecture, features, and components of Kubernetes. Section 3 discusses the related work in the context of our study, such as commonly used container orchestration frameworks, load prediction methodologies and their respective principles related to our analysis, and the latest custom autoscalers with obvious characteristics and excellent results, analyzing their effectiveness and shortcomings. Section 4 covers the empirical analysis of load prediction model selection for our proposed autoscaler. Section 5 covers the development, deployment, and evaluation of the proposed autoscaler. Conclusions and future study directions are discussed in the last section.
2. Architecture and Principles of Kubernetes
In this section, we present the details of Kubernetes including its architecture, features, and components.
2.1. Kubernetes Architecture
We observe that a Kubernetes cluster is composed of multiple nodes, divided into two groups—the master (control plane) node and the worker nodes [32]. The master node runs the control plane components, such as the API server, scheduler, controller manager, and etcd, which manage the state of the cluster, while the worker nodes run the kubelet, the container runtime, and the containerized applications (Pods).
2.2. Core Components
The following are the most important and elementary components for maintaining the operation of K8s: Pod, ReplicaSet, Deployment, and Service. They are mainly responsible for the execution of containers, the management of applications, and communication.
2.2.1. Pod
The Pod is the smallest object that developers can configure in a K8s cluster, and it can run more than one container. The containers within a Pod share the same network namespace, and K8s assigns a different IP address to each Pod to avoid port conflicts. Generally, a Pod is scheduled and managed by a ReplicaSet or Deployment. Notably, we can also create a standalone Pod for testing an application. For example, we can run/deploy a Pod with a given container image, e.g., nginx, as shown in Listing 1.
Listing 1. Creating a standalone Pod (running an application container).
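A minimal sketch of such a command, assuming an nginx image from Docker Hub as the example application (an assumption for illustration):

kubectl run nginx --image=nginx --port=80   # create a standalone Pod named nginx
kubectl get pods                            # verify that the Pod reaches the Running state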
2.2.2. ReplicaSet
A ReplicaSet (RS) is a sub-component of a Deployment, which provides functions such as labels and selectors. Labels are used for marking specified Pods, and selectors help the RS identify and monitor the specified Pods among a myriad of Pods. Typically, the ReplicaSet assists the Deployment in managing and maintaining Pods, as shown in Figure 2.
2.2.3. Deployment
Compared with a ReplicaSet, a Deployment comes with more features and functions, so in real environments developers choose Deployments to manage ReplicaSets and Pods. One of the main functions of a Deployment is the rolling update: when the applications in Pods need to be updated, every update is recorded in the system, so if the new version turns out to be unstable, developers can roll back to any specified version in the record. Moreover, beyond managing ReplicaSets and Pods and performing rolling updates, Deployments are prominently used for scaling applications. First, we see how we can run/deploy a single instance (single Pod) of an application (refer to Listing 2).
Listing 2. Creating a Deployment (running a single instance of an application).
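A minimal sketch of such a command, again assuming nginx as the example image:

kubectl create deployment nginx --image=nginx   # Deployment with a single replica
kubectl get deployments                         # confirm that 1/1 replicas are ready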
The difference between running an application with a standalone Pod and with a Deployment is that we can easily scale the application with the help of a Deployment, but not with a Pod alone. For example, refer to Listing 3.
Listing 3. Scaling an application (its Deployment) to four replicas.
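A sketch of the scaling command, assuming the Deployment from the previous sketch is named nginx:

kubectl scale deployment nginx --replicas=4   # scale the Deployment to four Pods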
Now, it runs/deploys four instances of the application. Deployments also support rolling updates and rollbacks, as shown in Listing 4.
Listing 4. Rolling update and rolling back of an application.
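A sketch of a rolling update and a rollback, assuming the nginx Deployment and an arbitrary image tag for illustration:

kubectl set image deployment/nginx nginx=nginx:1.21   # rolling update to a new image version
kubectl rollout history deployment/nginx              # list the recorded revisions
kubectl rollout undo deployment/nginx                 # roll back to the previous revision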
2.2.4. Services
The main function of a Service (SVC) in K8s is communication. Some Services are responsible for connecting the internal components of the cluster, and some are responsible for the communication between the cluster and its clients. There are three main types of Services—ClusterIP, NodePort, and LoadBalancer—whose names are largely self-explanatory.
2.3. Prominent Components with Autoscaling
In this subsection, we introduce a set of three prominent components related to autoscaling activities.
2.3.1. Horizontal Pod Autoscaler (HPA)
HPA [33] is a relatively common and well-functioning reactive autoscaling strategy in K8s. HPA manages the Deployment component to control the number of Pods by using CPU utilization as a threshold, thereby achieving autoscaling, as shown in Figure 3. For example, refer to Listing 5.
Listing 5. Autoscaling of an application (its Deployment) with HPA.
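A sketch of the autoscale command, with an illustrative threshold and replica bounds (the Deployment name and values are assumptions):

kubectl autoscale deployment nginx --cpu-percent=50 --min=1 --max=10   # create an HPA for the Deployment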
In this instance, an HPA is created for the target Deployment, which adjusts the number of replicas within the configured bounds so that the average CPU utilization of the Pods stays close to the specified threshold. The desired number of replicas is calculated as [33]

$$\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil, \quad (1)$$

where currentReplicas is the current number of replicas, currentMetricValue is the observed average value of the metric (e.g., CPU utilization) across the Pods, and desiredMetricValue is the target value specified by the user. The current resource usage of the Pods, which HPA obtains through the Metrics Server [34], can also be inspected manually (refer to Listing 6).
Listing 6. Getting/Fetching resource usage data across the Pods.
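A sketch of fetching the resource usage data (this requires the Metrics Server described in Section 2.4.1):

kubectl top pods   # current CPU and memory usage of each Pod
kubectl get hpa    # current/target metric values as seen by the HPA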
Now, we see how the HPA tracks resource utilization and adds or removes replicas/Pods as the utilization increases or decreases (refer to Listing 7).
Listing 7. Tracking of resource utilization and new replica/Pod deployment with the increase/decrease in resource utilization.
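A sketch of such tracking, assuming the HPA and Deployment are both named nginx:

kubectl get hpa nginx --watch    # observe the target utilization and replica count over time
kubectl get deployment nginx     # check how many replicas are currently deployed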
Notably, in addition to passively triggering scaling once the threshold is reached, HPA also periodically queries resource utilization to adjust the number of replicas in the Deployment and RS. Last but not least, HPA supports richer scaling strategies, which can be specified in the HPA configuration (e.g., stabilization windows and scaling policies in the behavior field).
2.3.2. Vertical Pod Autoscaler (VPA)
The aforementioned HPA is scaled by managing the number of Pods, while the Vertical Pod Autoscaler (VPA) [35] is scaled by reasonably allocating the CPU and memory of each Pod. Its biggest advantage is to request resources on demand or schedule the Pod to the appropriate node, which greatly improves the service efficiency of the cluster. However, compared with HPA, K8s open-source VPA is not mature enough and is in the experimental stage.
2.3.3. Cluster Autoscaler (CA)
Unlike VPA or HPA, which focus on Pod scaling, Cluster Autoscaler (CA) [36] is a component that scales the whole K8s cluster, which can automatically adjust the nodes dynamically to ensure all of the Pods can be allocated enough resources and delete nodes with low resource utilization.
2.4. Prominent Add-Ons
In this section, we introduce the prominent add-ons required for our analysis.
2.4.1. Metrics Server
Metrics Server [34] is a cluster-wide aggregator of resource usage data. It collects CPU and memory metrics from the kubelet on each node and exposes them through the Kubernetes Metrics API. Both the native HPA and our custom pod autoscaler rely on it to obtain Pod-level resource usage.
2.4.2. Prometheus
Prometheus is an open-source monitoring and alerting toolkit that scrapes metrics from configured targets and stores them as time series data. It can be used to monitor the cluster and the deployed applications throughout our experiments.
3. Related Work
In this section, we begin by presenting a set of container orchestration frameworks while highlighting the merits and limitations of Kubernetes compared with them. We then present the prominently used load prediction methodologies that can be integrated into our custom pod autoscaler toward predicting the load in advance for optimal scaling in Kubernetes. Then, we perform a literature review of customized proactive autoscaling strategies that have been applied to Kubernetes.
3.1. Kubernetes vs. Other Container Orchestration Frameworks
We observe that different container orchestration frameworks have been developed to meet various market needs. For a detailed discussion, we group them into two easily distinguishable categories. The first category includes fully-managed, paid, closed-source, easily deployable and manageable frameworks, such as Amazon Elastic Container Service (Amazon ECS) [12,38], Amazon Elastic Container Service for Kubernetes (EKS) [16], Google Kubernetes Engine (GKE) [16], Microsoft Azure Kubernetes Service (AKS) [16], OpenShift [13,14,39], and others. The second category includes self-managed, open-source frameworks, such as Kubernetes [7,17], Mesos [8,40], Docker Swarm [9,10,41,42], Nomad [10], SaltStack [11,43,44], and many others. Although each framework has unique features that others do not have, their limitations dissuade some potential users. Notably, fully-managed frameworks with added features come at a high cost. Moreover, they are less customizable, less flexible, and suffer from the vendor lock-in issue. On the other hand, even though open-source frameworks have a steep learning curve and a complex setup for beginners, they are widely preferred by users because of their better community support, especially Kubernetes [7,15,17,18]. We observe that among the open-source frameworks, Kubernetes is the best choice. Even compared with fully-managed services, Kubernetes is the clear winner [15,16,18,45].
Specifically, we observe comparative analyses of container orchestration frameworks in the research works [15,16] and find that Kubernetes wins the race by a fair margin (for this reason, we do not repeat the same analysis herein and simply build on their results). In the studies [15,16], the comparative analysis covers Kubernetes, Mesos, Docker Swarm, Nomad, and the fully-managed frameworks, but not SaltStack; comparing with SaltStack separately, we see that Kubernetes is superior [11,43,44]. In particular, we observe that Kubernetes caters better to business needs than SaltStack. Moreover, Kubernetes is a better choice than SaltStack with respect to quality metrics, feature updates, and other evaluation criteria [11,43,44]. Notably, Kubernetes is easier to use, as stated earlier, while the saving grace of SaltStack is that it is easier to set up, manage, and control. We also would like to state that the lack of documentation and recent research works for SaltStack results in increased complexity of usage (notably, we have checked out the official documentation of SaltStack as well).
We also observe that orchestration with Kubernetes helps optimize the Quality of Service (QoS). For example, Carrión [18] analyzes the principles of the Kubernetes scheduler in assigning physical resources to containers while optimizing QoS metrics such as response time, energy consumption, and resource utilization. The work also highlights the gaps in scheduling and concludes with future research directions to address them.
All in all, Kubernetes has become the de facto standard for simplifying the efficient deployment of containerized applications [7,15,17,18], so in this paper, our subject of study is Kubernetes.
3.2. Literature Review: Load Prediction Methodology
We find that the prediction results of traditional statistical analysis models [47,48] (such as ARIMA) are no longer comparable to the results of the currently popular deep learning models [49,50], namely the Long Short-Term Memory (LSTM) model and its derivatives. In this section, we demonstrate the prominent load prediction methodologies that we use in our analysis.
3.2.1. ARIMA
The Autoregressive Integrated Moving Average (ARIMA) model is a classic statistical model. It attempts to predict potential future trends based on previous data and has been widely used in forecasting financial trends [51] and epidemic trends [52]. The prediction for a (differenced) time series is calculated based on the underlying formulation

$$X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} + \varepsilon_t, \quad (2)$$

where $c$ is a constant representing the mean level of the sequence, $X_{t-i}$ is the past $i$th value, $\varepsilon_{t-j}$ is the past $j$th prediction error, and $\phi_i$ and $\theta_j$ correspond to the coefficients of the autoregressive and moving average parts, respectively. The premise of using the ARIMA model for prediction is that the time series is stationary, or becomes stationary after $d$-order differencing. Therefore, the user must analyze the time series in advance and choose appropriate $p$, $d$, and $q$ parameters.
3.2.2. LSTM
Before introducing the LSTM model [53], it is necessary to introduce the Recurrent Neural Network (RNN) [54]. RNN is a deep learning model dedicated to processing sequence or contextual data, and it has been widely used for sequence prediction [55], speech recognition [56], and text generation [57] tasks. However, the cell structure of the RNN is very simple, which makes it prone to the vanishing gradient problem, so it is only suitable for short-term memory.
Compared with the RNN, the LSTM has one more hidden transmission state (the cell state), and the number of neural network layers in a single cell increases from one to four. The newly added layers form a forget gate that controls whether to discard previous information, an input gate that determines what new information is added to the cell state, and an output gate that controls what is passed on as the hidden state. This structure enables LSTM to learn information with long-term dependencies, i.e., long-term time series data.
3.2.3. BiLSTM
From the above RNN and LSTM models, it can be concluded that these models only take into account the influence of the earlier part of the data on the later part. However, in actual situations, the later part of the data also has a certain relationship with the earlier part. For instance, in English grammar, the meaning of a word often depends on the words that follow it as well as those that precede it. BiLSTM addresses this by combining a forward LSTM and a backward LSTM, so that each output is informed by both the preceding and the following context.
3.2.4. GRU
The Gated Recurrent Unit (GRU) model [58] is also a derivative model of the LSTM model, which has a simpler network structure than the LSTM model. GRU can save computational cost by reducing the parameters that need to be learned in structure, while it can also achieve the same performance as LSTM.
Following the standard GRU formulation [58], the update gate $z_t$, the reset gate $r_t$, the candidate hidden state $\tilde{h}_t$, and the hidden state $h_t$ are computed as

$$z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right), \quad (3)$$

$$r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t]\right), \quad (4)$$

$$\tilde{h}_t = \tanh\left(W \cdot [r_t \odot h_{t-1}, x_t]\right), \quad (5)$$

$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t, \quad (6)$$

where $x_t$ is the input at time $t$, $\sigma$ is the sigmoid function, $\odot$ denotes element-wise multiplication, and $W_z$, $W_r$, and $W$ are the learned weight matrices.
3.2.5. Evaluation Metrics of Load Prediction Methodologies
Notably, in this paper, we analyze all the aforementioned methodologies as discussed in Section 4 and finally select the best one for the autoscaling task. To reasonably evaluate the results of model prediction and pick the best one, we present a set of standard evaluation metrics which are commonly used in time series forecasting, as follows:
Mean squared error (MSE): the mean of the squared errors between the true values and the predicted values; we also use it as the loss function to train our models.
Root mean squared error (RMSE): the arithmetic square root of the MSE, which focuses on judging the prediction error.
Mean absolute error (MAE): the mean of the absolute errors between the true and predicted values.
R-Squared ($R^2$): the coefficient of determination between the actual values and the predicted values. Notably, the closer $R^2$ is to 1, the better the model fits in general.
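For completeness, these metrics can be written as follows (standard formulations, where $y_i$ is the true value, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the true values, and $n$ the number of samples):

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2, \quad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \quad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|, \quad R^2 = 1-\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}.$$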
3.3. Literature Review: Customized Autoscaling in Kubernetes
In this section, we present the customized autoscaling strategies applied to K8s.
3.3.1. BiLSTM Based Autoscaling
As Deep Learning (DL) becomes popular, scholars have begun to implement deep learning models to customize autoscaling. We observe that the authors in [26] propose a proactive scaling architecture based on BiLSTM. The scaling process can be roughly summarized into two parts. The first part is the analysis phase, which applies the BiLSTM model to predict the upcoming HTTP workload. The second part is the planning phase, where the adaptation manager adjusts the number of Pods required according to the traffic predicted in the previous part. To evaluate the effectiveness of their proposed Proactive Custom Autoscaler (PCA), the authors used the NASA web server traces dataset in two sets of experiments.
In the first experiment, the authors chose the Root Mean Square Error (RMSE) and the prediction speed as the main evaluation criteria. In addition to the BiLSTM model, the well-known ARIMA model was also used for comparison, testing the results of predicting one step and five steps ahead, respectively. Judging from the results in Table 1, BiLSTM is better than ARIMA in terms of both prediction accuracy and speed, since it achieves lower error values and its prediction time is orders of magnitude shorter.
The second set of experiments takes part of the continuous data of the NASA dataset as input to test and compare the proactive PCA with BiLSTM against the traditional HPA of K8s. Figure 4 illustrates that the number of Pods and resources scheduled by PCA fits the actual load well. Compared with PCA, the shortcomings of the allocation performed by HPA are obvious. First of all, HPA allocates many more resources than are needed, leaving many resources unused. Second, due to HPA's reactive nature, its scaling decisions lag behind changes in the actual load.
Summary: It is all well that the authors achieved better results with BiLSTM compared with ARIMA. Notably, it is expected that prediction with BiLSTM is better than with ARIMA, since ARIMA is a comparatively simple autoregressive model. However, the issue with BiLSTM is that it does not suit time sequence data particularly well. We cannot have the future part of a time sequence (e.g., tomorrow's stock price, or the price one month later), so we cannot genuinely process time sequence data bidirectionally. In particular, BiLSTM suits text data well, e.g., Named Entity Recognition, where an entity such as "General Motors" can be resolved using the words on both sides of it.
3.3.2. HPA+
Summary: The HPA+ autoscaling engine proposes a nearly perfect solution, which is to implement a variety of different prediction models that complement each other. However, we believe that even if a good prediction precision is attained, a tremendous amount of computing resources is needed for forecasting.
3.3.3. Holt–Winters Exponential Smoothing on K8s VPA
The previous two research works used various methods that modify the scaling decision in a custom autoscaler to improve the performance of HPA. In contrast, the work in [28] implements the Holt–Winters (HW) exponential smoothing algorithm and the LSTM model to optimize VPA.
Before introducing the HW method, we introduce the underlying exponential smoothing technique, which forecasts future values as weighted averages of past observations with exponentially decaying weights; the HW method (triple exponential smoothing) extends it with trend and seasonal components, which makes it suitable for seasonal time series.
To test the performance of the Holt–Winters method, the author collected historical data from Alibaba containers that exhibit either seasonal or irregular behavior. Similarly, the author also implemented the LSTM model and included it in the experiment for comparison. The experimental results show that the HW model is well suited for seasonal requests. However, when the CPU request behaves irregularly, the HW model performs poorly.
Summary: As per the analysis in this subsection, the HW model is used to optimize VPA (Vertical Pod Autoscaler), which scales by reasonably allocating the CPU and memory of each Pod. On the one hand, in this paper, our objective is to optimize the HPA. On the other hand, we observe that the HW model works well for seasonal data but performs poorly when the data behaves irregularly. To address this, the LSTM or GRU models are well suited: no matter whether the data is seasonal or irregular, they can deliver better performance, which indicates that LSTM and GRU have better robustness.
3.3.4. LIBRA Autoscaling
Currently, the native HPA and VPA in K8s cannot run simultaneously while monitoring the same CPU and memory metrics. In order to maximize their respective strengths, this article proposed an autoscaler called LIBRA [60], which combines horizontal and vertical scaling.
To test the performance of LIBRA, the author used
LIBRA can provide faster service than HPA from the perspective of threads: Single-threaded applications can simply increase the resource pool through horizontal scaling, while multi-threaded applications can effectively use the resources of multiple CPU cores through vertical scaling.
Summary: We observe that LIBRA can combine HPA and VPA to provide more service capacity compared to the original HPA. However, as stated earlier, the K8s open-source VPA is not mature enough and is still in the experimental stage, so we solely focus on horizontal autoscaling. Another point is that VPA is mostly useful for resource allocation on demand or for scheduling a Pod to an appropriate node. Conversely, in this paper, we focus on satisfying a myriad of concurrent requests to an application synchronously and in parallel with greater isolation.
3.3.5. Discussion
In this section, we demonstrated a set of autoscaling methods employed on K8s, namely BiLSTM-based autoscaling, HPA+, Holt–Winters exponential smoothing on K8s VPA, and LIBRA autoscaling. We observe that their working principles vary from method to method; moreover, different datasets and different sets of evaluation metrics are used across the methods to validate their effectiveness. Thereby, it is not reasonable to show a direct comparative analysis among them. Instead, we briefly summarized each method, stating its merits and limitations.
4. Proposed Autoscaler: Load Prediction Model Selection
In this section, we delve into the selection of a load prediction model, which is the first of the three steps of our customized autoscaling scheme. First, we describe the dataset and the selection of the appropriate prediction objects; thereafter, we evaluate the performance of each prediction model. We use the Google cluster trace, Clusterdata-2011-2 [30], as the dataset for this analysis.
4.1. Dataset
The dataset of this experiment uses historical data called Clusterdata-2011-2 [30], a publicly released trace of workloads running on a Google cluster. It consists of the following tables:
job_events
machine_attributes
machine_events
task_constraints
task_events
task_usage
The main table among them used in the experiment is task_usage, which records the resource usage (e.g., CPU and memory) of each task in 300 s measurement periods.
4.2. Preprocessing
After extracting the dataset, we can observe that it (Figure 5) uses a 300 s measurement period and contains the average CPU rate, the memory rate, and the machine IDs. Next, we randomly sample data from 100 machines of the raw dataset, where each row records the CPU rate and the memory rate at 8352 time points. Finally, we save the processed data in a Pickle file (pkl format) to simulate the CPU load and memory load for training the network models.
Since we can only target one type of load as a scaling metric in K8s, we need to confirm whether different loads would affect each other, so we choose the Pearson correlation coefficient [63] to analyze the correlation between the CPU load and the memory load before settling on the CPU usage rate as the prediction target.
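The following sketch illustrates this correlation check in Python; the file name and column names are assumptions for illustration only:

import pandas as pd

# Preprocessed per-machine loads saved in the step above (hypothetical file/column names)
loads = pd.read_pickle("machine_loads.pkl")   # columns: "cpu_rate", "memory_rate"
corr = loads["cpu_rate"].corr(loads["memory_rate"], method="pearson")
print(f"Pearson correlation between CPU and memory load: {corr:.3f}")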
4.3. Network Model Configuration
We train and compare the LSTM, BiLSTM, and GRU models as experimental objects. Notably, Table 2 lists the default parameters of the models; we arrived at these parameters after several trials.
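As an illustration of the configuration in Table 2, a GRU model with these parameters could be built as follows (a minimal Keras sketch; the input/output step sizes and the single-layer layout are assumptions for illustration, not the exact implementation):

from tensorflow import keras

def build_gru(input_steps: int, output_steps: int) -> keras.Model:
    # One GRU layer with 50 hidden units and relu activation, per Table 2
    model = keras.Sequential([
        keras.layers.GRU(50, activation="relu", input_shape=(input_steps, 1)),
        keras.layers.Dense(output_steps),
    ])
    model.compile(optimizer="adam", loss="mse")   # adam optimizer, MSE loss (Table 2)
    return model

model = build_gru(input_steps=50, output_steps=1)
# model.fit(x_train, y_train, batch_size=512, epochs=200, validation_split=0.1)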
4.4. Experiment Results
In this section, we show the experimental analysis of the statistical analysis model, ARIMA, and the three deep learning models, LSTM, BiLSTM, and GRU. Notably, we take the ARIMA model as the baseline. For the statistical analysis model, we analyzed the original sequence for stationarity and chose the (p, d, q) parameters accordingly before fitting ARIMA.
For the deep learning models, we used the CPU usage rate of one of the 100 randomly selected machines as the training set to train the LSTM, BiLSTM, and GRU models with prediction steps from 1 to 50, respectively. From the loss graphs in Figure 7, Figure 8 and Figure 9, we observe that the convergence speed of the three models is very fast, needing only 3 to 5 epochs to reach the lowest loss value. Therefore, in the next task, the number of epochs can be reduced to save training time. On the validation loss graphs, we can see that as the epoch increases, the loss of LSTM exhibits a small number of fluctuations; this fluctuation is more obvious in the BiLSTM model, while the GRU model has the most stable performance.
Subsequently, we input the data of 10 other randomly selected machines into the trained models for testing and obtain the predicted and true values of each model. Figure 10, Figure 11 and Figure 12 show the predicted and true values of each model; we can observe that the LSTM and BiLSTM models have almost the same performance, while the GRU model performs better in predicting load peaks. From these figures alone, it is difficult to distinguish the LSTM and BiLSTM models.
To further compare the performance of the three models, we use the MSE metric as the measure of prediction accuracy. We take the average of the MSE values obtained from each step test over the 10 machines to eliminate abnormal values. This produces the MSE comparison chart shown in Figure 13. From the figure, we can see that the MSE decreases as the prediction step increases. From the data distribution point of view, the distribution of GRU is relatively stable and its prediction accuracy is better than the other models, followed by LSTM and then BiLSTM. The best results of the LSTM, BiLSTM, and GRU models are obtained at prediction steps 47, 45, and 24, respectively, and are summarized in Table 3.
In addition, we also added the models' training time and prediction time to the reference indicators. Figure 14 and Figure 15 compare the training time and prediction time of each step of the three models, respectively. As the prediction step size increases from 1 to 50, the training time and prediction time required by the three models generally become longer. From the perspective of training time, the overall training time and its growth rate for the GRU model are slightly lower than those of the LSTM model because of its simpler network structure. As the prediction step size increases, the BiLSTM model takes almost twice the training time of the GRU and LSTM models. From the perspective of prediction time, the prediction times of all three models lie in a similar narrow range, around 0.06 s per step on average (Table 3), so the differences among them are negligible.
The above experimental results are summarized in Table 3, from which we can intuitively compare the results of each model. From the perspective of prediction accuracy, the accuracy of each deep learning model is slightly better than that of the traditional prediction model. In terms of average prediction time, the ARIMA model takes an average of 4.1441 s per prediction step, while each deep learning model takes around 0.06 s, which shows that the deep learning models are much better than the traditional forecasting model. Therefore, deep learning models perform better than traditional forecasting models in terms of both prediction accuracy and prediction time. Among them, GRU is the best model in all aspects, and we encapsulate it into the prediction logic for further experiments.
We know that the performance of GRU and LSTM does not vary a lot [64,65]. As we observe, except for the training time, their performance is closely aligned. However, GRU is more simplified, with fewer gates and fewer parameters than LSTM, which makes it faster and simpler, though sometimes less adaptable and less expressive. On the other hand, since LSTM has more gates and more parameters than GRU, it can offer more flexibility and expressiveness while incurring higher computational cost, longer training time, and a greater risk of overfitting. In fact, both LSTM and GRU are used interchangeably in time sequence data analysis, and they perform more or less equally depending on the nature of the data [64,65].
5. Proposed Autoscaler: Deployment and Experimental Analysis
To simulate a real K8s cluster for experiments on limited machines, we decided to build a cluster by creating virtual machines in a local environment, with one master node and two worker nodes. Notably, we can add or remove nodes freely.
Here are the specific configurations:
Host: Windows 10 with VirtualBox;
Virtual machines: Ubuntu 16.04 LTS;
Kubernetes installation source: kubeadm (Creating a K8s cluster with kubeadm: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/, accessed on 10 January 2023);
Kubernetes version: 1.20;
Docker version: 19.03.
5.1. Reactive (Default) Autoscaling in K8s (HPA in K8s)
After deploying the K8s cluster according to the above configuration, we created a simple YAML file defining an nginx Deployment and its Service, as shown in Listing 8.
Listing 8. YAML definition of the Deployment and Service for the nginx application.
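A minimal sketch of such a YAML file (names, ports, and the CPU request are illustrative assumptions; a CPU request is needed so that HPA can compute CPU utilization):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80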
After the Deployment and Service are created and the Pods reach the "Running" status, we can successfully access the deployed nginx network service through the IP address of any worker node and the corresponding port number. Now, we apply HPA by entering the command shown in Listing 9.
Listing 9. Autoscaling of the nginx Deployment with HPA.
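A sketch of the command, matching the settings described below (10% CPU threshold, 1 to 10 replicas):

kubectl autoscale deployment nginx --cpu-percent=10 --min=1 --max=10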
We set the upper limit of CPU usage to 10% to quickly trigger scaling and set the minimum and maximum numbers of replicas to 1 and 10, respectively, to bound the scaling range. Finally, we create a load generator by executing the command shown in Listing 10, which continuously generates requests against the worker nodes to access the nginx page we just created, in order to stress test the cluster.
Listing 10. Load generator for accessing the nginx service.
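A sketch of such a load generator, following the common pattern from the Kubernetes HPA walkthrough (the image and service name are assumptions):

kubectl run load-generator --image=busybox --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://nginx; done"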
We run the command shown in Listing 11 to observe the actual running state of the HPA.
Listing 11. Getting/Fetching data about the HPA.
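A sketch of the command, assuming the HPA is named nginx:

kubectl get hpa nginx --watch   # prints the observed utilization and replica count as they change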
Notably, we trace the running state of the HPA when the stress test starts at "8m" (at the 8th minute). We observe that the HPA did not detect the surge of requests and did not scale the replicas accordingly until about 50 s had passed. One more important point to note is that when the stress test ends at "9m" (at the 9th minute), the HPA takes more than 1 min to scale down. The log shown in Figure 16 illustrates that, for a sudden increase in requests, there is still a considerable delay in the HPA response.
5.2. Proactive Autoscaling in K8s (CPA in K8s)
In this section, we demonstrate the CPA framework, our CPA, and the specific packaging and deployment process of our CPA.
5.2.1. Introduction to CPA Framework
The Custom Pod Autoscaler (CPA) framework [31] is a customizable K8s autoscaler framework developed by the DigitalOcean team.
The CPA framework provides developers with two stages for customizing the scaling logic: metric gathering and evaluation. The first stage, metric gathering, is mainly used to collect metrics in K8s and pass the user-defined metrics to the next stage in JSON format. It also works as an API that can be called to check the metric usage of the target container or application at the current moment. The job of the evaluation stage is to take the collected metrics as input to the scaling logic defined by the developer and decide whether to scale up or down to the target number of replicas.
The Custom Pod Autoscaler Operator [67] handles the provisioning of CPA resources in the cluster; it must be installed before any CPA can be deployed, and it manages the lifecycle of each deployed CPA.
5.2.2. Workflow of Our Custom Pod Autoscaler
In this section, we show step by step how to write custom autoscaling logic, encapsulate the logic into a CPA image, and finally deploy proactive autoscaler to the cluster to achieve custom autoscaling.
The workflow of the two stages in CPA is shown in Figure 18 and proceeds as follows:
1. Read the target metric and information from the K8s Metrics Server.
2(a). Convert the current number of replicas and CPU usage to JSON format and pass them to the Evaluator.
2(b). Collect and update the historical time sequence locally in the database.
3. Load the deep learning model and read the historical time sequence into the Evaluator for predicting and calculating the replicas.
4. Assign the target number of replicas calculated by the Evaluator.
Next, we introduce the internal implementation logic of the Metric Gatherer and Evaluator in detail through two flowcharts as shown in Figure 19 and Figure 20.
The main job of the Metric Gatherer (Figure 19) is to collect information about the target application. In particular, it collects, from the K8s Metrics Server, the information of the Pods run by the target application, and then extracts the current number of replicas and the CPU usage of each Pod. Next, the total CPU usage is obtained by summing the CPU usage of every target Pod, and the average CPU usage is obtained by dividing this total by the current number of replicas. After that, it saves the current average CPU usage to the database and updates the historical time series. Finally, it converts the current number of replicas and the average CPU usage into JSON format and passes them to the Evaluator.
The Evaluator (Figure 20) is primarily responsible for executing the scaling logic. First, it receives the current number of replicas and the average CPU usage passed from the Metric Gatherer and reads the corresponding values from the JSON input; it then reads the historical time series and the pre-trained model from the database. The target CPU usage threshold and the target number of replicas are initialized to 50 and 0, respectively. It then judges whether the length of the historical sequence is greater than the input size required by the model. If so, the historical sequence is passed into the model as input to predict the average CPU usage for the next step (the next period of time), and the HPA algorithm is executed on the predicted value to calculate the corresponding target number of replicas. If the time sequence does not meet the model's input length, it simply executes the original HPA logic, calculating the target number of replicas from the observed average CPU usage and the target threshold. Finally, it converts the target number of replicas into JSON format and sends it to the Deployment to scale to the specified number of replicas.
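A simplified sketch of this evaluation logic in Python (the JSON field names, the placeholder predictor, the history length, and the replica bounds are assumptions for illustration; the actual evaluator follows the flow in Figure 20):

import json
import math
import sys

TARGET_CPU = 50.0                    # target average CPU utilization (%), as initialized above
MIN_REPLICAS, MAX_REPLICAS = 1, 10   # assumed scaling bounds for illustration

def decide_replicas(current_replicas, avg_cpu, history, predict_next, input_size=50):
    # Use the prediction when enough history is available,
    # otherwise fall back to the reactive HPA rule on the observed value.
    cpu = predict_next(history) if len(history) >= input_size else avg_cpu
    target = math.ceil(current_replicas * cpu / TARGET_CPU)
    return max(MIN_REPLICAS, min(MAX_REPLICAS, target))

if __name__ == "__main__":
    # The CPA framework passes the gathered metrics to the evaluator as JSON
    # (field names here are illustrative assumptions).
    spec = json.loads(sys.stdin.read())
    target = decide_replicas(spec["current_replicas"], spec["avg_cpu"],
                             spec.get("history", []),
                             predict_next=lambda h: h[-1])  # placeholder for the GRU predictor
    print(json.dumps({"targetReplicas": target}))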
5.2.3. Development and Shipment of CPA Image
In this section, we show the file structure and configuration required to build a CPA image. Notably, we have uploaded the CPA image building workflow to GitHub.
Listing 12. File structure for the CPA image.
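A sketch of a typical layout for such an image, following the CPA framework's conventions (the file names are assumptions):

Dockerfile        # builds on the CPA Python base image
config.yaml       # CPA configuration: metric/evaluate commands and run interval
metric.py         # metric gatherer: reads Pod CPU usage from the Metrics Server
evaluate.py       # evaluator: GRU prediction and replica calculation
model/            # pre-trained GRU model and the stored historical time series
requirements.txt  # Python dependencies (e.g., tensorflow, numpy)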
The workflow of the image is driven by the configuration file described next, which tells the CPA framework how to invoke the metric gathering and evaluation scripts.
Listing 13. Configuration for the CPA image (config.yaml).
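A sketch of such a configuration, following the documented CPA config.yaml format (script paths, timeouts, and the interval are assumptions; the 15 s interval matches the setting noted in Figure 21):

evaluate:
  type: "shell"
  timeout: 10000
  shell:
    entrypoint: "python"
    command:
      - "/evaluate.py"
metric:
  type: "shell"
  timeout: 10000
  shell:
    entrypoint: "python"
    command:
      - "/metric.py"
runMode: "per-resource"
interval: 15000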
After the configuration file is set, we start to package the scaling logic, the pre-trained model, and the configuration into a Docker image and upload it to a public repository, as shown in Listing 14.
Listing 14. Building a Docker image for our custom CPA and uploading it to a public repository.
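A sketch of the packaging commands (the repository name is a placeholder):

docker build -t <dockerhub-user>/gru-cpa:latest .   # package the scaling logic and model into an image
docker push <dockerhub-user>/gru-cpa:latest         # upload the image to a public registry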
5.2.4. Deployment of CPA Image and Testing on K8s
As mentioned in the previous subsection, the prerequisite for the operation of our CPA is the successful installation of the CPA operator, so the first step is to determine the operator version corresponding to our K8s version and then enter the installation command, as shown in Listing 15.
Listing 15. Installation of Custom Pod Autoscaler Operator. |
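A sketch of the installation, following the operator's documented cluster-wide install method (the version tag is an assumption and should match the cluster's K8s version):

VERSION=v1.1.0
kubectl apply -f https://github.com/jthomperoo/custom-pod-autoscaler-operator/releases/download/${VERSION}/cluster.yaml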
Next, we focus on configuring the YAML file that deploys our CPA image and points it at the target application to be monitored, as shown in Listing 16.
Listing 16. Configuration file for deploying the CPA image while monitoring the target application.
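A sketch of such a CPA resource, following the CustomPodAutoscaler CRD format (names, image, and config values are assumptions for illustration):

apiVersion: custompodautoscaler.com/v1
kind: CustomPodAutoscaler
metadata:
  name: gru-cpa
spec:
  template:
    spec:
      containers:
      - name: gru-cpa
        image: <dockerhub-user>/gru-cpa:latest
        imagePullPolicy: Always
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: <target-application>
  config:
    - name: interval
      value: "15000"
    - name: minReplicas
      value: "1"
    - name: maxReplicas
      value: "10"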
To generate a more stable load from the tested application, we chose a test image based on Nginx Unit [66] as the target application.
It is convenient to quickly deploy HPA through the command line; however, it is inconvenient to modify the configuration information that way, so we also configured the HPA through a YAML file for a consistent comparison, as shown in Listing 17.
Listing 17. Deployment of HPA and CPA, and Performance analysis. |
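A sketch of the deployment and inspection commands (file and resource names are assumptions):

kubectl apply -f hpa.yaml                         # deploy the HPA from its YAML definition
kubectl apply -f cpa.yaml                         # deploy our CPA custom resource
kubectl get hpa                                   # inspect the HPA's observed utilization and replicas
kubectl get custompodautoscalers                  # inspect the deployed CPA
kubectl get deployment <target-application> -w    # watch the replica count change over time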
As in the previous HPA experiments, we trigger the target application to execute its business logic by configuring and deploying a load generator that continuously sends requests to it.
After deploying the load generator, we captured 20 min of data to show the results. The process of scaling the number of replicas of the target application by the HPA and by our CPA is shown in Figure 21.
6. Conclusions and Future Work
Deploying applications through cloud computing services has become the choice of most users. One of the important functions is autoscaling, which is still mostly implemented through reactive methods to trigger the scaling logic and thus cannot meet the needs of all users. Therefore, it is necessary to analyze specific applications and customize the corresponding scaling strategy. To this end, we began by proposing a proactive scaling scheme based on the GRU deep learning model to address the shortcomings of the default autoscaler, HPA. In particular, we develop a load prediction model based on GRU for the autoscaler; then, based on the predicted load, our custom autoscaler scales the replicas of a deployed application to meet user demands. Accordingly, we implement our custom autoscaling scheme, deploy it to a real K8s cluster, and empirically evaluate its effectiveness.
The paramount advantage of our scaling scheme is that it can train a model for each metric separately and replace the scaling logic at any time, which gives it better scalability. However, there are still aspects of our scheme that need to be improved. We developed our load prediction model in terms of the CPU utilization of a node. In load prediction, it would be preferable to develop the model for individual tasks; however, this would increase the training complexity. In another aspect, while testing our custom pod autoscaler, we used a load generator to trigger and drive the test application to generate CPU load in K8s. This form of test method cannot generate custom target values on demand, so this is the primary problem that needs to be solved in the future. In addition, all of the prediction models are based on the supervised learning method, which relies on models that are fully trained on target metrics, so we will consider other viable strategies, such as reinforcement learning or other state-of-the-art methods, as alternatives.
Conceptualization, S.K.M. and X.W.; Methodology, S.K.M., X.W., H.M.D.K., H.-N.D., K.N., H.Y. and T.W.; Software, S.K.M. and X.W.; Validation, S.K.M. and X.W.; Formal analysis, S.K.M., X.W., H.M.D.K., H.-N.D., K.N., H.Y. and T.W.; Investigation, S.K.M., X.W., H.M.D.K., H.-N.D. and K.N.; Resources, S.K.M. and X.W.; Data curation, S.K.M. and X.W.; Writing—original draft, S.K.M., X.W., H.M.D.K., H.-N.D., K.N., H.Y. and T.W.; Writing—review and editing, S.K.M., X.W., H.M.D.K., H.-N.D., K.N., H.Y. and T.W.; Visualization, S.K.M. and X.W.; Supervision, S.K.M.; Project administration, S.K.M.; Funding acquisition, S.K.M. All authors have read and agreed to the published version of the manuscript.
Not applicable.
Authors gratefully acknowledge funding sources. The authors also would like to thank the anonymous reviewers for their quality reviews and suggestions.
The authors declare no conflict of interest.
The following abbreviations are used in this manuscript:
AR | AutoRegressive |
ARIMA | AutoRegressive Integrated Moving Average |
BiLSTM | Bi-directional Long Short-Term Memory |
CA | Cluster Autoscaler |
CPA | Custom Pod Autoscaler |
DL | Deep Learning |
GRU | Gated Recurrent Unit |
HPA | Horizontal Pod Autoscaler |
HTM | Hierarchical Temporal Memory |
K8s | Kubernetes |
LSTM | Long Short-Term Memory |
MAE | Mean Absolute Error |
MMPP | Markov-Modulated Poisson Process |
MSE | Mean Squared Error |
OS | Operating System |
RL | Reinforcement Learning |
RMSE | Root Mean Squared Error |
SVC | SerViCe |
VM | Virtual Machine |
VPA | Vertical Pod Autoscaler |
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Figure 2. Hierarchical structure of Deployment, ReplicaSet, and Pod (adapted from official documentation of Kubernetes (https://kubernetes.io/docs/concepts/workloads/controllers/, accessed on 10 January 2023)).
Figure 3. HPA autoscaling process (adapted from official documentation of Kubernetes) [33].
Figure 5. Randomly select a machine and extract the CPU rate and memory rate at each time point; each time point is separated by 300 s.
Figure 21. Scaling result of HPA and CPA on K8s. In the figure, the blue line represents the expansion result of the CPA, and the red line represents the expansion result of the HPA. Both types of autoscalers execute the autoscaling logic every 15 s.
Experimental results on NASA dataset.
Model Type | ARIMA (1 Step) | BiLSTM (1 Step) | ARIMA (5 Steps) | BiLSTM (5 Steps)
---|---|---|---|---
MSE | 196.288 | 183.642 | 237.604 | 207.313
RMSE | 14.010 | 13.551 | 15.414 | 14.39
MAE | 10.572 | 10.280 | 11.628 | 10.592
R² | 0.692 | 0.712 | 0.628 | 0.675
Prediction speed (ms) | 2299 | 4.3 | 2488 | 45.1
The best values are marked as bold.
Configuration of each model.
Parameter | LSTM | BiLSTM | GRU
---|---|---|---
Hidden units | 50 | 100 | 50
Activation function | relu | relu | relu
Batch size | 512 | 512 | 512
Epochs | 200 | 200 | 200
Optimizer | adam | adam | adam
Loss function | MSE | MSE | MSE
Experiment result on Clusterdata-2011-2.
Metric | ARIMA | LSTM | BiLSTM | GRU
---|---|---|---|---
Best prediction step | - | 47 | 45 | 24
MSE | - | 0.00197 | 0.00195 | 0.00195
RMSE | - | 0.04429 | 0.04367 | 0.04369
MAE | - | 0.03202 | 0.03274 | 0.03101
Training Time (s) | - | 1.44 | 2.41 | 0.75
Prediction Time (s) | 4.1441 | 0.0661 | 0.0665 | 0.0651
References
1. Chiueh, S.N.T.C.; Brook, S. A survey on virtualization technologies. Rpe Rep.; 2005; 142, pp. 1-42.
2. Uhlig, R.; Neiger, G.; Rodgers, D.; Santoni, A.L.; Martins, F.C.; Anderson, A.V.; Bennett, S.M.; Kagi, A.; Leung, F.H.; Smith, L. Intel virtualization technology. Computer; 2005; 38, pp. 48-56. [DOI: https://dx.doi.org/10.1109/MC.2005.163]
3. Mao, M.; Humphrey, M. A performance study on the vm startup time in the cloud. Proceedings of the 2012 IEEE 5th International Conference on Cloud Computing; Honolulu, HI, USA, 24–29 July 2012; pp. 423-430.
4. Xavier, M.G.; Neves, M.V.; Rossi, F.D.; Ferreto, T.C.; Lange, T.; De Rose, C.A. Performance evaluation of container-based virtualization for high performance computing environments. Proceedings of the 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing; Belfast, UK, 27 February–1 March 2013; pp. 233-240.
5. Soltesz, S.; Pötzl, H.; Fiuczynski, M.E.; Bavier, A.; Peterson, L. Container-based operating system virtualization: A scalable, high-performance alternative to hypervisors. Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007; Lisbon, Portugal, 21–23 March 2007; pp. 275-287.
6. Anderson, C. Docker [software engineering]. IEEE Softw.; 2015; 32, 102-c3. [DOI: https://dx.doi.org/10.1109/MS.2015.62]
7. Burns, B.; Grant, B.; Oppenheimer, D.; Brewer, E.; Wilkes, J. Borg, omega, and kubernetes. Queue; 2016; 14, pp. 70-93. [DOI: https://dx.doi.org/10.1145/2898442.2898444]
8. Truyen, E.; Van Landuyt, D.; Preuveneers, D.; Lagaisse, B.; Joosen, W. A comprehensive feature comparison study of open-source container orchestration frameworks. Appl. Sci.; 2019; 9, 931. [DOI: https://dx.doi.org/10.3390/app9050931]
9. Naik, N. Building a virtual system of systems using docker swarm in multiple clouds. Proceedings of the 2016 IEEE International Symposium on Systems Engineering (ISSE); Edinburgh, UK, 3–5 October 2016; pp. 1-3.
10. Guerrero, C.; Lera, I.; Juiz, C. Resource optimization of container orchestration: A case study in multi-cloud microservices-based applications. J. Supercomput.; 2018; 74, pp. 2956-2983. [DOI: https://dx.doi.org/10.1007/s11227-018-2345-2]
11. Zadka, M.; Zadka, M. Salt Stack. DevOps in Python: Infrastructure as Python; Apress: New York, NY, USA, 2019; pp. 121-137.
12. Acuña, P. Amazon EC2 container service. Deploying Rails with Docker, Kubernetes and ECS; Springer: Cham, Switzerland, 2016; pp. 69-98.
13. Pousty, S.; Miller, K. Getting Started with OpenShift: A Guide for Impatient Beginners; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2014.
14. Lossent, A.; Peon, A.R.; Wagner, A. PaaS for web applications with OpenShift Origin. J. Phys. Conf. Ser.; 2017; 898, 082037. [DOI: https://dx.doi.org/10.1088/1742-6596/898/8/082037]
15. Mondal, S.K.; Pan, R.; Kabir, H.; Tian, T.; Dai, H.N. Kubernetes in IT administration and serverless computing: An empirical study and research challenges. J. Supercomput.; 2022; 78, pp. 2937-2987. [DOI: https://dx.doi.org/10.1007/s11227-021-03982-3]
16. Ferreira, A.P.; Sinnott, R. A performance evaluation of containers running on managed kubernetes services. Proceedings of the 2019 IEEE International Conference on Cloud Computing Technology and Science (CloudCom); Sydney, Australia, 11–13 December 2019; pp. 199-208.
17. Sayfan, G. Mastering Kubernetes; Packt Publishing Ltd.: Birmingham, UK, 2017.
18. Carrión, C. Kubernetes scheduling: Taxonomy, ongoing issues and challenges. ACM Comput. Surv.; 2022; 55, pp. 1-37. [DOI: https://dx.doi.org/10.1145/3539606]
19. Brewer, E.A. Kubernetes and the path to cloud native. Proceedings of the 6th ACM Symposium on Cloud Computing; Kohala Coast, HI, USA, 27–29 August 2015; 167.
20. Vayghan, L.A.; Saied, M.A.; Toeroe, M.; Khendek, F. Deploying microservice based applications with kubernetes: Experiments and lessons learned. Proceedings of the 2018 IEEE 11th International Conference on Cloud Computing (CLOUD); San Francisco, CA, USA, 2–7 July 2018; pp. 970-973.
21. Zhang, H.; Jiang, G.; Yoshihira, K.; Chen, H.; Saxena, A. Intelligent workload factoring for a hybrid cloud computing model. Proceedings of the 2009 Congress on Services-I; Los Angeles, CA, USA, 6–10 July 2009; pp. 701-708.
22. Moore, L.R.; Bean, K.; Ellahi, T. Transforming reactive auto-scaling into proactive auto-scaling. Proceedings of the 3rd International Workshop on Cloud Data and Platforms; Prague, Czech Republic, 14–17 April 2013; pp. 7-12.
23. Al-Dhuraibi, Y.; Paraiso, F.; Djarallah, N.; Merle, P. Autonomic vertical elasticity of docker containers with elasticdocker. Proceedings of the 2017 IEEE 10th International Conference on Cloud Computing (CLOUD); Honolulu, HI, USA, 25–30 June 2017; pp. 472-479.
24. Morais, F.J.A.; Brasileiro, F.V.; Lopes, R.V.; Santos, R.A.; Satterfield, W.; Rosa, L. Autoflex: Service agnostic auto-scaling framework for iaas deployment models. Proceedings of the 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing; Delft, The Netherlands, 13–16 May 2013; pp. 42-49.
25. Imdoukh, M.; Ahmad, I.; Alfailakawi, M.G. Machine learning-based auto-scaling for containerized applications. Neural Comput. Appl.; 2020; 32, pp. 9745-9760. [DOI: https://dx.doi.org/10.1007/s00521-019-04507-z]
26. Dang-Quang, N.M.; Yoo, M. Deep Learning-Based Autoscaling Using Bidirectional Long Short-Term Memory for Kubernetes. Appl. Sci.; 2021; 11, 3835. [DOI: https://dx.doi.org/10.3390/app11093835]
27. Toka, L.; Dobreff, G.; Fodor, B.; Sonkoly, B. Machine learning-based scaling management for kubernetes edge clusters. IEEE Trans. Netw. Serv. Manag.; 2021; 18, pp. 958-972. [DOI: https://dx.doi.org/10.1109/TNSM.2021.3052837]
28. Wang, T. Predictive Vertical CPU Autoscaling in Kubernetes Based on Time-Series Forecasting with Holt-Winters Exponential Smoothing and Long Short-Term Memory. 2021; Available online: http://www.diva-portal.org/smash/record.jsf?pid=diva2%3A1553841&dswid=-8736 (accessed on 10 January 2023).
29. Yan, M.; Liang, X.; Lu, Z.; Wu, J.; Zhang, W. HANSEL: Adaptive horizontal scaling of microservices using Bi-LSTM. Appl. Soft Comput.; 2021; 105, 107216. [DOI: https://dx.doi.org/10.1016/j.asoc.2021.107216]
30. Biran, O.; Breitgand, D.; Lorenz, D.; Masin, M.; Raichstein, E.; Weit, A.; Iyoob, I. Heterogeneous resource reservation. Proceedings of the 2018 IEEE International Conference on Cloud Engineering (IC2E); Orlando, FL, USA, 17–20 April 2018; pp. 141-147.
31. Thompson, J. Custom Pod Autoscaler. Available online: https://github.com/jthomperoo/custom-pod-autoscaler (accessed on 6 January 2023).
32. Kubernetes Architecture and Concepts. Available online: https://platform9.com/blog/kubernetes-enterprise-chapter-2-kubernetes-architecture-concepts/ (accessed on 10 January 2023).
33. Kubernetes. How Does a HorizontalPodAutoscaler Work?. Available online: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/ (accessed on 6 January 2023).
34. Kubernetes. Kubernetes Metrics Server. Available online: https://github.com/kubernetes-sigs/metrics-server/ (accessed on 6 January 2023).
35. Kubernetes. Vertical Pod Autoscaler. Available online: https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler (accessed on 6 January 2023).
36. Kubernetes. Cluster Autoscaler. Available online: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler (accessed on 6 January 2023).
37. Padgham, L.; Winikoff, M. Prometheus: A methodology for developing intelligent agents. Proceedings of the International Workshop on Agent-Oriented Software Engineering; Bologna, Italy, 15 July 2002; pp. 174-185.
38. Ifrah, S. Deploying Containerized Applications with Amazon ECS. Deploy Containers on AWS; Springer: Cham, Switzerland, 2019; pp. 83-133.
39. Aly, M.; Khomh, F.; Yacout, S. Kubernetes or openShift? Which technology best suits eclipse hono IoT deployments. Proceedings of the 2018 IEEE 11th Conference on Service-Oriented Computing and Applications (SOCA); Paris, France, 20–22 November 2018; pp. 113-120.
40. Al Jawarneh, I.M.; Bellavista, P.; Bosi, F.; Foschini, L.; Martuscelli, G.; Montanari, R.; Palopoli, A. Container orchestration engines: A thorough functional and performance comparison. Proceedings of the ICC 2019–2019 IEEE International Conference on Communications (ICC); Shanghai, China, 20–24 May 2019; pp. 1-6.
41. Cérin, C.; Menouer, T.; Saad, W.; Abdallah, W.B. A new docker swarm scheduling strategy. Proceedings of the 2017 IEEE 7th International Symposium on Cloud and Service Computing (SC2); Kanazawa, Japan, 22–25 November 2017; pp. 112-117.
42. Soppelsa, F.; Kaewkasi, C. Native Docker Clustering with Swarm; Packt Publishing Ltd.: Birmingham, UK, 2016.
43. Martyshkin, A.; Biktashev, R. Research and Analysis of Computing Cluster Configuration Management Systems. Proceedings of the Advances in Automation IV: International Russian Automation Conference, RusAutoCon2022; Sochi, Russia, 4–10 September 2022; pp. 194-205.
44. Wågbrant, S.; Dahlén Radic, V. Automated Network Configuration: A Comparison between Ansible, Puppet, and SaltStack for Network Configuration. 2022; Available online: www.diva-portal.org/smash/record.jsf?pid=diva2%3A1667034&dswid=944 (accessed on 6 January 2023).
45. Čilić, I.; Krivić, P.; Podnar Žarko, I.; Kušek, M. Performance Evaluation of Container Orchestration Tools in Edge Computing Environments. Sensors; 2023; 23, 4008. [DOI: https://dx.doi.org/10.3390/s23084008] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37112349]
46. Mondal, S.K.; Tan, T.; Khanam, S.; Kumar, K.; Kabir, H.M.D.; Ni, K. Security Quantification of Container-Technology-Driven E-Government Systems. Electronics; 2023; 12, 1238. [DOI: https://dx.doi.org/10.3390/electronics12051238]
47. Parmar, K.S.; Bhardwaj, R. Water quality management using statistical analysis and time-series prediction model. Appl. Water Sci.; 2014; 4, pp. 425-434. [DOI: https://dx.doi.org/10.1007/s13201-014-0159-9]
48. Wang, Y.W.; Shen, Z.Z.; Jiang, Y. Comparison of ARIMA and GM (1, 1) models for prediction of hepatitis B in China. PLoS ONE; 2018; 13, e0201987. [DOI: https://dx.doi.org/10.1371/journal.pone.0201987]
49. Kumar, S.; Hussain, L.; Banarjee, S.; Reza, M. Energy load forecasting using deep learning approach-LSTM and GRU in spark cluster. Proceedings of the 2018 5th International Conference on Emerging Applications of Information Technology (EAIT); West Bengal, India, 12–13 January 2018; pp. 1-4.
50. Yadav, A.; Jha, C.; Sharan, A. Optimizing LSTM for time series prediction in Indian stock market. Procedia Comput. Sci.; 2020; 167, pp. 2091-2100. [DOI: https://dx.doi.org/10.1016/j.procs.2020.03.257]
51. Ariyo, A.A.; Adewumi, A.O.; Ayo, C.K. Stock price prediction using the ARIMA model. Proceedings of the 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation; Cambridge, UK, 26–28 March 2014; pp. 106-112.
52. Benvenuto, D.; Giovanetti, M.; Vassallo, L.; Angeletti, S.; Ciccozzi, M. Application of the ARIMA model on the COVID-2019 epidemic dataset. Data Brief; 2020; 29, 105340. [DOI: https://dx.doi.org/10.1016/j.dib.2020.105340]
53. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput.; 1997; 9, pp. 1735-1780. [DOI: https://dx.doi.org/10.1162/neco.1997.9.8.1735]
54. Jordan, M.I. Serial order: A parallel distributed processing approach. Advances in Psychology; Elsevier: Amsterdam, The Netherlands, 1997; Volume 121, pp. 471-495.
55. Bengio, S.; Vinyals, O.; Jaitly, N.; Shazeer, N. Scheduled sampling for sequence prediction with recurrent neural networks. arXiv; 2015; arXiv: 1506.03099
56. Graves, A.; Mohamed, A.R.; Hinton, G. Speech recognition with deep recurrent neural networks. Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing; Vancouver, BC, Canada, 26–31 May 2013; pp. 6645-6649.
57. Hu, Z.; Shi, H.; Tan, B.; Wang, W.; Yang, Z.; Zhao, T.; He, J.; Qin, L.; Wang, D.; Ma, X. et al. Texar: A modularized, versatile, and extensible toolkit for text generation. arXiv; 2018; arXiv: 1809.00794
58. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv; 2014; arXiv: 1406.1078
59. Rajabi, A.; Wong, J.W. MMPP characterization of web application traffic. Proceedings of the 2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems; Washington, DC, USA, 7–9 August 2012; pp. 107-114.
60. Balla, D.; Simon, C.; Maliosz, M. Adaptive scaling of Kubernetes pods. Proceedings of the NOMS 2020-2020 IEEE/IFIP Network Operations and Management Symposium; Budapest, Hungary, 20–24 April 2020; pp. 1-5.
61. Shen, H.; Hong, X. Host Load Prediction with Bi-directional Long Short-Term Memory in Cloud Computing. arXiv; 2020; arXiv: 2007.15582
62. Sun, Y.; Chen, X.; Liu, D.; Tan, Y. Power-aware virtual machine placement for mobile edge computing. Proceedings of the 2019 International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData); Atlanta, GA, USA, 14–17 July 2019; pp. 595-600.
63. Benesty, J.; Chen, J.; Huang, Y.; Cohen, I. Pearson correlation coefficient. Noise Reduction in Speech Processing; Springer: Cham, Switzerland, 2009; pp. 1-4.
64. Fu, R.; Zhang, Z.; Li, L. Using LSTM and GRU neural network methods for traffic flow prediction. Proceedings of the 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC); Wuhan, China, 11–13 November 2016; pp. 324-328.
65. Yamak, P.T.; Yujian, L.; Gadosey, P.K. A comparison between arima, lstm, and gru for time series forecasting. Proceedings of the 2019 2nd International Conference on Algorithms, Computing and Artificial Intelligence; Sanya, China, 20–22 December 2019; pp. 49-55.
66. Nginx. Nginx Unit: Dynamic Application Server. Available online: https://www.nginx.com/products/nginx-unit (accessed on 6 January 2023).
67. Thompson, J. Custom Pod Autoscaler Operator. Available online: https://github.com/jthomperoo/custom-pod-autoscaler-operator (accessed on 6 January 2023).
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
Most enterprise customers now choose to divide a large monolithic service into large numbers of loosely-coupled, specialized microservices, which can be developed and deployed separately. Docker, as a light-weight virtualization technology, has been widely adopted to support diverse microservices. At the moment, Kubernetes is a portable, extensible, and open-source orchestration platform for managing these containerized microservice applications. To adapt to frequently changing user requests, it offers an automated scaling method, Horizontal Pod Autoscaler (HPA), that can scale itself based on the system’s current workload. The native reactive auto-scaling method, however, is unable to foresee the system workload scenario in the future to complete proactive scaling, leading to QoS (quality of service) violations, long tail latency, and insufficient server resource usage. In this paper, we suggest a new proactive scaling scheme based on deep learning approaches to make up for HPA’s inadequacies as the default autoscaler in Kubernetes. After meticulous experimental evaluation and comparative analysis, we use the Gated Recurrent Unit (GRU) model with higher prediction accuracy and efficiency as the prediction model, supplemented by a stability window mechanism to improve the accuracy and stability of the prediction model. Finally, with the third-party custom autoscaling framework, Custom Pod Autoscaler (CPA), we packaged our custom autoscaling algorithm into a framework and deployed the framework into the real Kubernetes cluster. Comprehensive experiment results prove the feasibility of our autoscaling scheme, which significantly outperforms the existing Horizontal Pod Autoscaler (HPA) approach.
1 School of Computer Science and Engineering, Macau University of Science and Technology, Taipa, Macau 999078, China;
2 Deakin University, Geelong, VIC 3216, Australia;
3 Department of Computer Science, Hong Kong Baptist University, Hong Kong, China;
4 Software Engineering Institute, East China Normal University, Shanghai 200062, China;