Abstract
Due to its many applications, cloud computing has gained popularity in recent years, offering simple and fast access to shared resources at any time and from any location. Cloud-based software services need adaptive resource allocation (RA) to provide Quality of Service (QoS) while lowering resource costs under workloads and service demands that change over time. Because system states shift constantly, resource allocation presents enormous challenges. Traditional methods often require specialist knowledge, which may result in poor adaptability; moreover, they target environments with fixed workloads and therefore cannot be applied successfully in real-world contexts with fluctuating workloads. This research therefore proposes the Prediction-enabled Cloud Resource Allocation (PCRA) framework, a prediction-enabled feedback control system based on reinforcement learning, to solve these problems. First, a Q-value prediction model is created to forecast the value of management operations under various system states, using Q-values as the basis. For accurate Q-value prediction, the model employs several prediction learners together with the Q-learning method. In addition, an improved optimization-based algorithm, the Feature Selection Whale Optimization Algorithm (FSWOA), is utilized to discover objective resource allocation plans. Simulations based on practical scenarios using CloudStack and the RUBiS benchmark demonstrate the effectiveness of PCRA for real-time RA: the framework achieves 94.7% Q-value prediction accuracy and reduces SLA violations and resource cost by 17.4% compared to traditional round-robin scheduling.
Introduction
The most prevalent uses of cloud platform resources are data processing and storage. Distributed cloud resources provide the benefit of flexible scalability; therefore, an effective structure that optimizes data flow while honoring the Service Level Agreement (SLA) is the best tool for cutting data-processing costs1,2. Cloud-based IoT refers to a system of physical objects that can be analyzed and managed online to create various innovative schemes. The primary technical issue in service computing is quickly composing a variety of services to facilitate cross-organizational business operations3.
A major issue in cloud computing is workflow scheduling, which distributes workflow tasks to Virtual Machines (VMs) based on a variety of functional and non-functional requirements. Designing a perfect schedule is difficult since workflow scheduling is an NP-hard optimization problem4–6. Lately, there has been increasing attention on using distributed models to solve issues in cloud computing environments, mainly RA. The two main methods for doing so are task scheduling, in which the cloud provider allocates workloads to VMs, and VM-to-Physical-Machine mapping7.
A cloud-edge cooperative content-delivery method has been proposed for asymmetrical Internet of Vehicles (IoV) cloud environments to tackle resource allocation8. Previous research introduced two reliability estimation methods: a whale optimization strategy and a neural network-based binary classifier; the cloud computing infrastructure offers a real-time computing environment for e-health care services9. The Improved Genetic-Shuffled Frog-Leaping Algorithm (IGSFLA) creates the optimal first frog (individual) in a group by using an experiential optimal-insert strategy with fitness constraints. The leaping process is applied to both the subgroup and the whole population in order to avoid local optima and speed up convergence10.
The concepts of natural selection and evolution serve as the foundation for the iterative stochastic optimization methods known as Genetic Algorithms (GAs). Numerous disciplines make substantial use of the Shuffled Frog Leaping Algorithm (SFLA), which offers the benefits of simple implementation, fast convergence, and global optimization capability11. Prior works such as12 have demonstrated the potential of reinforcement learning for dynamic resource allocation, motivating the real-time adaptability of the proposed design. The proposed Q-value (Q-v) prediction model, which combines the Q-learning algorithm13 with several Machine Learning (ML)-based prediction learners, provides a feedback mechanism to quickly identify objective RA plans in the cloud setting. Because of the fluctuating workloads and service demands in real-world cloud-based software service situations, traditional techniques are unable to adapt effectively. To address the dynamic, real-time resource allocation challenges in cloud computing, this research proposes the Prediction-enabled Cloud Resource Allocation (PCRA) framework. The major contributions of this work are as follows:
A real-time feedback-based Q-learning mechanism is presented that adapts allocation decisions based on live cloud metrics.
Multiple machine learning models, including the support vector machine (SVM), regression tree (RT), and K-Nearest Neighbor (KNN), are combined for accurate Q-value prediction across diverse workload patterns.
The Feature Selection Whale Optimization Algorithm (FSWOA) is used to enhance prediction accuracy through offline feature optimization.
The PCRA framework is evaluated in a real cloud environment using the RUBiS benchmark, validating its effectiveness under real-world conditions.
These contributions distinguish PCRA from traditional static or batch-mode scheduling models and support the overall objective of enabling intelligent, real-time cloud resource allocation.
The remainder of this paper is organized as follows: related works are examined in Sect. 2; Sect. 3 covers the RA problem for cloud-based services and the PCRA framework in detail; Sect. 4 presents the performance assessment and analysis; and Sect. 5 concludes this work.
Literature review
This section reviews recent works related to resource allocation in the cloud, along with their advantages and limitations.
Asghari et al.14 proposed a privacy-preserving way to compose IoT-based cloud services. They also introduced SFLA-GA, a hybrid evolutionary algorithm that combines SFLA and GA, to improve quality of service in an IoT setting while protecting privacy. The approach maximizes the service composition's fitness value by combining numerous QoS criteria. To help customers choose the best composite service, operators are also divided into groups based on the degree of privacy protection that they value.
Amini Motlagh et al.15 discussed several concepts regarding cloud computing. Over the last 10 years, heterogeneous cloud computing environments, such as heterogeneous computing (HC) systems, have grown significantly in size. As a result, network failures are inevitable in such systems and affect system reliability. Meanwhile, task scheduling in HC is challenging, and that research investigates a reliability-aware task scheduling algorithm (RATSA). RATSA schedules tasks modeled as Directed Acyclic Graphs (DAGs) using the SFLA and GA as evolutionary algorithms. The NP-complete problem of minimizing makespan in RATSA is resolved using the population-based SFLA-GA.
Gola et al.16 discussed how the right allocation of various resources to VMs in a cloud computing setting can reduce makespan while also improving resource utilization. That work builds a unique hierarchical RA technique based on a multi-objective hybrid capuchin search with a genetic algorithm (MHCSGA). Throughput, resource utilization, makespan, response time, and execution time of the multi-objective functions are all improved by utilizing MHCSGA. Initially, the work uses partitioning based on the K-medoids clustering technique to assign resources as efficiently as possible. The clustering process separates the tasks into two cluster groups, after which optimization determines the optimal resource allocation strategy.
Durgadevi et al.17 introduced a hybridized optimization method that combines the SFLA with the Cuckoo Search (CS) approach. It addresses the shortcomings of earlier research, including the Krill Herd algorithm, the Grouped Task Scheduling (GTS) algorithm, and the Hybrid Artificial Bee Colony and Cuckoo Search (HABCCS) algorithm, and combines the advantages of SFLA and CS. In this method, the SFLA section completes operations such as setting the generated requests and initial request size, calculating fitness values, sorting, dividing, and evaluating user requests. A variety of industries extensively utilize the SFLA, which offers the advantages of quicker convergence, easier implementation, and global optimization capability.
Pani et al.18 propose Democratic Grey Wolf Optimization (DGWO) as a strategy for successfully overcoming resource allocation constraints. After initializing the generated requests and request sizes, DGWO estimates fitness values, then sorts, divides, and evaluates the user requirements. The advantages of DGWO include easier implementation, global optimization capability, and faster convergence. With strong local-optimum avoidance, DGWO works well in unexplored, challenging search areas.
Xu et al.19 demonstrate how to assign and sequence tasks using the Multi-Objective Shuffled Frog-Leaping Algorithm (MOSFLA) and GA for multi-Unmanned Aerial Vehicle (multi-UAV) plant protection task optimization. Khaleel et al.20 provide a hybrid multi-criteria decision-making (Hybrid-MCD) method to simultaneously enhance workflow effectiveness and scheduling accuracy. It frames the issue as a dual-objective task scheduling problem, improving scheduling precision while reducing the proportion of workflow tasks that occupy computing resources in terms of service delivery time. That research used the Deadline-aware Stepwise Reliability Optimization (DARO) method, which improves the application's reliability and execution time by changing the reliability-recursive maximizing technique and reallocating planned requests that are not on the critical path.
Jelodari et al.21 built an infrastructure to handle connected things, since cloud computing (CC) has become crucial for IoT data analysis and storage. The article discusses cloud brokers, who act as middlemen in the cloud computing system that controls networked devices. By formulating an optimization problem, the broker's income and system availability are increased, while request response time and energy consumption are decreased; the Black Widow Optimization (BWO) algorithm is proposed to solve the objective function. Though several approaches have been proposed to assign cloud resources efficiently, some issues remain. Hence, this research reliably anticipates the Q-v of management operations in a cloud context using Q-learning along with several ML-based methods. A novel optimization-based feedback control methodology is then utilized to effectively determine objective RA plans for cloud-based services based on the predicted Q-v.
Several prediction-enabled models for cloud resource allocation have been proposed in the literature. For example,8 used deep reinforcement learning for IoT-edge-cloud distribution, and20 explored hybrid workflow placement with MCD. Unlike these, the proposed model leverages explainable and efficient learners such as SVM, RT, and KNN, combined with Q-learning for real-time decisions. Additionally, cloud simulation environments are often inadequately represented in existing works, while the proposed study addresses this using CloudStack and the RUBiS workload emulator.
Table 1. Comparison with related works.
Author/year | Technique used | Adaptivity | Optimization | Real cloud evaluation | Proposed contribution |
|---|---|---|---|---|---|
Cui et al. (2023) [8] | Deep RL | ✓✓ | ✗ | ✗ | ✗ |
Xu et al. (2020) [19] | Hybrid MOSFLA-GA | ✗ | ✓ | ✗ | ✗ |
Riahi & Faris (2023) [12] | Multi-agent RL | ✓✓ | ✗ | ✓ | ✗ |
Proposed work | Q-learning + FSWOA | ✓✓✓ | ✓✓ | ✓✓ | ✓✓✓ |
In Table 1, ✓ marks are used to indicate the level of capability. A single ✓ indicates basic support, ✓✓ reflects moderate or partial integration, while ✓✓✓ indicates full and advanced support. For example, the proposed work scores ✓✓✓ in adaptivity due to its real-time feedback-based Q-learning and forecasting mechanism. Prior works either lacked runtime response or had static prediction models. Compared to existing works, the PCRA framework uniquely combines predictive learning, offline optimization, and a live feedback loop. Unlike models that rely solely on static ML predictions or batch RL, PCRA supports real-time decision-making, validated through deployment on CloudStack.
Proposed methodology
Moving beyond VMs for cloud scheduling
Efficient scheduling is crucial in cloud environments with various interacting resources to provide fault tolerance, scalability, and optimum system performance. Managing cloud infrastructure involves more considerations than scheduling virtual machines (VMs); in particular, replication techniques, data and control plane storage, and similar components are essential for cloud systems to operate properly. This section delves into the intricacies of scheduling beyond VMs, highlighting the need for a comprehensive approach to managing resources in cloud environments.
Cloud scheduling and the function of storage
When it comes to cloud computing, scheduling storage resources is just as crucial as scheduling compute. To make sure data is easily accessible and spread across various storage devices or regions, data plane scheduling entails controlling the allocation of and access to storage resources. This necessitates keeping latency to a minimum, balancing loads, and controlling I/O needs, all while keeping availability high. Control plane scheduling is likewise crucial for the efficient functioning of cloud systems such as OpenStack and Kubernetes, as it guarantees the proper placement of management nodes and services such as monitoring and orchestration.
Replication scheduling
Replication, another essential component of cloud storage systems, guarantees data availability in the face of failures. Determining when and where to duplicate data across various nodes or data centers is an important part of replication scheduling. To reduce resource contention and keep system efficiency high, the replication process and VM scheduling must be properly synchronized; with efficient scheduling, duplicated data does not degrade the performance of the cloud infrastructure.
Cloud system scheduling difficulty
Cloud systems are dynamic: resources are constantly assigned and reallocated. Virtual machines (VMs), storage, and replication all depend on one another, so scheduling has to deal with all three. For example, when scheduling a VM on a physical host, the cloud system has to consider the availability of the data stored with that VM and whether it has to be replicated. Because of these interconnections, scheduling in cloud environments is an extremely difficult problem that calls for algorithms that can effectively balance the needs of computation, storage, and replication.
The proposed PCRA framework is presented in this section. PCRA may be efficiently utilized to produce RA plans for cloud-based applications that are both adaptable and effective. The PCRA framework is shown in Fig. 1. The historical datasets include runtime information from various system stages, such as the present workload, the RA plan with its QoS, and the matching objective RA plan. The Q-learning technique is then utilized to assess the worth of management operations made under various system conditions (via Q-v). More precisely, management operations add or delete VMs of various types, with associated rewards obtained when objective RA plans are identified. The system state is made up of the present workload and the RA plan in the runtime setting.
Fig. 1 [Images not available. See PDF.]
PCRA framework showing real-time decision-making and feedback loop for dynamic resource allocation.
First, the management experience is used to preprocess the Q-v of management operations in the Q-v table. ML-based algorithms are then trained on the preprocessed Q-v to create the Q-v prediction model. Three ML-based prediction algorithms, namely SVM, RT, and KNN, are trained, and the model with the greatest accuracy is chosen. As a result, by entering the present system state (the workload and RA plan, together with the QoS), the Q-v of various management operations (MO) can be anticipated accurately at runtime. By contrasting the corresponding Q-v, runtime decisions selecting MO are then carried out. Using an optimization-based approach, objective resource allocation strategies may eventually be discovered22.
Figure 1 depicts a dynamic, feedback-based RA framework in which Q-learning continually updates the Q-values in response to runtime metrics (workload, SLA, and RA states). The loop ensures adaptive decision-making at each time step, simulating real-time deployment. The model is not static: it continuously monitors and retrains its predictions using real-time metrics, allowing responsiveness to dynamic changes.
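To make the training input concrete, one possible shape of a historical runtime record is sketched below; the field names and types are illustrative assumptions rather than the paper's exact schema.

```python
from dataclasses import dataclass

# One record of the historical dataset that feeds Q-value training
# (a minimal sketch; field names and types are assumed for illustration).
@dataclass
class RuntimeRecord:
    workload: int        # current number of concurrent users
    ra_plan: tuple       # VM counts per type, e.g. (n_small, n_medium, n_large)
    qos: float           # measured QoS, e.g. average response time in seconds
    action: int          # management operation: +i adds, -i removes a type-i VM
    q_value: float       # Q-value assessed by Q-learning for this (state, action)
```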
Evaluation of Q-values for MO
While cooperating with the environment without prior information, RL may make choices on its own. This benefit makes it possible to apply RL to the issue of RA in a complicated cloud environment. The values of various management operations are thus assessed in order to discover the most efficient RA for cloud-based services, using the RL algorithm known as Q-learning. Various resource allocation strategies yield different values of the goal function. Both QoS and resource costs should be taken into account in the objective RA plan (i.e., the best one). Based on the operational information of the existing environment and software services, administrators may acquire the objective RA plan, denoted $p^{obj}$, with the minimum value of the objective function. Therefore, $p^{obj}$ is sought and recorded for each runtime environment. In particular, the system records pieces of operating data under various system states, where $w$ is the current workload, $p$ is the current RA plan, the QoS is indicated by $q$, and the objective RA plan under the current environment is indicated by $p^{obj}$, as in Eq. (1):

$$d = (w, p, q, p^{obj}) \tag{1}$$

The RL agent can progressively identify the MO with the greatest Q-v under various system states by using past system data, with the Q-learning algorithm guiding the learning procedure of the Q-v assessment. Since the RL goal is to maximize cumulative rewards, the Markov decision process (MDP) is often used to model RL. An MDP may be described more precisely by a 4-tuple $(S, A, P, R)$: $S$ is the state space, $A$ is the action space, $P$ is the state transition function, and $R$ is the reward function. The state is a 2-tuple $s = (w, p)$, so $s$ stands for the runtime environment's current system state, which includes the current workload and RA plan. $A$ defines the action space, where an action is the addition or deletion of a VM of a certain type: a VM of the $i$th type is added with $+v_i$ and removed with $-v_i$. The objective RA plan is found by the RL agent with the help of the reward function, which is defined in Eq. (2):

$$R(s, a) = \begin{cases} 10, & \text{if } p' = p^{obj} \\ -1, & \text{if } p' \notin D \\ 0, & \text{otherwise} \end{cases} \tag{2}$$
Here, $p'$ denotes the RA plan obtained by performing action $a$, which signifies a MO, under the current RA plan $p$. If the objective plan $p^{obj}$ is discovered by performing an action in accordance with the current RA plan $p$, the RL agent receives a reward of 10. The agent earns a reward of -1 if an unknown RA plan (not in the set of recorded plans $D$) is produced by performing an action under $p$. In other circumstances, no reward results from the action. During training, the RL agent selects an action by the $\epsilon$-greedy algorithm13 at the state $s$, after which it obtains the immediate reward $r$ and the state transition to $s'$ takes place. The Q-v, signified by $Q(s, a)$, is the value obtained by selecting the action $a$ at the state $s$, and is updated by Eq. (3):

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right] \tag{3}$$

where $\alpha$ represents the learning rate, $\gamma$ signifies the discount factor, and $\max_{a'} Q(s', a')$ is the maximum Q-v obtained by choosing an action at the next state $s'$. The Q-learning method is utilized to assess the Q-v of various management operations in order to reach the objective RA plans based on the aforementioned criteria. The Q-v table is first initialized with random Q-v. The Q-learning method is then utilized to assess the Q-v of management operations in each potential resource allocation plan, and training continues until the Q-v converge. A randomly chosen candidate resource allocation plan is used to initialize $p$ in each training epoch. If $p$ is not $p^{obj}$, the $\epsilon$-greedy approach is used to choose an action from the action space $A$ based on the current Q-v. Following the calculation of the action's reward using Eq. (2), the subsequent resource allocation plan (designated $p'$) is constructed. Then, using Eq. (3), the Q-v in the table are updated, and the state transition occurs once $p$ is changed to $p'$. Finally, the Q-v table is continually updated until $p^{obj}$ is identified. As a result, the Q-v database keeps track of the workloads, RA strategies, and associated Q-v for all management operations (adding or deleting VMs of various types) at various times.
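A compact tabular sketch of this training procedure is given below. The transition function `apply` and the recorded plan set `plans` are caller-supplied stand-ins, the state is simplified to the RA plan alone, and the defaults α = 0.01 and γ = 0.9 follow Table 3; it implements the reward scheme of Eq. (2) and the update rule of Eq. (3).

```python
import random
from collections import defaultdict

def reward(next_plan, objective_plan, known_plans):
    """Reward scheme of Eq. (2)."""
    if next_plan == objective_plan:
        return 10          # objective RA plan discovered
    if next_plan not in known_plans:
        return -1          # transitioned to an unknown RA plan
    return 0               # any other case

def q_learning(plans, objective_plan, actions, apply,
               episodes=500, max_steps=100, alpha=0.01, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning over recorded RA plans (a minimal sketch)."""
    Q = defaultdict(float)                       # Q[(state, action)]
    for _ in range(episodes):
        plan = random.choice(plans)              # random candidate plan per epoch
        for _ in range(max_steps):
            if plan == objective_plan:
                break
            # epsilon-greedy action selection over the action space
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(plan, a)])
            next_plan = apply(plan, action)      # add/remove a VM of some type
            r = reward(next_plan, objective_plan, plans)
            # Q-learning update of Eq. (3)
            best_next = max(Q[(next_plan, a)] for a in actions)
            Q[(plan, action)] += alpha * (r + gamma * best_next - Q[(plan, action)])
            plan = next_plan
    return Q
```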
Prediction model
Although the Q-v assessed by the Q-learning method may be used to decide on possible management operations, the decision model for RA has to be retrained when workloads change, since previous RL-based techniques target a cloud environment with static workloads. As a result, they cannot be used in real-world cloud-based software service situations with fluctuating workloads and service demands. To solve this significant issue, a Q-v prediction system is created to forecast the Q-v of management operations under many workload and service request situations. The proposed Q-v prediction model may therefore considerably increase the adaptability and effectiveness of resource allocation in the runtime environment.
The Feature Selection Whale Optimization Algorithm (FSWOA) is created by choosing the necessary features and settings inside the WOA algorithm; the flowchart in Fig. 2 illustrates the FSWOA optimization process23. Utilizing feature selection helps the optimization algorithm explore the search space more vigorously and widely, and probability distributions are used to apply randomization in practically all meta-heuristic methods with stochastic components. The Q-v of MO decrease the closer the corresponding plans are to the objective RA plan $p^{obj}$. The SVM, RT, and KNN algorithms are utilized to train the Q-v prediction system on the optimized Q-v of MO, and the system with the most accurate predictions is chosen. For the Q-v prediction problem, the relationship between input $X$ and output $Y$ is investigated, where $Y$ expresses the Q-v of the associated management operations; unlawful management operations are automatically excluded. The three models are selected for their distinct strengths: SVM offers generalization in non-linear environments, RT enables fast, interpretable decision trees, and KNN provides a similarity-based prediction baseline. Together they ensure low latency and suit dynamic workload variations. The SVM model is given in Eq. (4):

$$Y = \omega^{T} X + b \tag{4}$$

where $X$ is the input matrix and $Y$ is the output matrix. This equation defines the input-output structure of the initial Q-value prediction model used for cloud resource mappings. The Gaussian kernel function is utilized to handle the issue of solving the parameters $(\omega, b)$, which is further translated into a mapping problem in feature space, as in Eq. (5):

$$K(x_i, x_j) = \exp\!\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}\right) \tag{5}$$

where $x_i, x_j \in X$ and $\sigma$ is the kernel width. This kernel projects VM-related features into a high-dimensional space for accurate Q-value prediction.
Fig. 2 [Images not available. See PDF.]
FSWOA optimization process.
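For concreteness, a minimal binary-WOA sketch for feature selection is given below, with the population size (30) and iteration count (50) taken from Table 3. The sigmoid transfer function used to binarize whale positions is a common choice and an assumption here, as the exact FSWOA binarization is not spelled out in the text.

```python
import numpy as np

def fswoa_select(fitness, dim, n_whales=30, iterations=50, b=1.0, seed=0):
    """Binary whale optimization for feature selection (a minimal sketch).

    fitness(mask) returns a value to MINIMIZE, e.g. validation error plus a
    small penalty on the number of selected features.
    """
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1, 1, (n_whales, dim))         # continuous whale positions
    masks = 1 / (1 + np.exp(-pos)) > 0.5              # initial binary feature masks
    scores = np.array([fitness(m) for m in masks])
    i_best = scores.argmin()
    best_pos, best_score = pos[i_best].copy(), scores[i_best]

    for t in range(iterations):
        a = 2 - 2 * t / iterations                    # decreases linearly from 2 to 0
        for i in range(n_whales):
            r1, r2 = rng.random(dim), rng.random(dim)
            A, C = 2 * a * r1 - a, 2 * r2
            if rng.random() < 0.5:                    # shrinking encircling / search
                ref = best_pos if np.all(np.abs(A) < 1) else pos[rng.integers(n_whales)]
                pos[i] = ref - A * np.abs(C * ref - pos[i])
            else:                                     # spiral bubble-net movement
                l = rng.uniform(-1, 1, dim)
                d = np.abs(best_pos - pos[i])
                pos[i] = d * np.exp(b * l) * np.cos(2 * np.pi * l) + best_pos
        # stochastic sigmoid transfer binarizes positions into feature masks
        masks = 1 / (1 + np.exp(-pos)) > rng.random((n_whales, dim))
        scores = np.array([fitness(m) for m in masks])
        if scores.min() < best_score:
            i_best = scores.argmin()
            best_pos, best_score = pos[i_best].copy(), scores[i_best]
    return 1 / (1 + np.exp(-best_pos)) > 0.5          # selected feature mask
```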
The RT model is given in Eqs. (6) and (7). First, the purity of a dataset $D$ is determined by the Gini value

$$\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} p_k^{2} \tag{6}$$

where $p_k$ is the percentage of the $k$th category and $D$ stands for a dataset that includes $K$ categories. The RT index function is then defined as in Eq. (7):

$$\mathrm{Gini}(D, a) = \sum_{v=1}^{V} \frac{\lvert D^{v} \rvert}{\lvert D \rvert}\, \mathrm{Gini}(D^{v}) \tag{7}$$

where $a$ is one of $D$'s attributes and $D^{v}$ is the subset of $D$ taking the $v$th of the $V$ possible values of $a$. The split on the attribute achieving the least value of $\mathrm{Gini}(D, a)$ may be considered the ideal partition. RT evaluates the accuracy of clustering cloud actions for VM assignments. The KNN model is given in Eq. (8): let $x$ represent a sample from an unidentified group; by the nearest-neighbor rule, if $x_j$ is the nearest neighbor instance of $x$ among all sample instances, the group $c_j$ of $x_j$ is the outcome of the decision:

$$d(x, x_j) = \min_{i}\, d(x, x_i) \;\Rightarrow\; x \in c_j \tag{8}$$
KNN determines the closest match of action sequences from historical datasets to predict Q-values under new conditions; the decision outcome is then acquired24. The proposed Q-v prediction model is then trained using whichever of the three ML models has the best prediction accuracy. The model accurately and flexibly predicts the Q-v of various management operations in a runtime environment with variable workloads and service demands. Although FSWOA is a powerful optimizer, its computational complexity can increase training time. However, in this framework FSWOA is only used for feature pre-selection and is not part of the real-time Q-learning loop. This ensures the agent's response speed remains suitable for online applications while maintaining the optimization benefits.
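A minimal sketch of this train-and-select step is shown below, using scikit-learn learners with the Table 3 settings (RBF-kernel SVM, depth-6 tree, k = 5). The use of regression variants (since Q-values are continuous) and the held-out R² selection criterion are assumptions for illustration.

```python
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split

def train_q_value_predictor(X, y):
    """Train the three candidate learners and keep the most accurate one.

    X: feature matrix (state + action features after FSWOA selection);
    y: Q-values taken from the Q-learning table.
    """
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
    candidates = {
        "SVM": SVR(kernel="rbf"),                   # RBF kernel (Table 3)
        "RT": DecisionTreeRegressor(max_depth=6),   # capped depth avoids overfitting
        "KNN": KNeighborsRegressor(n_neighbors=5),  # k = 5 neighbors
    }
    best_name, best_model, best_score = None, None, float("-inf")
    for name, model in candidates.items():
        model.fit(X_tr, y_tr)
        score = model.score(X_te, y_te)             # R^2 on held-out data
        if score > best_score:
            best_name, best_model, best_score = name, model, score
    return best_name, best_model
```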
Decisions made at runtime for management operations
The decision-making process for selecting MO is carried out at runtime by using the projected Q-v of various management operations from the proposed Q-v prediction model. Algorithm 1 lays out the primary phases of this procedure. First, the related Q-v is filtered out if the management operation is deemed unlawful; otherwise, the Q-v prediction model is employed to assess the Q-v of the management operation. Next, the objective RA plan is considered identified if the Q-v of every lawful MO is less than or equal to the predetermined threshold, at which point no further management operations are needed. If not, the management operation with the minimal Q-v is carried out, and the objective RA plan is pursued further. The proposed FSWOA-based decision-making optimization algorithm allows the progressive discovery of the objective resource allocation strategy via feedback control in the runtime environment. Workable management operations are selected and executed at the end of each cycle, and the feedback control iterations continue until the decision-making optimization algorithm's output is null.
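The following sketch mirrors the decision procedure just described. The interfaces `legal_actions`, `predict_q`, and `apply_action` are illustrative names assumed here, not the paper's identifiers.

```python
def runtime_decision(state, legal_actions, predict_q, apply_action, threshold):
    """Runtime MO selection loop (a minimal sketch of Algorithm 1 as described)."""
    while True:
        actions = legal_actions(state)              # unlawful operations filtered out
        q_values = {a: predict_q(state, a) for a in actions}
        # Objective RA plan reached: every lawful MO scores at or below threshold.
        if all(q <= threshold for q in q_values.values()):
            return state                            # no further operation needed
        # Otherwise execute the operation with the minimal predicted Q-value.
        action = min(q_values, key=q_values.get)
        state = apply_action(state, action)         # one feedback-control iteration
```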
The time complexity of PCRA arises from three components: (i) the prediction module using SVM/RT/KNN, (ii) the Q-learning iterations, and (iii) the FSWOA feature optimization. The prediction component runs in $O(m \cdot n)$, where $m$ is the number of samples and $n$ is the size of the feature space. Each Q-learning update costs $O(\lvert A \rvert)$ for the maximization over the action space $A$, and FSWOA runs offline in $O(T \cdot N \cdot D)$, where $T$ is the number of iterations, $N$ is the whale population, and $D$ is the dimensionality. Since FSWOA is used offline, the runtime load remains minimal during deployment.
Results
The experimental conditions and datasets are presented in this section. The studies were carried out using CloudStack25, which enables the deployment of three distinct VM sizes (small, medium, and large). Table 2 displays the CloudStack settings for the various VM types, including CPU, memory, and two price parameters. Furthermore, $n_s$, $n_m$, and $n_l$, respectively, indicate how many VMs of each type there are, so the current resource allocation plan may be represented as $p = (n_s, n_m, n_l)$. This work runs the RUBiS benchmark26, a model of an auction site, on CloudStack. By simulating operator behavior for various workload patterns, the RUBiS benchmark may be used to assess the performance scalability of application servers. More precisely, the number of operators reflects the workload as a whole, and user behaviors include service requests for browsing as well as bidding (i.e., various tasks).
Table 2. CloudStack configurations for different VM types.

| Property | Large | Medium | Small |
|---|---|---|---|
| Price 1 (RMB) | 2.084 | 0.471 | 2.084 |
| Price 2 (RMB) | 0.521 | 1.885 | 0.521 |
| CPU (core) | 1 | 1 | 1 |
| Memory (GB) | 4 | 2 | 1 |
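As a usage illustration, the snippet below computes the hourly cost of an RA plan $p = (n_s, n_m, n_l)$ as a weighted sum of VM counts and per-type prices. Since the semantics of the two price rows in Table 2 are ambiguous in the extracted text, the prices used here are placeholders.

```python
# Illustrative cost of an RA plan p = (n_small, n_medium, n_large);
# per-hour prices are placeholders, not the exact semantics of Table 2.
PRICES_RMB = {"small": 0.521, "medium": 0.471, "large": 2.084}

def plan_cost(n_small, n_medium, n_large):
    counts = {"small": n_small, "medium": n_medium, "large": n_large}
    return sum(counts[t] * PRICES_RMB[t] for t in counts)

# Example: the PCRA plan from case 1 of Table 4, p = (4, 1, 0).
print(plan_cost(4, 1, 0))   # 4*0.521 + 1*0.471 = 2.555 RMB per hour
```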
Table 3. Hyperparameters and settings.
Parameter | Value | Description |
|---|---|---|
Learning rate (α) | 0.01 | Q-learning rate |
Discount factor (γ) | 0.9 | Future reward importance |
No. of episodes | 500 | RL training loops |
FSWOA population size | 30 | Number of whales |
FSWOA iterations | 50 | Optimization cycles |
K in KNN | 5 | Neighbors for prediction |
RT tree depth | 6 | Max tree depth |
Table 3 summarizes the hyperparameters used in the proposed PCRA framework. The Q-learning module's learning rate and discount factor are chosen to balance convergence speed and long-term decision impact. The feature selection phase using FSWOA adopts a moderate population size and iteration count to ensure stable feature optimization while keeping computational cost reasonable. SVM uses the RBF kernel to handle nonlinearities in workload patterns, while regression trees are capped at a depth of 6 to prevent overfitting. The KNN component uses k = 5, ensuring accurate yet computationally feasible classification. These settings were selected based on initial tuning using grid search and benchmarks from existing works.
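As an illustration of that tuning step, a minimal grid-search sketch for the SVM learner is given below; the parameter grid is an assumption, not the exact grid used in this study.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Sketch of the initial grid-search tuning mentioned above; the parameter
# grid is illustrative only.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5)
# search.fit(X_train, y_train); search.best_params_ informs the Table 3 values.
```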
Considerations when choosing a VM
To test how well the proposed scheduling method works, this research used three different kinds of VMs. These VM types were selected to reflect the wide variety of workloads seen in large-scale cloud infrastructure while providing a realistic representation of VMs utilized in cloud environments.
Small is a VM type ideal for resource-conservative, lightweight applications. Cloud environments often employ it for low-demand applications, testing, and development. Including this VM type aids in simulating situations in which resource consumption is minimal yet scheduling is crucial.
Web applications, databases, and small-scale corporate systems are all examples of medium-level workloads that can be operated on medium VMs. A good proxy for common cloud use, the medium VM type strikes a good balance between processing power and resource needs.
For applications that need a lot of computational power, including data analytics, ML, or HPC, the large VM type is used. By including this type, this work evaluates the proposed scheduling strategy's efficiency and scalability in more rigorous environments.
These three VMs allow a thorough test of the proposed method under varying degrees of workload intensity. Because this selection mirrors how cloud providers distribute resources according to user demands, the findings will be applicable to a wide variety of cloud applications.
Comparative study
First, the study evaluates the performance of the proposed PCRA framework for adaptive RA using the various scenarios in Table 4. The traditional scheduling approach typically utilized in cloud environments was used for comparison: round-robin VM allocation is the baseline employed in this research. This method distributes virtual machines to hosts cyclically, which makes it simple and fair across all available servers for different tasks. Although round-robin scheduling works well for allocating resources in general, it is not ideal for large-scale, dynamic environments, since it does not take into consideration workload characteristics, resource consumption, or VM-specific needs. Because of its simplicity and widespread use in many cloud environments, especially smaller or less sophisticated configurations, this traditional technique serves as the baseline. By contrasting the proposed method with this conventional one, this work demonstrates the merits of advanced dynamic scheduling algorithms that take more variables into account.
More specifically, the comparison is between the ideal plans and the resource allocation plans produced by the PCRA framework in these different situations, where the ideal strategy meets the criteria with the highest QoS and the lowest resource costs. Finding the optimal plans in practice, however, is infeasible, since it would require exhausting every possibility, resulting in an intolerably high level of complexity. For instance, with three different kinds of VMs, 729 resource allocation plans exist for a given workload and allotted VMs, and the goal function values of all 729 plans would have to be compared to determine the best one in a given situation. To create the best strategies for the various situations, managerial expertise and local verification are combined. The PCRA framework provides RA plans that are almost identical to the ideal ones, and the variance between them is negligible, as shown in Table 4: they are the same in Cases 2 and 4, and in the other cases the gap between the proposed solutions and the optimal ones is rather small. The study also compares the effectiveness of resource allocation between the PCRA framework and the ideal plans in various scenarios. The findings demonstrate that the PCRA framework delivers optimal performance (QoS as well as resource costs) in RA and is suited to handle the RA needs of cloud-based software applications with a range of workloads and service demands.
Table 4. Comparison of RA plans ($n_s$, $n_m$, $n_l$) produced by the PCRA framework and the conventional method.

| No | PCRA $n_s$ | PCRA $n_m$ | PCRA $n_l$ | Conventional $n_s$ | Conventional $n_m$ | Conventional $n_l$ |
|---|---|---|---|---|---|---|
| 1 | 4 | 1 | 0 | 4 | 3 | 0 |
| 2 | 4 | 1 | 0 | 4 | 1 | 0 |
| 3 | 5 | 3 | 0 | 5 | 2 | 0 |
| 4 | 4 | 4 | 0 | 1 | 4 | 0 |
| 5 | 6 | 2 | 0 | 5 | 3 | 0 |
After optimal resource allocation is achieved, prediction is assessed. The performance of the proposed Q-v prediction model is first evaluated using a variety of training techniques, including SVM, RT, and KNN. The action accuracy rate, used to assess how accurate management operations are in the decision-making process, is defined as in Eq. (9):

$$\text{Action accuracy rate} = \frac{A}{M} \times 100\% \tag{9}$$

where $M$ is the total count of management actions executed and $A$ is the total count of correct actions. Management operations are considered to be moving in the right direction if they help identify the objective resource allocation strategy. All three models, which use various techniques, finish the training process in 2 to 3 s, as shown in Table 5, demonstrating outstanding training efficiency. To address the core performance of resource placement, placement improvement criteria such as task response time, SLA violations, and VM utilization are introduced. Compared with round-robin, the PCRA framework reduces SLA violations by 22.3%, improves average VM utilization by 17.4%, and lowers task waiting time by 14.1% across varying workloads. This confirms that higher prediction accuracy translates into better placement and QoS compliance. Additionally, among the three models, the SVM model achieves the best Q-v accuracy, surpassing the others by a 3–6% performance gain. As a result, utilizing the SVM model for Q-v prediction throughout the decision-making process allows better management operations for resource allocation.
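Eq. (9) amounts to the following one-liner:

```python
def action_accuracy_rate(A, M):
    """Action accuracy rate of Eq. (9): A correct actions out of M executed."""
    return A / M * 100

# Example: 947 correct management operations out of 1000 executed.
print(action_accuracy_rate(947, 1000))  # 94.7
```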
Table 5. Comparison of the effectiveness of several Q-v prediction models.
Models | Training period | Accuracy |
|---|---|---|
SVM | 2.34 s | 94.7% |
RT | 1.99 s | 92.5% |
KNN | 3.03 s | 90.4% |
Therefore, with the help of the Q-v prediction model, the PCRA framework consistently makes judgments for management operations with good accuracy when several stages lead to the optimal RA plans. Only when nearing the optimal plans do judgments for management operations deviate somewhat. As a result, the frequently produced resource allocation plans may satisfy the needs of system management.
Table 6. Comparative analysis of PCRA with state-of-the-art resource allocation models based on SLA adherence, utilization, prediction accuracy, and adaptability.
Model | SLA violation rate | VM utilization | Task waiting time | Prediction accuracy | Adaptability to workload |
|---|---|---|---|---|---|
PCRA (Proposed) | 4.2% | 91.3% | 1.1s | 94.7% | High (Real-time Q-learning) |
Hybrid MCD [20] | 6.1% | 87.4% | 1.7s | - | Medium |
GWO [18] | 7.5% | 84.6% | 2.3s | - | Low |
MOSFLA + GA [19] | 5.6% | 88.5% | 1.6s | - | Medium |
Deep RL [8] | 5.1% | 89.0% | 1.4s | 92.1% | High |
Table 6 compares the proposed PCRA framework with state-of-the-art models using different metrics related to cloud environments, showing the proposed framework's improved performance.
The lowest SLA violation rate, 4.2%, is achieved by PCRA, which indicates that it maintains service quality more reliably under varying workloads. This improvement is due to the real-time Q-learning agent, which adapts decisions dynamically to changing conditions.
PCRA ensures that resources are neither over-provisioned nor underutilized, with 91.3% VM utilization, improving overall system efficiency. This efficiency results from the feedback-driven optimization by means of Q-value predictions and the FSWOA decision engine.
PCRA reduces task waiting time to 1.1 s, outperforming the other existing models. This is a direct result of the adaptive management operations, which prioritize decisions with the lowest projected Q-value, ensuring a quick response to workload spikes.
Achieving 94.7% prediction accuracy, PCRA exceeds even advanced models such as Deep RL. This high accuracy is due to the combination of three different ML models (SVM, RT, and KNN), from which the best performer is selected for each scenario.
PCRA supports real-time adaptability through continuous monitoring and Q-learning updates, unlike approaches relying on batch optimization or static models. Its feedback control structure enables it to respond faster and more precisely to fluctuating demands.
Fig. 3 [Images not available. See PDF.]
Q-learning convergence (cumulative reward).
Figure 3 shows the convergence behavior of the Q-learning agent via a plot of the cumulative reward across episodes. As the number of episodes increases, the cumulative reward steadily grows and stabilizes, confirming that the agent successfully learns an optimal resource allocation policy. This supports the claim that the reinforcement learning component in PCRA adapts over time to dynamic cloud conditions.
Fig. 4 [Images not available. See PDF.]
PCRA vs. baselines.
Figure 4 presents a comparative analysis of the proposed PCRA framework with three models: Hybrid MCD, GWO, and Deep RL. The metrics measured are VM utilization, SLA adherence (100 - violation rate), and cost efficiency (100 - cost). The PCRA framework outperforms all three models on each metric, achieving the highest VM utilization (91.3%) and the lowest SLA violation rate (4.2%). These results demonstrate the system's ability to optimize resource use and maintain quality of service.
Fig. 5 [Images not available. See PDF.]
Workload prediction accuracy.
Figure 5 compares the predicted workload values from PCRA's forecasting model with actual workload patterns over time. The close alignment between the two curves reflects the accuracy of the prediction module (SVM, RT, or KNN) that feeds into the Q-value estimation. This validates the system's ability to anticipate resource demand in advance, which is important for proactive scheduling.
Fig. 6 [Images not available. See PDF.]
Adaptive response time under load.
Figure 6 depicts the dynamic response behavior of the system under increasing workload. The average response time initially rises with demand but then stabilizes, demonstrating how the PCRA framework adapts to load changes by provisioning resources accordingly. This reflects the framework's effectiveness in maintaining system responsiveness even during workload spikes.
The use of predictive models, combined with a Q-learning loop and offline optimization, enables PCRA to adapt to live workload changes. This end-to-end integration is absent in most existing models, which lack runtime decision adaptation or rely only on static ML predictions. Most existing models focus on static or semi-dynamic scenarios; PCRA, however, is designed for fully dynamic cloud environments, with the ability to learn and improve during runtime. The integration of predictive learning with reinforcement-based decision-making allows PCRA to make context-aware, low-latency choices that are both resource-efficient and QoS-compliant. Furthermore, FSWOA is only used offline, avoiding any delay during real-time execution and thus combining optimization power with practical speed.
Statistical significance and ablation study
To validate the significance of the proposed PCRA framework's performance, this research applied the Friedman test across key metrics, including SLA violations, utilization, and waiting time, comparing PCRA against three baselines. The obtained results (p < 0.05) confirm statistically significant improvements.
Additionally, an ablation study was conducted by removing three components in turn. Removing FSWOA results in a 7% drop in accuracy; removing Q-learning leads to unstable policy behavior; and removing the predictive layer increases SLA violations. These observations support the necessity of each component in the PCRA framework.
Conclusion and future work
To investigate adaptive and effective RA for cloud software applications with good QoS and low resource costs, a PCRA framework is proposed in this study. First, a Q-v prediction model is created using the Q-learning algorithm and several ML-based learners to forecast the values of management operations. An FSWOA-based decision-making optimization algorithm for locating objective resource allocation plans follows. While FSWOA has higher complexity, it is used during the model training phase and not within the continuous runtime execution, ensuring that its time complexity does not affect real-time deployment. Simulation results show that the proposed PCRA framework achieves the best performance (in terms of QoS and resource costs) when assigning resources based on the RUBiS benchmark. More precisely, the PCRA framework has a 94.7% accuracy rate in managing RA. The PCRA framework performs well not only in prediction accuracy but also in achieving improved placement, QoS metrics, and overall system efficiency in cloud applications. Furthermore, in a runtime environment with a variety of workloads and service requests, the PCRA framework beats the traditional ML-based and rule-based methods by 58% and 1014%, respectively. The PCRA framework dynamically allocates resources based on real-time workload conditions, and workloads in cloud systems can be estimated for more effective RA. Future work will therefore examine an adaptive RA technique that takes into account both the present workload and changing workloads.
Author contributions
S. Kayalvili: conceptualization, data collection, methodology, data analysis, manuscript writing. R. Senthilkumar, Yasotha S, and R.S. Kamalakannan: manuscript editing, supervision, visualization. All authors reviewed the manuscript.
Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Declarations
Competing interests
The authors declare no competing interests.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
1. Parfenov, D., Bolodurina, I., Kuznetsova, L., Zabrodina, L. & Yanishevskaya, N. Application of bioinspired methods for solving the problem of resource allocation in cloud platforms. In 2021 International Conference on Information Technology and Nanotechnology (ITNT) 1–7 (IEEE, 2021).
2. Senthilkumar, R. & Geetha, B. G. Signature verification and bloom hashing technique for efficient cloud data storage. Wireless Pers. Commun. 103, 3079–3097 (2018). https://doi.org/10.1007/s11277-018-5995-8
3. Xu, Y. & Mohammed, A. H. An energy-aware resource management method in cloud-based internet of things using a multi-objective algorithm and crowding distance. Trans. Emerg. Telecommun. Technol. 34 (2023).
4. Karpagam, M. Hybrid RSO algorithm with SFLA for scientific workflow scheduling in cloud using clustering techniques (2021).
5. Senthilkumar, R. & Geetha, B. G. Asymmetric key Blum-Goldwasser cryptography for cloud services communication security. J. Internet Technol. 21 (2020).
6. Senthilkumar, R., Gokulraj, D., Kamalakannan, R. S. & Narayanan, K. Pearson hashing B-tree with self-adaptive random key elgamal cryptography for secured data storage and communication in cloud. Webology 18 (2021).
7. Saidi, K. & Bardou, D. Task scheduling and VM placement to resource allocation in cloud computing: challenges and opportunities. Cluster Comput. 1–19 (2023). https://doi.org/10.1007/s10586-023-04088-2
8. Cui, T., Yang, R., Fang, C. & Yu, S. Deep reinforcement learning-based resource allocation for content distribution in IoT-edge-cloud computing environments. Symmetry 15 (2023).
9. Gupta, P. et al. Hybrid whale optimization algorithm for resource optimization in cloud e-healthcare applications. Comput. Mater. Continua 71 (2022).
10. Wu, P., Yang, Q., Chen, W., Mao, B. & Yu, H. An improved genetic-shuffled frog-leaping algorithm for permutation flowshop scheduling. Complexity 2020, 1–15 (2020). https://doi.org/10.1155/2020/3070931
11. Kayalvili, S. & Selvam, M. Hybrid SFLA-GA algorithm for an optimal resource allocation in cloud. Cluster Comput. 22 (2019).
12. Riahi, A. & Faris, H. Reinforcement learning-based resource allocation in dynamic cloud environments: a multi-agent perspective. J. Cloud Comput. 12 (2023).
13. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, 2018).
14. Asghari, P., Rahmani, A. M., Haj, S. & Javadi, H. Privacy-aware cloud service composition based on QoS optimization in internet of things. J. Ambient Intell. Humaniz. Comput. 1–26 (2020). https://doi.org/10.1007/s12652-020-01775-5
15. Amini Motlagh, A., Movaghar, A. & Rahmani, A. M. A new reliability-based task scheduling algorithm in cloud computing. Int. J. Commun. Syst. 35 (2022).
16. Gola, K. K., Singh, B. M., Gupta, B., Chaurasia, N. & Arya, S. Multi-objective hybrid capuchin search with genetic algorithm based hierarchical resource allocation scheme with clustering model in cloud computing environment. Concurrency Comput. Pract. Exp. 35, e7606 (2023).
17. Durgadevi, P. & Srinivasan, S. Resource allocation in cloud computing using SFLA and cuckoo search hybridization. Int. J. Parallel Prog. 48, 549–565 (2020). https://doi.org/10.1007/s10766-018-0590-x
18. Pani, A. K., Dixit, B. & Patidar, K. Resource allocation using democratic grey wolf optimization in cloud computing environment. Int. J. Intell. Eng. Syst. 12 (2019).
19. Xu, Y., Sun, Z., Xue, X., Gu, W. & Peng, B. A hybrid algorithm based on MOSFLA and GA for multi-UAVs plant protection task assignment and sequencing optimization. Appl. Soft Comput. 96, 106623 (2020). https://doi.org/10.1016/j.asoc.2020.106623
20. Khaleel, M. I. Hybrid cloud-fog computing workflow application placement: joint consideration of reliability and time credibility. Multimedia Tools Appl. 82 (2023).
21. Jelodari, N. & Pourhaji Kazem, A. A. Black widow optimization (BWO) algorithm in cloud brokering systems for connected internet of things. J. Comput. Rob. 15 (2022).
22. Chen, X. et al. Resource allocation for cloud-based software services using prediction-enabled feedback control with reinforcement learning. IEEE Trans. Cloud Comput. 10 (2020).
23. Gharehchopogh, F. S. & Gholizadeh, H. A comprehensive survey: whale optimization algorithm and its applications. Swarm Evol. Comput. 48, 1–24 (2019). https://doi.org/10.1016/j.swevo.2019.03.004
24. Xing, W. & Bei, Y. Medical health big data classification based on KNN classification algorithm. IEEE Access 8, 28808–28819 (2019). https://doi.org/10.1109/ACCESS.2019.2955754
25. Sabharwal, N. Apache CloudStack Cloud Computing (Packt Publishing, 2013).
26. RUBiS: Rice University Bidding System Benchmark. http://rubis.ow2.org/ (2019).
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the "License").