Intelligent scheduling and resource allocation of user equipments (UEs) in wireless networks has been an ongoing topic of research. The innovation in this field focuses mostly on generalizing the system to include more components, as well as on deriving new ways to solve the problem. In this paper, we address an unexplored case of the scheduling-offloading problem for a wireless network with mobile edge computing (MEC). In this network, the UEs have mobility models and transmit using non-orthogonal multiple access (NOMA). They are also equipped with data buffers and batteries with energy harvesting (EH) capabilities. We propose a novel UE clustering approach to account for the growing NOMA inter-user interference, which can lead to performance issues, especially in the downlink decoding phase. In addition, clustering helps reduce the problem complexity by distributing it among clusters that operate independently. We investigate deep reinforcement learning (DRL) to devise efficient policies that minimize the packet loss due to delay infringements. Moreover, we use federated learning (FL) to learn a unified policy accounting for the dynamic nature of the clusters. Our simulation results, based on a DRL method, namely proximal policy optimization (PPO), and on standard baseline methods, show the effectiveness of learning-based algorithms in minimizing both the packet loss and the energy consumption.
Introduction
Modern wireless networks face the growing challenge of optimizing limited resources, such as bandwidth, energy, and channel access, across an increasingly large and diverse set of mobile user equipments (UEs) or Internet of Things (IoT) devices. As the number of UEs and devices continues to rise, each with distinct service needs and mobility patterns, networks must balance the dual objectives of maintaining fairness and minimizing interference between users. This complexity is further amplified by emerging technologies, such as mobile edge computing (MEC), energy harvesting (EH), and non-orthogonal multiple access (NOMA), which introduce new dimensions to resource handling by adding computational offloading, sustainable energy use, and more efficient spectrum utilization.
MEC, for instance, allows UEs with limited processing power and small battery capacities to offload computationally demanding or time-sensitive tasks to servers located near the Base Station (BS). This reduces local processing delays and improves overall network efficiency, enabling UEs to handle more complex tasks without exhausting their resources. On the energy front, EH technologies offer a sustainable solution by harvesting ambient energy to recharge UEs’ batteries. This not only extends the operational time of UEs and IoT devices but also reduces dependence on non-renewable energy, minimizing the environmental impact of Information and Communication Technologies (ICT).
In terms of spectrum utilization, NOMA [1] emerges as a promising technique for enhancing multi-user communications by allowing simultaneous access for multiple UEs in both time and bandwidth. Unlike traditional Orthogonal Multiple Access (OMA) methods, NOMA increases spectral efficiency by overlapping users on the same frequency resources, while using Successive Interference Cancellation (SIC) decoding to mitigate the resulting inter-user interference. Together, the aforementioned technologies present both opportunities and challenges in designing scalable algorithms that can handle the dynamic conditions of next-generation wireless networks.
Leveraging such technological advancements, this paper tackles a joint scheduling and offloading problem in a NOMA system with multiple UEs served by a base station. The BS is equipped with a nearby MEC server, and the UEs have EH capabilities. We consider that the UEs have tasks to be executed under strict delay constraints, either locally or remotely via the MEC server. Every UE is fitted with a limited-size battery, which can be recharged via energy harvesting. The objective of this work is to devise scheduling-offloading policies that efficiently execute the tasks of the UEs by minimizing the packet loss due to the strict delay and buffer overflow, given the information available at each UE on the buffer, channel, and battery states.
In order to resolve this problem, we aim to use Reinforcement Learning (RL) techniques that can find relevant policies by optimizing the UEs’ decisions at any given circumstance. In fact, RL can rely on a Markov Decision Process (MDP) modeling of the environment to learn its dynamics and produce optimal policies. However, when the environment becomes bigger and modeling it becomes more difficult, Deep Reinforcement Learning (DRL) with the aid of Neural Networks (NNs) can be used to overcome such dimensionality and scalability issues, and learn to make the best decisions.
In our previous works [2, 3], the DRL methods, especially the Proximal Policy Optimization (PPO) method [4], showed great performance in the studied scenarios with two UEs communicating in NOMA and considering centralized decisions. However, when generalizing to a larger number of connected UEs in the system, two issues can arise. The first issue is encountered when using DRL algorithms: as the system grows larger, the performance of these algorithms heavily depends on the environment exploration time. The second issue is related to NOMA performance due to the increasing interference between users. Indeed, the performance can be degraded, especially in the downlink phase with SIC decoding at the UEs, hence causing additional delay issues.
One way to overcome these issues is to group the UEs into different clusters, where each cluster has a smaller number of UEs and acts as its own decision-maker. On one hand, clustering the UEs effectively reduces the complexity of the optimization problem by distributing the learning among multiple agents (i.e., multi-agent RL (MARL)), thus improving the performance of DRL algorithms. On the other hand, each cluster can communicate on a dedicated bandwidth in the downlink, which helps reduce the inter-user interference. Therefore, we propose a clustering process to optimize NOMA communications by grouping the dynamic UEs with different SNR levels. This clustering approach aims to preserve location proximity between the UEs while considering their mobility around the BS.
To achieve a unified scheduling-offloading policy across different clusters, we adopt federated learning (FL) [5] for the distributed training of the DRL model. This leads to the development of a Federated Reinforcement Learning (FedRL) framework, where each cluster, represented by its local decision-making node, shares its learned model with a central global node at the MEC server. The global node aggregates these locally trained models by updating its own model with the shared weights and then broadcasts the updated model back to the clusters. This process continues iteratively until the model converges. We apply the proposed framework using the PPO algorithm, yielding a federated version of PPO.
The main contributions of our work are summarized in what follows:
We propose a joint scheduling and offloading framework in a multi-cluster network architecture, integrating multiple UEs served by a base station with a MEC server. The devised policy is able to minimize the packet loss while taking into account energy harvesting capabilities at the UEs, dynamic channel conditions, multi-user communications with NOMA, and strict delay constraints.
To enhance the NOMA performance and manage the overall system complexity, we introduce a priority-based clustering method that optimizes UE proximity within a polar coordinate basis. This approach accounts for the UEs’ mobility and their different application requirements.
We apply the proximal policy optimization method to develop efficient decision-making policies across clusters. Additionally, we integrate a federated learning approach to collaboratively learn a unified policy, thus reducing the deep reinforcement learning complexity and improving system performance.
Related works
Federated reinforcement learning
Regarding the existing works on resource allocation problems with FedRL, the work [6] presented a concurrent federated reinforcement learning scheme for resource allocation, where the main goal was to preserve the privacy of the edge hosts and the server. The distributed agents decide in a concurrent way while sharing their outputs and rewards, and not their models. The authors in [7] proposed an intelligent ultra-dense edge computing framework for AI and Blockchain applications. It jointly optimized resource allocation, application partitioning and server caching. The edge user can offload its tasks either to an edge server, a mobile device in proximity, or to a cloud server. To train the model, FL is used with DRL, where each agent’s model, located at the edge device level, is trained locally, and the weights are sent to the macro base station, where they are averaged and a new global model is broadcast, in a similar way to our scenario. The paper [8] tackled the problem of vehicle-to-vehicle communications. It investigated the reuse of cellular channels that were already allocated to perform the communication, while not interrupting the existing cellular operations and avoiding collisions with other vehicle-to-vehicle links. Federated multi-agent DRL was used to obtain policies that determine the transmission powers and the cellular channel. The double DQN with dual architecture was considered, and the FL scheme consisted of sharing the local weights with the central node, which then broadcasts the globally averaged model to the agents. The work in [9] addressed challenges in ensuring high-quality healthcare in the 6G era through the integration of wearable medical devices into the Internet of Medical Things (IoMT). Leveraging wireless body area network and MEC technologies, the study focused on optimizing the quality of service with ultra-reliable data transfer and processing at low latency and energy usage. The proposed FedRL task offloading approach relied on the sharing of local weights with the central node to perform the averaging operation. Consul et al. [10] proposed a secure task offloading and resource allocation approach for digital twin-empowered UAV-assisted MEC systems using federated reinforcement learning. To enhance IoT device cooperation, digital twin models were employed to characterize system dynamics and improve decision-making. The system was modeled as a Markov decision process by jointly considering task completion and smart contract processing. Their FRL framework introduced a feature-based system decomposition to address convergence issues in symmetric network environments. Chen et al. [11] addressed resource conflicts in UAV-enabled IoT networks by proposing a conflict hypergraph-based federated reinforcement learning framework, where resource allocation is modeled as a node coloring problem to reduce complexity. They also developed a TP-DDPG algorithm for hierarchical FL with energy harvesting clients, jointly optimizing energy management, resource allocation, and client scheduling. Another study in [12] addressed the challenge of resource conflicts in UAV-enabled IoT networks with dense deployments by proposing a conflict hypergraph-based federated reinforcement learning framework. They first modeled the conflict relationships among IoT devices using graph theory and transformed the resulting conflict graph into a conflict hypergraph to reduce computational complexity.
The resource allocation problem was then reformulated as a node coloring task on this hypergraph. Finally, they developed a collaborative FRL framework combining a global network with multiple D3QN-based local networks.
Notably, the mentioned papers adopt a multi-agent RL setup. However, our contribution differs from them in the dynamic aspect of the nodes. In our case, the clusters can have varying numbers of UEs; thus, the local model has to accommodate that change by having a generic representation of the state.
NOMA clustering
Considering the NOMA clustering problem, the work in [13] proposed the optimal cluster as a result of a joint optimization problem for global throughput maximization, and operation constraints for SIC decoding under power and rate constraints. The derived solution was compared with the channel gain-based method, and downlink NOMA performance was assessed with regard to the number of users per cluster. Jingjing et al. [14] used the K-means clustering method to cluster NOMA users in millimeter-wave systems. The clustering was done based on the spatial locations of the users and could accommodate dynamic changes in the number of clustered users. The authors in [15] proposed a system setup that distinguishes between massive machine type communications (mMTC) and ultra-reliable low latency communications (URLLC) devices. A NOMA clustering method was developed where mMTC and URLLC devices were clustered together to avoid grouping multiple mMTC devices together. The clustering process also considered intra-cluster interference, transmission power and quality-of-service requirements. The work in [16] designed an adaptive strategy for clustering enhanced mobile broadband and IoT users, aiming to maximize the use of NOMA and to optimize the power usage. In the proposed scenario, NOMA is employed selectively by comparing its performance to OMA. The introduced clustering and power optimization algorithm proved more efficient than non-adaptive multiple access techniques and other clustering solutions. The survey in [17] provides a comprehensive review of intelligent user clustering techniques for NOMA. It covers both machine learning and non-machine learning approaches, including K-means, spectral clustering, reinforcement learning, and game theory-based methods. These clustering techniques are analyzed for their effectiveness in scenarios like IoT, UAVs, and RIS-enabled networks, emphasizing the trade-off between computational complexity and network performance. In [18], a clustered federated learning (CFL) framework empowered by NOMA is proposed to handle non-IID data distributions in wireless networks. Spectral clustering is employed to group users based on Dirichlet-distributed data profiles. The framework jointly optimizes sub-channel allocation and power control using a matching-based algorithm and KKT conditions, demonstrating improvements in test accuracy, convergence, and energy efficiency over baseline approaches.
Analyzing the listed clustering methods, we notice that no existing solution combines spatial and channel gain-based clustering. Therefore, a clustering method that considers both can maximize the performance of NOMA in the network while preserving the spatial proximity of the users.
In addition to clustering techniques, recent works have explored the enhancement of NOMA performance through integration with emerging technologies. For instance, the study in [19] investigated multi-user downlink NOMA systems aided by ambient backscatter devices. The authors derived the achievable rate regions and proposed an energy-efficient resource allocation framework, demonstrating that ambient backscatter-aided NOMA significantly outperforms conventional NOMA and OMA, especially under imperfect channel state information. Furthermore, in the context of task offloading, Liu and Davidson [20] proposed a comprehensive resource allocation framework for binary computation offloading in K-user NOMA systems. By leveraging multiple-time-slot signaling and optimizing transmission parameters under various multiple access schemes, their approach efficiently reduces energy consumption while respecting latency constraints. In another direction, Taneja et al. [21] integrated intelligent reflecting surfaces (IRS) with NOMA for mobile edge computing in 6G IIoT networks. Their resource control algorithm jointly optimized user association, IRS phase shifts, and clustering, resulting in significant improvements in system outage probability and achievable rates compared to conventional NOMA systems.
System model
Table 1. List of abbreviations
| Abbreviation | Description |
|---|---|
| UE | User equipment |
| MEC | Mobile edge computing |
| NOMA | Non-orthogonal multiple access |
| EH | Energy harvesting |
| DRL | Deep reinforcement learning |
| FL | Federated learning |
| PPO | Proximal policy optimization |
| IoT | Internet of things |
| BS | Base station |
| ICT | Information and communication technologies |
| OMA | Orthogonal multiple access |
| SIC | Successive interference cancellation |
| MDP | Markov decision process |
| NN | Neural networks |
| MARL | Multi-agent reinforcement learning |
| FedRL | Federated reinforcement learning |
| CH | Cluster head |
| FDMA | Frequency division multiple access |
| SNR | Signal-to-noise ratio |
Table 2. List of variables
| Variable | Description |
|---|---|
|  | Total number of UEs in the system |
|  | UE in the system |
|  | Number of clusters |
|  | cluster |
|  | Highest allowed number of clusters |
|  | Lowest allowed number of clusters |
|  | Number of UEs in cluster |
|  | UE in |
|  | Highest allowed number of UEs in a cluster |
|  | Lowest allowed number of UEs in a cluster |
| e | Transmission episode index |
|  | Number of timesteps in an episode |
| t | Timestep index |
|  | Timestep duration (in milliseconds) |
|  | The average of the statistical channel gain of all the UEs in cluster |
|  | Complex channel amplitude for |
|  | Channel gain for |
|  | Total range of channel gain values |
|  | Total range of quantized channel gain values |
|  | The subinterval of the channel gain range |
|  | The quantized subinterval of the channel gain range |
|  | The number of discrete values in the subinterval |
|  | Number of quantized subintervals |
|  | Lower bound for |
|  | Upper bound for |
|  | Quantized channel gain for (lower bound) |
|  | Quantized channel gain for (upper bound) |
|  | Uplink bandwidth |
|  | Downlink bandwidth |
|  | Downlink bandwidth allocated for cluster |
|  | Offloading flag for |
|  | Uplink rate for |
|  | Uplink channel capacity for |
|  | Downlink rate for |
|  | Intra-cluster interference for (uplink) |
|  | Intra-cluster interference for (downlink) |
|  | Inter-cluster interference expression for (uplink) |
|  | Inter-cluster interference estimation for (uplink) |
|  | Offloading power for |
|  | Maximum offloading power |
|  | AWGN spectral density |
|  | Total power of BS broadcast |
|  | Power allocation coefficient for with NOMA in downlink |
|  | Data buffer vector |
|  | Data buffer vector size |
|  | Number of packets that arrive in the data buffer at timestep t for |
|  | Number of packets in the data buffer at timestep t for |
|  | Maximum strict delay for the packets |
|  | Number of packets dropped due to delay violation |
|  | Number of packets dropped due to buffer overflow |
|  | Number of packets lost due to a transmission error |
Table 3. List of variables (continued)
| Variable | Description |
|---|---|
|  | Battery capacity |
|  | Energy unit size (Joules) |
|  | Number of energy units harvested and stored in the battery at timestep t for |
|  | Total number of energy units in the battery at timestep t for |
|  | Number of packets to execute at timestep t for |
|  | Maximum number of packets to execute locally |
|  | Maximum number of packets to offload |
|  | Size of the packet in uplink (bytes) |
|  | Size of the packet in downlink (bytes) |
|  | Energy consumed by (either in idle, local, or offload) |
|  | Waiting time |
|  | Cluster head of |
|  | State space representation for cluster |
|  | Data buffers of the UEs in |
|  | Quantized channel gains of the UEs in |
|  | Battery sizes of the UEs in |
|  | Average channel gains of the UEs in all the clusters other than |
|  | Reward function for cluster following policy |
|  | Optimal policy for cluster |
|  | FedRL agent weights for cluster |
|  | Global policy for cluster |
|  | Global agent weights |
|  | Angle range |
|  | Quantized angle range |
|  | Discrete angle for for episode e |
|  | Actor network weights |
|  | Critic network weights |
|  | Policy estimation for state s |
|  | Value function estimation for state s |
|  | PPO objective function |
|  | Augmented PPO objective function |
|  | State representation for PPO |
[See PDF for image]
Fig. 1
Multi-cluster system model with one cluster head CH for each cluster
We consider a system model as illustrated in Fig. 1, consisting of a BS, with a nearby MEC server, available to active UEs with EH capabilities, limited-size batteries and data buffers. We separate the UEs into distinct clusters to facilitate the communication and reduce the complexity in the decision-making. We set the maximum number of clusters to and the number of UEs per cluster to . Before the start of each episode e of transmission with timesteps, we perform a clustering process and the UEs are distributed among the clusters based on their location, i.e., their channel gains and angles w.r.t. the BS. Figure 2 highlights the general system evolution through time. Abbreviations are listed in Table 1, while the most relevant and used variables are listed in Tables 2 and 3.
[See PDF for image]
Fig. 2
A general view on the system evolution
We assume that each cluster has a cluster head (CH), randomly selected among the UEs in the cluster. At the start of each timestep t of size , the buffer, channel, and battery information of all the UEs in the cluster is shared with the CH. This information enables it to decide on the action to be taken by each UE. The decision is then broadcast to these UEs for free.
Furthermore, at the start of the episode, the average statistical channel gain of all the UEs within a cluster (denoted as ) is transmitted to the other clusters via the BS. After that, no information is shared between the clusters for the remainder of the episode.
NOMA communications are enabled in both the uplink and the downlink, with some key differences. In the uplink phase, we assume that all the UEs from all clusters can simultaneously send their data packets to the BS using the whole available bandwidth, and the SIC operation is performed globally on all UEs. Obviously, inter-cluster and intra-cluster interferences are present and have to be taken into consideration. At the CH level, the inter-cluster interference cannot be computed precisely since the only available information about the other clusters’ UEs is their average channel gain. Therefore, we perform an estimation to account for this interference, and underestimating it leads to a transmission error (i.e., the transmission rate will be higher than the Shannon limit). In contrast, downlink operations are done in NOMA for each cluster separately. This means that we apply FDMA on a cluster basis to eliminate the inter-cluster interference and enhance the performance of SIC decoding. The differences between the uplink and downlink scenarios stem from the computational advantage of the BS compared to the CH.
Finally, each UE can process the data arriving in its buffer either locally, or remotely by offloading the packets to the MEC server. The decision takes into account the packet delays, channel gains, and battery levels. In what follows, we detail the channel model, transmission model, data buffer model, and battery model. After that, we describe the scheduling decisions, as well as the corresponding energy equations and time constraints.
Channel model
We assume that the channel between the UEs and the BS follows a block flat-fading model with AWGN ( being the noise spectral density). For a cluster , each UE has a complex channel amplitude and a gain . We assume that the Channel State Information at the Transmitter (CSIT) is available at the CH level and at the BS level. Finally, the channel response is constant for the duration of a timestep t.
We further divide the channel gain range into multiple subintervals that refer to categorical channel gain conditions (e.g., bad SNR, mid SNR, good SNR). Each subinterval is bounded by a maximum and a minimum value, and each UE’s channel gain will vary within this subinterval for the duration of an episode before transitioning to a different channel gain subinterval in the next episode (Fig. 2).
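Before the formal definitions below, the following minimal sketch illustrates how such a quantized, episode-correlated channel abstraction could be simulated. All symbols, subinterval bounds, the quantization grid, and the transition rule (stay with some probability, otherwise jump to another subinterval) are assumptions made for the sketch, not the paper's exact notation or formulas.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed channel gain lower bounds of the subintervals (bad/mid/good SNR).
SUB_LOWER = np.array([1e-3, 1e-2, 1e-1])
SUB_WIDTH = 5e-2    # hypothetical subinterval width
N_QUANT = 4         # discrete gain levels per subinterval
RHO = 0.7           # assumed channel range correlation factor

def next_subinterval(current: int) -> int:
    """Correlated transition at the start of an episode: stay in the same
    subinterval with probability RHO, otherwise jump uniformly to another one."""
    if rng.random() < RHO:
        return current
    others = [i for i in range(len(SUB_LOWER)) if i != current]
    return int(rng.choice(others))

def quantized_gain(sub: int) -> float:
    """Draw a (crudely truncated) Rayleigh gain inside the subinterval, then
    quantize it to the lower bound of the discrete level it falls in
    (the worst-case value, as used for transmission)."""
    lo = SUB_LOWER[sub]
    g = lo + min(rng.rayleigh(scale=SUB_WIDTH / 3), SUB_WIDTH)
    levels = lo + np.arange(N_QUANT) * SUB_WIDTH / N_QUANT
    return float(levels[np.searchsorted(levels, g, side="right") - 1])

sub = 1
for episode in range(3):
    sub = next_subinterval(sub)
    gains = [quantized_gain(sub) for _ in range(5)]  # per-timestep quantized gains
    print(episode, sub, gains)
```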
Formally, we define as the total range of channel gain values, which can theoretically take all the values in , and as the subinterval of the channel gain, with and being its lower and upper bounds, respectively. Channel gain variations in time within the subinterval are modeled as an i.i.d. random process following a truncated Rayleigh distribution with variance , shifted by :
1
with the normalization factor ensuring a unit integral. We quantize the values of the channel gain to make the space finite. For a specific , a quantization function is defined that projects the UE’s channel gain into a discrete space with finite values . The produced quantized channel gain is the lower bound of the interval in which is contained, i.e., where is the upper bound of the considered interval. We use the lower bound value to allow the transmission in a worst-case scenario even if the actual value is different (contrary to the upper bound value). The resulting quantized global range of channel gain values is and the quantized subinterval is denoted as , with indicating the number of discrete values in the subinterval, and indicating the number of subintervals. We also bound the range between two values denoted as and . Furthermore, we model the transition from one subinterval to another at each episode as a correlated process to simulate a large location change. Therefore, the probability of transitioning from quantized subinterval to when starting a new episode for is modeled as follows:
2
where is the channel range correlation factor.
Transmission model
For NOMA transmission in the uplink, we assume that all UEs can transmit at the same time using the entire available bandwidth . The BS can decode all signals using SIC since it has full knowledge of the channel conditions of all UEs. Therefore, intra-cluster as well as inter-cluster interference terms are present in the rate expressions in this case. However, in the downlink case, we use FDMA between clusters, where a bandwidth is allocated for a cluster . NOMA is then performed on a cluster level, eliminating the inter-cluster interference.
Uplink transmission
We assume that , which belongs to cluster with , is transmitting to the BS, i.e., offloading packets ( indicates that is not offloading). We further suppose, for simplicity of computations, that the UEs in the cluster are ordered from best channel gain to worst , i.e., if . The received signal is expressed as:
3
where is the offloaded data signal transmitted from to the BS, and depending on whether or not offloads. is the AWGN noise. Upon receiving , the BS proceeds to decode the individual signals using the SIC scheme. The process begins by decoding the strongest signal, i.e., the one with the best channel,1 while considering the other signals as interference. After that, the decoded signal is subtracted from the received signal and the process starts again. From that, the uplink rate for is:4
with5
6
being the intra-cluster interference and the inter-cluster interference power estimation, respectively. is the transmission (offloading) power of , which is limited to a maximum value . is the average of the statistical channel gain for all the UEs in cluster . Intuitively, the intra-cluster interference term considers all the signals from the offloading UEs with a lower channel gain than . This term is accurately computed under the assumption that the channel gains and decisions are known within the cluster .
The inter-cluster interference, on the other hand, is an estimation of its power based on the information available at the cluster. It considers the maximum offloading power with the average of the statistical channel gain of all clusters with , if channel gain is higher than . In fact, we assume that the SIC decoding order in the uplink starts with the strongest signal and proceeds to the weakest, and that all UEs offload ().
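To make this computation concrete, the sketch below evaluates the uplink rate of one offloading UE as seen by its CH: the intra-cluster interference is computed exactly from the lower-gain offloading UEs of the same cluster, while the inter-cluster term is estimated from the other clusters' average gains and the maximum offloading power. All function and variable names, as well as the numerical values, are illustrative assumptions and not the paper's notation.

```python
import numpy as np

def uplink_rate(k, gains, powers, offloads, other_avg_gains, W, N0, p_max):
    """Uplink NOMA rate estimate for UE k at its cluster head.

    gains/powers/offloads describe the UEs of the cluster, ordered from best
    to worst channel gain (assumed SIC decoding order); other_avg_gains holds
    the average statistical channel gain of every other cluster.
    """
    # Exact intra-cluster interference: offloading UEs decoded after UE k
    # (i.e., those with lower channel gains).
    intra = sum(p * g for g, p, o in
                zip(gains[k + 1:], powers[k + 1:], offloads[k + 1:]) if o)
    # Estimated inter-cluster interference: assume every UE in other clusters
    # offloads at maximum power, and count only clusters whose average gain is
    # below UE k's gain (their signals are decoded after UE k under SIC).
    inter_est = sum(p_max * g_avg for g_avg in other_avg_gains if g_avg < gains[k])
    snr = powers[k] * gains[k] / (N0 * W + intra + inter_est)
    return W * np.log2(1.0 + snr)

# Illustrative usage for a 3-UE cluster and two other clusters.
rate = uplink_rate(k=0,
                   gains=[0.08, 0.05, 0.01], powers=[0.1, 0.1, 0.0],
                   offloads=[1, 1, 0], other_avg_gains=[0.03, 0.02],
                   W=1e6, N0=1e-17, p_max=0.2)
print(f"estimated uplink rate: {rate / 1e6:.2f} Mbit/s")
```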
The true capacity of the channel can only be known at the BS by correctly computing the inter-cluster interference. Given the offloading powers of each UE, it is computed as follows:
7
with8
Therefore, if the uplink rate used by to transmit its data is higher than the channel capacity , a transmission error would occur. More specifically, if the estimated interference power is less than the exact one, i.e., , a rate mismatch would terminate the communication.
Downlink transmission
Similar to the uplink case, we assume that the ordering of the offloading UEs in cluster is based on their channel gains. Given that the clusters are separated from each other in bandwidth (FDMA), the received signal at broadcast from the BS is:
9
where is the signal response of the BS to , if it has already offloaded its data, i.e., . is the power of the BS broadcast, and is the allocation coefficient of the broadcast power used to perform power control. In fact, to ensure efficient decoding in the downlink, the BS allocates more power to UEs with weaker channel gains [22]. The allocation coefficient for is:10
SIC decoding then starts with the signals that have more allocated power (i.e., worse channel gains), subtracts them from the received signal, and repeats the operation for the next signal. Therefore, the downlink rate expression for is as follows:11
where is the downlink intra-cluster interference coming from the other UEs that offload and have stronger channel gains than :12
and13
is the cluster allocated bandwidth with FDMA. Note that with more UEs in the cluster, more bandwidth is allocated to it.
Data buffer model
We equip each with a limited-size buffer storing the incoming tasks, which have to be executed within a strict delay. We model the buffer structure with a vector of size . We assume that at each timestep t, a number of packets arrive in the buffer and are set to an age of 0, whereas empty slots are set to . We denote by the number of packets present in the buffer. These packets keep aging with each timestep until reaching a maximum delay , at which point they are dropped due to delay violation (). Moreover, when a new batch of data packets arrives that exceeds the number of empty buffer slots, i.e., , the packets are dropped due to buffer overflow (). Furthermore, we assume that the packets lost due to a transmission error () are retained in the buffer for the next timestep to be reprocessed.2
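A minimal sketch of this buffer bookkeeping (packet ages, delay-violation drops, and overflow drops) is given below. The slot marker, buffer size, and delay bound are assumptions for the sketch; retransmission of packets lost to transmission errors is not modeled here.

```python
import numpy as np

EMPTY = -1          # marker for an empty buffer slot (assumption)
BUFFER_SIZE = 10    # assumed buffer length
MAX_DELAY = 5       # assumed strict delay in timesteps

def buffer_step(buf: np.ndarray, n_arrivals: int, n_executed: int):
    """Age the packets, remove executed ones, insert new arrivals, and return
    the per-timestep delay-violation and overflow drops."""
    occupied = buf != EMPTY
    buf[occupied] += 1                       # packets age by one timestep
    expired = buf >= MAX_DELAY               # packets that reached the max delay
    dropped_delay = int(expired.sum())
    buf[expired] = EMPTY
    order = np.argsort(-buf)                 # oldest packets first
    for idx in order[:n_executed]:           # remove the executed packets
        if buf[idx] != EMPTY:
            buf[idx] = EMPTY
    free = np.flatnonzero(buf == EMPTY)      # insert new arrivals with age 0
    accepted = min(n_arrivals, free.size)
    buf[free[:accepted]] = 0
    dropped_overflow = n_arrivals - accepted # excess arrivals are lost
    return dropped_delay, dropped_overflow

buf = np.full(BUFFER_SIZE, EMPTY)
print(buffer_step(buf, n_arrivals=4, n_executed=0))   # (0, 0)
print(buffer_step(buf, n_arrivals=12, n_executed=2))  # some overflow drops
```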
In general, several realistic applications could be running on the UEs. Thus, to introduce some heterogeneity in the UEs’ data, we model different data arrivals for each . In particular, we use the data arrival models associated with video streaming, gaming, and other applications, which are described using statistical formulations in [23]. We detail the distributions used in the sequel.
Poisson Distribution Suited for IoT applications, Poisson random distribution is widely used in the literature. The arrival of packets with mean follows:
14
Uniform Distribution According to the survey in [23], the uniform distribution models the packet arrival for gaming applications, as adopted in 3GPP and IEEE traffic models. The arrival of packets is modeled with equal probability for every value between and :15
Lognormal Distribution Truncated lognormal distribution is used for modeling FTP traffic and web-browsing applications in 3GPP and IEEE traffic standards. The arrival of packets is modeled with mean (we consider the variance = 1) as follows:16
The distribution is naturally suited for continuous values, which is not the case in our system as we consider the arrival of a discrete number of packets. Therefore, we round the results produced by the distribution to the nearest discrete value.
Pareto Distribution For video streaming applications, the 3GPP and IEEE standards model their traffic following the truncated Pareto distribution. The arrival of packets in this case is modeled with a mean , and distribution parameters and as:
17
The arrival of packets is bounded between 1 and c. Similar to the lognormal distribution, the Pareto distribution is defined for continuous values as well. Therefore, a rounding operation is necessary to ensure discrete values of packet arrivals.
In our system model, we randomly assign one of the distributions to each at the start of the transmission process, resulting in clusters with different data arrivals and thus different requirements.
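The following sketch samples per-timestep packet arrivals from the four traffic models above and randomly assigns one model to each UE. The parameter values, truncation bounds, and function names are illustrative assumptions, not the exact parameters used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def poisson_arrivals(lam=3):
    # Poisson arrivals, suited for IoT traffic.
    return int(rng.poisson(lam))

def uniform_arrivals(low=1, high=6):
    # Equal probability for every integer between low and high (gaming traffic).
    return int(rng.integers(low, high + 1))

def lognormal_arrivals(mu=1.0, sigma=1.0, cap=10):
    # Truncated lognormal (FTP / web browsing), rounded to the nearest integer.
    return int(min(round(rng.lognormal(mean=mu, sigma=sigma)), cap))

def pareto_arrivals(shape=1.2, scale=1.0, cap=10):
    # Truncated Pareto (video streaming), bounded between 1 and cap, then rounded.
    return int(min(round(scale * (1.0 + rng.pareto(shape))), cap))

samplers = [poisson_arrivals, uniform_arrivals, lognormal_arrivals, pareto_arrivals]
# Each UE is randomly assigned one traffic model at the start of transmission.
ue_models = rng.choice(len(samplers), size=4)
arrivals = [samplers[m]() for m in ue_models]
print(arrivals)
```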
Energy and battery model
The energy and battery model defines the structure of the battery and the process of energy harvesting. Each has a battery with a capacity of energy units. The battery can be recharged by harvesting energy from external ambient sources. Each energy unit corresponds to Joules. We assume that for each timestep t, an amount of energy units are harvested and stored in ’s battery. Their arrival is modeled following the Poisson distribution with mean :
18
The battery level, i.e., the number of energy units in it, at timestep t is denoted as .
Scheduling decisions
The cluster head CH is the decision center for its own cluster, and thus handles the scheduling decisions for the UEs in this cluster independently from other clusters, then broadcasts them for free. Formally, of cluster , upon receiving the necessary information from all with at the start of a timestep t, outputs the scheduling decision (idle, local, or offload), as well as the number of packets for all these as follows:
Idle (i): does not process any packet, thus .
Local (l): processes locally packets, which cannot exceed a maximum number .
Offload (o): offloads packets to be processed remotely at the MEC server located near the BS. A limit of is set on the number of offloaded packets .
Energy equations
We compute the energy associated with each scheduling decision to determine the number of energy units required to perform these decisions.
Idle (i): does not consume any energy,
19
Local (l): processes locally with power per processed packet. The corresponding energy is thus:
20
Offload (o): offloads packets with power that cannot exceed the maximum offloading power . We distinguish two cases:
Offloading with no transmission errors (no rate mismatch): sends its packets, awaits their execution, receives the broadcasted signal by the BS, and decodes the result. The corresponding consumed energy, highlighting each step, is given by:
21
where and are the sizes of packets in bytes in the uplink and downlink, respectively, is the waiting time, , , and are the waiting, reception, and decoding powers, respectively, and is the decoding efficiency factor. Note that the reception at requires receiving all offloading UEs’ signals. Thus, the reception accounts for the lowest downlink rate and the highest number of packets among the UEs’ signals in the term , which corresponds to the slowest reception time. In addition, decoding ’s signal requires the decoding of all offloading UEs’ signals with lower channel gains (due to the higher allocated power) in a sequential manner, following SIC decoding.
Offloading with transmission errors (rate mismatch): We assume that sends its packets but the timeout is exceeded due to not receiving the ARQ acknowledgement (in the form of a response to the packet). This indicates that a rate mismatch has occurred. In that case, the reception and decoding parts no longer exist, yielding the following expression of the energy consumed:
22
Time constraints
A successful offloading operation occurs only when the offloading, waiting, receiving, and decoding processes are performed within the fixed timestep duration . Using (21), we consider the times of the different steps and formulate the inequality as:
23
We can obtain the optimal offloading power by forcing equality in the above expression and by using the explicit form of in (4). Thus, we obtain:24
with25
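As a numerical illustration of this step, the sketch below computes the rate needed to offload a chosen number of packets within the available transmission time and inverts the Shannon expression to obtain the corresponding offloading power. It only covers the uplink transmission component; the paper's expression also folds in the waiting, reception, and decoding times. All names and values are assumptions for the sketch.

```python
def required_offload_power(n_packets, packet_bits, t_remaining, gain,
                           interference, W, N0, p_max):
    """Smallest transmission power that offloads n_packets within t_remaining.

    Inverts R = W * log2(1 + p * g / (N0 * W + I)) for p, where R is the rate
    needed to push n_packets of packet_bits bits in t_remaining seconds.
    Returns None if the required power exceeds the maximum offloading power.
    """
    rate_needed = n_packets * packet_bits / t_remaining          # bits/s
    p = (2.0 ** (rate_needed / W) - 1.0) * (N0 * W + interference) / gain
    return p if p <= p_max else None

# Illustrative numbers: 3 packets of 1500 bytes, 5 ms of transmission time.
p = required_offload_power(n_packets=3, packet_bits=1500 * 8, t_remaining=5e-3,
                           gain=0.05, interference=1e-12, W=1e6, N0=1e-17, p_max=0.2)
print(p)
```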
Problem formulation and resolution
Given the available scheduling choices, the objective of this work is to determine optimal scheduling-offloading policies that consider the data buffer status of the UEs, their battery levels, and their channel conditions. These policies guide the decision on whether to execute packet processing locally or offload it to the MEC server. To achieve this, we formulate the problem as a Markov Decision Process (MDP), where we define: a state space representing the system’s environment, an action space outlining feasible scheduling actions, and a reward function that provides feedback on the impact of each action taken.3 Next, we detail each component of the MDP as it applies to our system model.
MDP structure
State Space : Each cluster has a cluster head , a randomly selected UE in the cluster, that handles the decision-making for all the UEs. The available information at is then the buffer, battery, and channel gain states of the cluster UEs, as well as the average channel gain of the other clusters. Therefore, the state representation, denoted as , is specific for each cluster, and is defined as:
26
with , and being, respectively, the buffer vectors, quantized channel gains, and battery levels of the UEs in . is the set of averages for the statistical channel gains for other clusters with and . We formulate each component with the following equations:27
28
29
30
To compute the number of possibilities of the average cluster channel gains vector , we need to take into account the total number of UEs in the clusters, which can vary between and , the number of clusters , and the number of channel gain subintervals . Therefore, the maximum number of possible states of , denoted as , counts all the possible average cluster channel gains, and is bounded by:31
Subsequently, the cluster state space size is thus bounded by:32
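As a rough illustration of how such a cluster state could be assembled at the CH, the sketch below concatenates the buffers, quantized gains, and battery levels of the cluster's UEs with the average gains reported for the other clusters, and already applies the zero padding to the maximum sizes that is used later for the PPO input. The layout, dimensions, and padding values are assumptions for the sketch, not the paper's exact ordering.

```python
import numpy as np

BUF_SIZE = 4                 # assumed buffer length
N_MAX = 3                    # assumed maximum number of UEs per cluster
L_MAX = 4                    # assumed maximum number of clusters
G_MIN, G_MAX = 0.001, 0.1    # assumed lowest / highest quantized channel gains

def build_cluster_state(buffers, quant_gains, batteries, other_avg_gains):
    """Assemble the CH state and zero-pad it to the maximum cluster size so the
    same network input fits any cluster configuration."""
    n = len(buffers)
    # Pad missing UEs with an empty buffer, the lowest gain, and an empty battery.
    buffers = list(buffers) + [np.full(BUF_SIZE, -1)] * (N_MAX - n)
    quant_gains = list(quant_gains) + [G_MIN] * (N_MAX - n)
    batteries = list(batteries) + [0] * (N_MAX - n)
    # Pad missing clusters with the highest average gain so they are ignored
    # in the inter-cluster interference estimation.
    others = list(other_avg_gains) + [G_MAX] * (L_MAX - 1 - len(other_avg_gains))
    return np.concatenate([np.concatenate(buffers), quant_gains, batteries, others])

state = build_cluster_state(
    buffers=[np.array([0, 2, -1, -1]), np.array([1, -1, -1, -1])],  # a 2-UE cluster
    quant_gains=[0.05, 0.01], batteries=[7, 3], other_avg_gains=[0.03, 0.02],
)
print(state.shape)   # (21,)
```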
Action Space : Each cluster has a specific action space concerning only the UEs of said cluster. It is denoted as , and it represents the type of processing (idle, local, or offload) as well as the number of packets m to execute. From that, we define as the index that represents the set of actions for all UEs in cluster . can take values ranging from 0 to , where is the action space size bounded by:
33
where and are the maximum numbers of packets that can be executed locally or remotely, respectively. Each action index can be extracted using a base change, from decimal to a base that corresponds to the number of available actions per UE, (a decoding sketch is given after the list below):34
The specific action for each can be derived as follows:
corresponds to the idle action, with ,
corresponds to the local action with ,
corresponds to the offloading action with .
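The sketch below illustrates this base-change decoding of a flat action index into one (mode, packets) decision per UE. The per-UE action alphabet (idle, local with up to a maximum number of packets, offload with up to a maximum number of packets) follows the description above; the function name and example values are assumptions.

```python
def decode_action(index: int, n_ues: int, max_local: int, max_offload: int):
    """Convert a flat action index into one (mode, packets) pair per UE by
    repeated division in base (1 + max_local + max_offload)."""
    base = 1 + max_local + max_offload       # idle + local choices + offload choices
    actions = []
    for _ in range(n_ues):
        digit = index % base
        index //= base
        if digit == 0:
            actions.append(("idle", 0))
        elif digit <= max_local:
            actions.append(("local", digit))
        else:
            actions.append(("offload", digit - max_local))
    return actions

# Example: 3 UEs, at most 1 packet locally and 2 packets offloaded per UE (base 4).
print(decode_action(index=27, n_ues=3, max_local=1, max_offload=2))
```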
Reward Function : For each state-action pair in a given cluster, a reward is associated. We define the reward function following a policy for a cluster as the expected negative sum of the packet losses. The losses are due to delay violation and buffer overflow , in addition to the transmission error for all cluster UEs. The reward is expressed for an infinite horizon model with a discount factor as:
35
We aim to find the optimal policy that achieves the maximum expected reward for each cluster. However, with the clustering structure that we have implemented, the goal becomes optimizing the reward for the whole system:36
while for every in cluster , the limits on the maximum offloading power, energy consumed, and packets executed are set to:37
The resulting optimal policy achieving this is a global one, i.e., , for , that accounts for the different configurations of the clusters in terms of the number of UEs, their data models, and their channel gain subintervals. This is done using federated learning, as we will explain in the next section.
Proposed resolution
Finding the optimal policy for each cluster is hard to achieve analytically, hence we resort to iterative methods that attempt to converge to the optimal solution. In addition, as we have a complex and large state space in this problem, using model-based methods that require full knowledge of the MDP structure (including the transition function) becomes infeasible in practical implementations, as they can take an exponentially long time to reach the optimal policy. Therefore, we need to consider model-free RL methods that can learn the environment by trial and error.
Moreover, DRL methods that rely on NNs to produce their output can be a powerful tool to obtain good policies that achieve the desired performance. However, training the DRL agent on a large state space while considering all UEs at once makes the problem very difficult. This has motivated us to introduce clustering of the UEs to eventually simplify the problem resolution.
In our multi-cluster setup, we thus require a multi-agent RL model, where each agent runs on the CH of a cluster. A cooperation process is enabled between clusters using federated learning (FL), which is a learning framework for distributed models. Each model runs locally at the CH level using heterogeneous data different from other models (providing data privacy). A global model is then aggregated from all the local models at the central node located at the MEC server. The dynamic nature of the clusters will lead to federated reinforcement learning (FedRL) agents that learn the flexibility in the system.
In the sequel, we describe the multi-agent training procedure, the clustering model, and the investigated DRL algorithm, namely the proximal policy optimization (PPO).
Multi-agent training procedure
Learning policies in a multi-agent FedRL follows several steps, as illustrated in Figs. 3 and 4. We assume that the information exchange happens in a separate bandwidth from the transmission one, and that the time required to do this is negligible. Each cluster has a local FedRL agent with weights that produce a policy , while the MEC server has the global model with weights and policy . At the end of the training procedure, all the clusters must have the same global model with weights and policy . Details of each sequential step are listed below.
[See PDF for image]
Fig. 3
Federated Reinforcement Learning Procedure. A: The local models trained with clusters information, and transmitted to the central node (MEC Server). B: The global model weights aggregated from the received local weights, and broadcasted to the nodes (CHs)
First, at the start of a new episode e, the UEs are grouped into distinct clusters, where one cluster head is selected at random to be the decision-making agent for all UEs in . The BS transmits to the average of all the UEs’ channel gains in the other clusters, represented by the vector , and the new episode e starts.
During the episode e, and at a timestep t, each in cluster with shares with its buffer vector , its quantized channel gain , and its battery level .
uses the information provided at each timestep to decide on the action to take using the FedRL model, and broadcasts the individual action to each in the cluster. The obtained reward is also shared with the cluster head. This information sharing process is highlighted in Fig. 4.
The information shared by the UEs in to is then used to train the local agent with weights . After a certain number of steps (e.g., after a certain number of timesteps or episodes), the local weights of each CH agent are sent to the MEC server.
After receiving the local weights from the CHs, the MEC server averages the weights and obtains the weights of the global model as:
38
The MEC server broadcasts the global model weights to all CHs, and the process repeats until the system converges to a global solution, i.e., , invariant to different data models and numbers of UEs per cluster. Note that the re-clustering process takes place at each episode. The learning process is illustrated in Fig. 3.
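A minimal sketch of the aggregation step performed at the MEC server is shown below, assuming each CH ships its weights as a plain list of NumPy arrays (one per layer). The layout and names are assumptions for the sketch and are not tied to a particular deep learning framework.

```python
import numpy as np

def federated_average(local_weights):
    """FedAvg-style aggregation: element-wise mean of the CH models' weights.

    local_weights is a list with one entry per cluster head; each entry is a
    list of NumPy arrays (one per network layer).
    """
    n_agents = len(local_weights)
    return [sum(layers) / n_agents for layers in zip(*local_weights)]

# Two cluster heads, each with a 2-layer toy model.
ch_a = [np.ones((2, 2)), np.zeros(2)]
ch_b = [3 * np.ones((2, 2)), 2 * np.ones(2)]
global_model = federated_average([ch_a, ch_b])
print(global_model[0])   # [[2. 2.], [2. 2.]]
```

The broadcast step then simply copies the returned global weights back into every CH agent before the next round of local training.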
[See PDF for image]
Fig. 4
Information sharing with the Cluster Head
Clustering model
The proposed clustering process deals jointly with two conflicting issues. The first one is that the clustered UEs’ channel conditions have to be contrasting to maximize NOMA spectral efficiency [13], while the second one is the need for spatial proximity between UEs in order to cluster them. In Fig. 5, we show an example of where the UEs can be located. They are defined in our case by their angle w.r.t. the BS. Each circle with a shade of blue corresponds to a quantized channel gain subinterval. The UEs’ channel gains vary along the channel range, following the truncated Rayleigh distribution within their assigned subintervals, as explained in Sect. 3.1. Moreover, the UEs can be in the same channel subinterval with different angles, which determine the positions of the UEs w.r.t. the BS.
[See PDF for image]
Fig. 5
Displacement of UEs in the grid around the BS, with different channel subintervals and angles
[See PDF for image]
Fig. 6
Clustering the UEs following their polar coordinates
In other words, we use polar coordinates to determine the position of a UE in the grid, with the BS positioned at the origin. The r value refers to the channel subinterval, while refers to the angle. The polar coordinates allow the clustering of UEs that have different r levels and the same (or adjacent) angles . The resulting clusters for the previous example are represented in Fig. 6, where each cluster is highlighted in green.
Formally, given the set of all UEs in the grid , we quantize the angle space into discrete values , so that UEs can only be located at a finite number of angles. Thus, before a re-clustering process at the start of a new episode , from the set of all UEs is assigned a discrete channel subinterval and angle . It transitions from the previous episode’s parameters following (2) for the channel subinterval and (39) for the angle transition, which is similar to the channel one4:
39
with being the angle range correlation factor, meaning that the angle in episode e affects the new angle in episode , simulating more accurately the change in UE location. Then, the clustering algorithm produces the set of clusters , with each having a number of UEs ranging between and . In what follows, we describe the clustering process in detail, where the steps are organized according to their priority; a compact sketch of the procedure is given after the steps. A verification mechanism is implemented after each step to check whether the produced clusters are valid, in the sense that the number of UEs per cluster is bounded by the minimum and maximum values allowed, and the number of clusters is less than the allowed maximum number . Once the verification passes, the clustering process terminates early.
Create as many clusters as there are UEs, to allow for the merging of clusters in the following steps and to make the verification step easier.
Combine cluster with together if the following conditions are met, with :
The UEs in both clusters are assigned to different channel subintervals:
The UEs in both clusters have similar or adjacent angles: with being the angle quantization step.
For every that was not clustered with other UEs in the previous step, assign it to a cluster if:
The number of UEs in is less than the maximum:
’s angle is equal to or less than the angle quantization step compared to every in :
If is not assigned to another cluster in the previous step, cluster it with another that is already clustered in , provided that the following conditions are satisfied:
The number of clusters after this step is less than the maximum number of clusters allowed:
The number of UEs in is more than the minimum:
The angle difference between the two UEs is less than the angle quantization step:
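The following compact sketch captures the greedy spirit of the steps above: UEs, described by their (subinterval, angle) pair, are grouped when their channel subintervals differ and their angles are within one quantization step, and the result is checked against the cluster size and count limits. Angle rollover, the cluster-merging pass, and the full priority order are omitted for brevity, and all parameter names and values are assumptions.

```python
def cluster_ues(ues, angle_step, n_min, n_max, l_max):
    """Greedy clustering sketch over (subinterval, angle) tuples."""
    clusters = []
    for ue in sorted(ues, key=lambda u: u[1]):               # sweep UEs by angle
        placed = False
        for cluster in clusters:
            close = all(abs(ue[1] - other[1]) <= angle_step for other in cluster)
            distinct = all(ue[0] != other[0] for other in cluster)
            if close and distinct and len(cluster) < n_max:
                cluster.append(ue)
                placed = True
                break
        if not placed:
            clusters.append([ue])                            # open a new cluster
    # Validity check: cluster count and per-cluster sizes within the allowed bounds.
    valid = len(clusters) <= l_max and all(n_min <= len(c) <= n_max for c in clusters)
    return clusters, valid

ues = [(0, 10), (1, 15), (2, 12), (0, 200), (1, 205)]        # (subinterval, angle in degrees)
print(cluster_ues(ues, angle_step=10, n_min=1, n_max=3, l_max=4))
```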
Proximal policy optimization
We choose Proximal Policy Optimization (PPO) [4] as the DRL technique to train the multi-agent FedRL system on a cluster basis. Our choice is motivated by our previous works on smaller scale systems [2, 3], where PPO showed great learning capabilities compared to other RL and DRL-based algorithms. PPO is a policy gradient, actor-critic, NN-based reinforcement learning algorithm that produces a policy distribution and maximizes an objective function to improve said policy. During training, PPO relies on a set of two NNs with parameters and to estimate the policy , as well as the value function , given an input state s. The outputs are then used to compute the objective function and update the parameters of the NNs via gradient ascent. In contrast, during testing, only the NN that outputs the policy is necessary to navigate the environment. Intuitively, the actor-critic nature of PPO relates to the generation of a policy that is used to sample an action (actor part), in addition to the generation of a value function estimation (critic part) that provides additional information on the quality of the state and the future rewards. This combination helps actor-critic algorithms in general, and PPO in particular, to maximize the objective function, improve the policy estimation, and converge to an optimal one.
The objective function used for PPO is defined as follows:
40
where:
represents, for a state s and a sampled action a, a ratio between the policy value at the current epoch and the old policy value from the previous one. It measures the change that has occurred in the policy. It depends only on the parameters of the actor network .
is the advantage function, defined for a state at timestep t as:
41
with a discount factor , and a generalized advantage estimation factor . measures the quality of the sampled action in terms of the reward resulting from taking this action, compared to the expected reward, i.e., the value function estimation. It depends on the parameters of the critic network .
42
The training procedure of PPO comprises multiple steps, as described below:
Given an actor network with parameters , and a critic network with parameters , the agent navigates the environment. Starting with a state s fed as input to PPO, the actor network produces a policy distribution estimation , while the critic network outputs the value function estimation . An action a is sampled from the policy distribution and executed, transitioning the agent to a new state while receiving a reward . The tuple is stored in a memory buffer, and the new state becomes the current one.
Step 1 is repeated for multiple timesteps until the memory buffer is filled. If an episode terminates, the process restarts with a newly initialized state s. The filled memory buffer is used for evaluating the objective function and further updating the set of weights .
Random mini-batches are selected from the memory buffer for training. The advantage function estimations and the policy ratios are computed for a certain mini-batch b. The resulting values help compute an augmented version of the objective function for the mini-batch, defined as follows:
43
with being a squared error term for the value function, and being the entropy loss that encourages exploration. and are hyper-parameters to tune.
The set of weights is updated based on the mini-batch objective function following the gradient ascent method, with a learning rate :
44
45
The random selection of mini-batches within the same batch of tuples in the memory buffer is redone for iterations. After that, the memory buffer is reset (marking the end of an epoch), and the previous steps are repeated for several epochs until convergence. The output used for testing is the policy .
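A minimal PyTorch-style sketch of one such mini-batch update, in the spirit of the shared-trunk actor-critic architecture described in the simulation section, is given below. The network sizes, coefficient values, and the synthetic batch are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class ActorCritic(nn.Module):
    """Shared trunk feeding a discrete policy head (actor) and a value head (critic)."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, n_actions)   # policy logits
        self.critic = nn.Linear(hidden, 1)          # value estimate

    def forward(self, s):
        h = self.trunk(s)
        return Categorical(logits=self.actor(h)), self.critic(h).squeeze(-1)

def ppo_update(model, optimizer, batch, clip=0.2, c1=0.5, c2=0.01):
    """One clipped-surrogate PPO step on a mini-batch of stored transitions."""
    states, actions, old_logp, adv, returns = batch
    dist, values = model(states)
    ratio = torch.exp(dist.log_prob(actions) - old_logp)       # pi_new / pi_old
    surrogate = torch.min(ratio * adv,
                          torch.clamp(ratio, 1 - clip, 1 + clip) * adv)
    # Maximize surrogate - c1 * value error + c2 * entropy (gradient ascent),
    # i.e., minimize its negative.
    loss = -(surrogate.mean()
             - c1 * (returns - values).pow(2).mean()
             + c2 * dist.entropy().mean())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Illustrative usage with a synthetic batch of 32 transitions.
model = ActorCritic(state_dim=8, n_actions=6)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
s = torch.randn(32, 8)
with torch.no_grad():
    d, v = model(s)
    a = d.sample()
    batch = (s, a, d.log_prob(a), torch.randn(32), v + torch.randn(32))
ppo_update(model, opt, batch)
```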
With the number of UEs changing after each re-clustering operation, the PPO model needs to accommodate this dynamic aspect in the system. More specifically, the inputs and outputs of the PPO’s Neural Networks (the actor and the critic networks share the first few layers, therefore their input is the same) have to consider the maximum size of the state representation and the maximum number of actions (in the actor’s output). Thus, the PPO input state for , denoted as , is defined as:
46
where zero padding is performed on the elements from index to index if , and from index to index if . Zero padding is performed in the following way:47
48
49
50
Basically, we zero pad the remaining empty spots in the state representation with UEs that have no data in the buffer, the lowest channel gain, and an empty battery. For the entries that represent information on other clusters, we fill up the spots with the maximum average channel gain (to avoid considering them in the interference estimation in (6)).
Moreover, we can set the actor’s output size as:
51
which is the same as the maximum action space size .
Simulation results
To assess the performance of the proposed model, we simulate a system with UEs. Each has a buffer of size with a maximum delay on the packets that arrive, as well as EH capabilities with a battery of size energy units of . In addition, we choose a random data model among the models described in Sect. 3.3, which are Poisson distribution with mean , uniform distribution with and , Lognormal and Pareto distributions with . We assume that the channel gain range is quantized into subintervals, with and , and quantization steps for each subinterval . The angles range is similarly quantized into discrete angles.
We cluster the UEs in at most clusters, with each cluster having a number of UEs between . The re-clustering is done before the start of every episode.
The cluster scheduling decisions are taken by the at the start of a timestep. It performs either local operations for with a maximum of packet, or offloading packets to the MEC server with a maximum of packets that can be offloaded at once.
When processing locally, the power associated with the decision is , whereas the maximum allowed offloading power is . In the offloading operation, sends to the MEC server packets of size , over a bandwidth of , and a channel noise spectral density of . It awaits the execution for while consuming the power . For downlink transmission, the resulting packets have size and the BS uses a transmission power of . The total downlink bandwidth is allocated partially to each cluster depending on the number of UEs in it. At the UEs, the reception power is set to and the decoding power to s. In fact, each receives the signals from all the other UEs in the same cluster and performs SIC decoding to decode its signal after decoding the signals of other UEs with weaker channels (i.e., higher allocated power). The whole offloading operation has to be done within the timestep duration .
To train the Federated PPO multi-agent setup, each agent has 2 NNs (the actor and critic networks) of 4 layers each, the first 2 of which are shared between them. The number of neurons is 128 per layer with ReLU activation functions. The number of training epochs is the same as the number of episodes, with each episode having timesteps, and the memory buffer size is 1024 tuples. The batch size when sampling experiences from the memory buffer is 64, the policy clip , the generalized advantage estimation factor , the discount factor , the learning rate , and the number of iterations of batch sampling is . Once an episode terminates, the weights are shared with the MEC server, and a re-clustering process is initiated for the next episode.
We compare the Federated PPO method, in terms of the percentage of lost packets as well as the energy consumption during an episode, with some naive methods extended to a multi-agent setup, which operate subject to the availability of energy units in the battery:
Naive Offload (NO): Only offload actions, with the highest number of packets to execute.
Naive Local (NL): Only local actions, with the highest number of packets to execute.
Immediate Scheduling (IMM): Local or offload actions, with the highest number of packets to execute.
Naive Random (NR): Randomly select an available action.
[See PDF for image]
Fig. 7
Average percentage of dropped packets for each approach
The energy consumption is evaluated in Fig. 8, where the average number of energy units consumed during an episode per cluster is shown. We can observe that Federated PPO achieves lower energy consumption compared to the naive methods and the immediate scheduler. In addition, NO consumes less than the other naive methods simply because it stays idle too often. Therefore, the Federated PPO method offers good performance while consuming less power than the other methods.
[See PDF for image]
Fig. 8
Average energy units consumed for each approach
Investigating the clustering outcomes during the testing episodes reveals that a configuration of clusters with UEs each occurs in of the episodes. The second most frequent configuration is the one with clusters of UEs with . The two other possible configurations of clusters of and UEs occur in and of the testing episodes, respectively.
In addition, an action analysis is carried out in Figs. 9, 10 and 11, where each pie chart represents the percentage of actions taken during an episode (idle actions are omitted) for a UE in a cluster with UEs, respectively. The figures show that the use of intra-cluster NOMA increases with the number of UEs in the cluster.
Fig. 9 Average percentage of actions taken by each UE in a cluster (idle actions not shown) [figure not included]
Fig. 10 Average percentage of actions taken by each UE in a cluster (idle actions not shown) [figure not included]
Fig. 11 Average percentage of actions taken by each UE in a cluster (idle actions not shown) [figure not included]
Conclusion
In this paper, we have tackled the scheduling and task offloading of UEs in a dynamic multi-cluster NOMA system supported by mobile edge computing, addressing key constraints such as strict task delay requirements, limited-size rechargeable batteries on UEs powered by energy harvesting, and varying channel conditions. To minimize packet loss, we have investigated a DRL method, namely Proximal Policy Optimization, which has proven strong generalization abilities. We have proposed a novel clustering process to enhance NOMA downlink performance and reduce the complexity of the DRL model. Our approach has also included a FedRL framework to develop a global policy across agents in a multi-agent setup. Our results have demonstrated that the federated PPO policies outperform standard methods, showing promising scalability for more complex systems.
Future work will expand this framework to a multi-cell environment with multiple MEC servers, allowing for the management of inter-cell interference and more efficient resource allocation across clusters. Additionally, further investigation of our clustering process, including comparative analysis with alternative clustering algorithms in the literature, will be undertaken to refine cluster configuration and task scheduling dynamics. Realistic considerations such as latency, signaling overhead, and computational complexity will also be integrated to better assess the system’s performance under practical deployment constraints.
Author contributions
I.D. wrote the main manuscript and prepared the figures, supervised by M.S. and P.C. All authors reviewed the manuscript.
Funding
Joint PhD thesis grant from the Samovar Lab and the TSN Carnot Institute.
Data Availability
No datasets were generated or analysed during the current study.
Declarations
Conflict of interest
The authors declare no competing interests.
1. The ordering of signals from strongest to weakest is not a necessary condition to ensure the highest NOMA spectral efficiency in the uplink (contrary to the downlink case); rather, it organizes the signals for SIC decoding and yields (4).
2. A structure similar to the Automatic Repeat Request (ARQ) method, with the packet counter equal to the maximum delay.
3. A transition model is typically part of MDP modeling, but its definition is not necessary in this work due to the use of model-free RL algorithms, as discussed in the problem resolution.
4. Since the angles define a circle, a rollover functionality is implemented to account for adjacent angles with distant values.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Y. Saito, Y. Kishiyama, A. Benjebbour, T. Nakamura, A. Li, K. Higuchi, Non-orthogonal multiple access (NOMA) for cellular future radio access, in IEEE 77th Vehicular Technology Conference (VTC Spring) (2013). IEEE
2. I. Djemai, M. Sarkiss, P. Ciblat, Joint scheduling-offloading policies in NOMA-based mobile edge computing systems, in IEEE Wireless Communications and Networking Conference (WCNC) (2023). IEEE
3. I. Djemai, M. Sarkiss, P. Ciblat, NOMA-based scheduling and offloading for energy harvesting devices using reinforcement learning, in IEEE Asilomar Conference on Signals, Systems, and Computers (2023)
4. J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)
5. J. Konečný, H.B. McMahan, F.X. Yu, P. Richtárik, A.T. Suresh, D. Bacon, Federated learning: Strategies for improving communication efficiency, in NIPS Workshop on Private Multi-Party Machine Learning (2016). https://arxiv.org/abs/1610.05492
6. Tianqing, Z; Zhou, W; Ye, D; Cheng, Z; Li, J. Resource allocation in IoT edge computing via concurrent federated reinforcement learning. IEEE Internet Things J.; 2021; 9,
7. Yu, S; Chen, X; Zhou, Z; Gong, X; Wu, D. When deep reinforcement learning meets federated learning: intelligent multi-timescale resource management for multi-access edge computing in 5G ultra-dense network. IEEE Internet Things J.; 2020; 8,
8. Li, X; Lu, L; Ni, W; Jamalipour, A; Zhang, D; Du, H. Federated multi-agent deep reinforcement learning for resource allocation of vehicle-to-vehicle communications. IEEE Trans. Veh. Technol.; 2022; 71,
9. Consul, P; Budhiraja, I; Arora, R; Garg, S; Choi, BJ; Shamim Hossain, M. Federated reinforcement learning based task offloading approach for MEC-assisted WBAN-enabled IoMT. Alexandria Eng. J.; 2024; 86, pp. 56-66. [DOI: https://dx.doi.org/10.1016/j.aej.2023.11.041]
10. P. Consul, I. Budhiraja, D. Garg, N. Kumar, R. Singh, A.S. Almogren, A hybrid task offloading and resource allocation approach for digital twin-empowered UAV-assisted MEC network using federated reinforcement learning for future wireless network. IEEE Trans. Consum. Electron. (2024)
11. X. Chen, Z. Li, W. Ni, X. Wang, S. Zhang, Y. Sun, S. Xu, Q. Pei, Towards dynamic resource allocation and client scheduling in hierarchical federated learning: a two-phase deep reinforcement learning approach. IEEE Trans. Commun. (2024)
12. Yang, F; Zhao, Z; Huang, J; Liu, P; Tolba, A; Yu, K; Guizani, M. A federated reinforcement learning approach for optimizing wireless communication in UAV-enabled IoT network with dense deployments. IEEE Internet Things J.; 2024
13. Ali, MS; Tabassum, H; Hossain, E. Dynamic user clustering and power allocation for uplink and downlink non-orthogonal multiple access (NOMA) systems. IEEE Access; 2016; 4, pp. 6325-6343.
14. Cui, J; Ding, Z; Fan, P; Al-Dhahir, N. Unsupervised machine learning-based user clustering in millimeter-wave-NOMA systems. IEEE Trans. Wireless Commun.; 2018; 17,
15. Shahini, A; Ansari, N. NOMA aided narrowband IoT for machine type communications with user clustering. IEEE Internet Things J.; 2019; 6,
16. Z. Wang, M. Pischella, L. Vandendorpe, Clustering and power optimization for NOMA multi-objective problems, in IEEE 31st Annual International Symposium on Personal, Indoor and Mobile Radio Communications (2020). IEEE
17. Hamedoon, SM; Chattha, JN; Bilal, M. Towards intelligent user clustering techniques for non-orthogonal multiple access: a survey. EURASIP J. Wirel. Commun. Netw.; 2024; 2024,
18. Y. Lin, K. Wang, Z. Ding, Rethinking clustered federated learning in NOMA enhanced wireless networks. IEEE Trans. Wireless Commun. (2024)
19. El Hassani, H; Savard, A; Belmega, EV; De Lamare, RC. Multi-user downlink NOMA systems aided by ambient backscattering: achievable rate regions and energy-efficiency maximization. IEEE Trans. Green Commun. Netw.; 2023; 7,
20. X. Liu, T.N. Davidson, Multiple-time-slot multiple access binary computation offloading in the k-user case. IEEE Trans. Signal Process. (2024)
21. A. Taneja, S. Rani, W. Boulila, Resource control in IRS assisted multi-access edge computing for sustainable 6G IIoT networks. IEEE Open J. Commun. Soc. (2025)
22. A. Benjebbour, K. Saito, A. Li, Y. Kishiyama, T. Nakamura, Non-orthogonal multiple access (NOMA): concept, performance evaluation and experimental trials, in 2015 International Conference on Wireless Networks and Mobile Communications (WINCOM), pp. 1–6 (2015). IEEE
23. Navarro-Ortiz, J; Romero-Diaz, P; Sendra, S; Ameigeiras, P; Ramos-Munoz, JJ; Lopez-Soler, JM. A survey on 5G usage scenarios and traffic models. IEEE Commun. Surveys Tutorials; 2020; 22,
© The Author(s) 2025. This work is published under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (http://creativecommons.org/licenses/by-nc-nd/4.0/).