This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
With the rapid development of mobile devices, large amounts of data are generated in real time. At the same time, rising public concern about data security [1] makes users reluctant to share their private data, ultimately creating data silos. The most straightforward way to address data silos is to collect the data centrally and process, cleanse and model it in a unified manner. In most cases, however, data leakage occurs at the collection and processing stage, which is unacceptable. Traditional machine learning methods mostly train the model in a centralised way, which requires all the training data to be stored on a central server. This approach carries the risk of privacy leakage and thus fails to safeguard users' privacy. In contrast to traditional machine learning, federated learning (FL) is a distributed training method. Specifically, in FL, users (also known as workers or clients) share models trained on local data rather than their private data [2]. Since it is difficult for attackers to recover the source data from the model, this satisfies the privacy requirements of clients.
FL can be used for applications such as improving health care, making smartphones smarter and protecting privacy. It can also help reduce the environmental impact of data centres and enable more inclusive and diverse data sources. Despite the applications and advantages of FL in various fields, it also faces serious challenges. One of these challenges is security. Although the clients participating in FL training are not required to upload private data, the distributed nature of FL means there is no guarantee that all clients participate in training honestly. Dishonest clients, also called attackers, use different types of attacks to degrade the performance of the final model. One type of attack is the data-poisoning attack, which prevents FL's central server from completing its objective. For instance, the authors in [3] designed a data-poisoning attack based on an inverted loss function that reverses the benign model, while the authors in [4] crafted a bandit-based attack area UCB (AR-UCB) algorithm to perform dynamic data-poisoning attacks, thereby hindering the FL central server's ability to fulfil its objective. Another challenge is fairness. In general, models that are favoured by the server or trained on larger datasets are given higher importance in the aggregation process. As a result, other clients, also called workers, rarely or never get to join the FL task.
There have been many research attempts to address these challenges. Previous work has proposed methods for fending off data-poisoning attacks and thereby selecting reliable clients. Most of these studies use trust models to assess client reliability, enabling the server to select credible clients. Reference [5] proposed a reputation-aware FL client selection method based on stochastic integer programming. Reference [6] utilizes deep reinforcement learning to dynamically select clients based on reputation. Other research filters out attackers by monitoring users' behaviour, so that only normal users participate in global aggregation to complete FL tasks. Reference [7] identifies attackers by computing client-wise angle similarity over the clients' last-layer gradients. Reference [8] utilizes pairwise cosine similarity, a clustering mechanism and a filtering strategy to filter out malicious updates. Reference [9] introduced an algorithm for accurately detecting malicious model updates to find malicious workers. Reference [10] introduces a fairness-aware incentive mechanism in federated learning that promotes both aggregate and reward fairness. Reference [11] exploits incentive mechanisms to fairly reward clients and thereby attract reliable and efficient clients. Reference [12] utilizes deep reinforcement learning algorithms to achieve fair model allocation in federated learning. Other studies have proposed new frameworks or algorithms to ensure fairness. In [13], an asynchronous FL framework with adaptive client selection ensures long-term fairness for clients. Reference [14] proposed an adaptive fairness learning algorithm that adjusts the fairness coefficients through local model updates and ultimately improves the generalisation of the global model. However, without a dynamic mechanism, it is difficult for the server to maximise both its security and its fairness to the participants.
Reinforcement learning (RL) is a pivotal field in machine learning that focuses on finding the best action for an agent to maximise the reward it receives in a complex, uncertain environment [15]. However, RL faces challenges such as the curse of dimensionality, the balance between exploration and exploitation and sparse rewards. To address these issues, some researchers have combined deep learning with RL, an approach called deep reinforcement learning (DRL). DRL integrates the feature-extraction capability of deep learning with the decision-making capability of RL, allowing it to solve complex real-world problems more effectively. Moreover, DRL has effectively tackled key challenges within the FL domain, such as solving the problem of online resource allocation in the vehicular fog network [16], achieving adaptive and efficient communications and credible data interactions [17] and incentivizing users to participate in model training over time [18].
In this article, we propose an approach based on the deep deterministic policy gradient (DDPG) to select reliable clients fairly. DDPG is a representative DRL algorithm that can handle high-dimensional observation spaces and continuous action spaces [19]. It fits our approach, which considers a multitude of factors and potential actions in a continuous space when selecting clients. In the proposed scheme, clients with a high security-fairness value are selected to participate in federated learning. This value is calculated from the client's security and fairness scores together with the weighting strategy. The weighting strategy is determined by DDPG based on the current environment. Specifically, DDPG uses an actor-critic network framework to handle continuous and complex action spaces and find the optimal weighting strategy [19]. In addition, techniques such as experience replay, target networks and noise-based exploration are employed to give the agent exploration capabilities and avoid falling into a local optimum. Using this approach to intelligently determine the weighting strategy, and thereby select clients, minimizes the adverse impact of unreliable clients on the global model. At the same time, the system selects reliable clients so that they can fairly participate in the federated learning process. The main contributions of this paper are listed as follows:
• We calculate the security score by using a beta distribution function to assess the trustworthiness of a local client. In addition, the fairness score is derived from the historical participation of local clients in global aggregation, evaluating their equitable involvement.
• We introduce an adaptive weighting strategy for the security-fairness value, leveraging DDPG. This approach empowers the FL system to equitably select reliable clients while mitigating the impact of unreliable ones.
• The final experimental results show that our approach enables the system to fairly select reliable clients, enhancing the security and fairness of the FL training process.
This article is structured as follows. In Section 2, we provide an overview of the existing literature on FL security and fairness. The system model is presented in Section 3. Then, we explain our proposed weighting strategy for security fairness based on DDPG in Section 4. After that, the performance results of our proposed model are analysed in Section 5. Finally, a conclusion is proposed in Section 6.
2. Related Work
FL has been extensively studied in recent work. Reference [20] proposed an adaptive client selection framework to improve convergence performance. In [21], a data-type, resource and time-aware protocol is proposed for selecting FL clients to reduce client dropout, model convergence time, the number of information exchanges and the time required to reach a certain accuracy. Reference [22] proposed selecting clients based on client evaluation accuracy to achieve a specific accuracy improvement. This method focuses only on the accuracy metric and may not perform optimally in heterogeneous client environments where accuracy varies widely. In [23], the authors propose a two-stage secure aggregate sparsification algorithm, which achieves privacy guarantees, convergence and performance improvements in FL by having the client apply a pairwise multiplicative random mask to the sparse subnetwork. In [24], the researchers introduce a framework for graph-based client selection to accommodate heterogeneity in FL; however, its effectiveness heavily relies on accurate graph representations of client relationships. Reference [25] proposed selecting clients based on predictions of dynamic network conditions and the quality of training data to tackle the challenges posed by high system heterogeneity in time-sensitive FL scenarios. This approach relies on accurate predictions of network conditions and data quality, which may not always be achievable or reliable in the real world.
Some researchers have used cryptographic authentication to screen out attackers and ensure the security of FL. In [26], the researchers validate the local model using cryptographic primitives and compare the proof result with the list of aggregated models; if the intersection length between the two crosses a threshold, the system is considered to contain an attacker. Reference [27] introduced a privacy-preserving scheme for FL in edge computing. The authors start with a streamlined protocol employing shared secrets and weighted masks to safeguard gradient privacy, enhancing defences against device dropout and collusion. In addition, they propose an algorithm utilizing digital signatures and hash functions to guarantee message integrity, consistency and resilience against replay attacks. Finally, they suggest a periodic averaging training strategy to enhance overall operational efficiency. Reference [28] proposed the use of cryptographic primitives, including masks and homomorphic encryption, to prevent privacy leakage. In [29], the authors propose an encryption scheme to defend against backdoor attacks, which employs adaptive local differential privacy techniques and compressed sensing. Reference [30] developed a comprehensive user behaviour model utilizing various attributes; the model incorporates direct trust evidence and recommended trust data to improve the accuracy of trust assessment in the presence of dynamic behavioural changes. In [31], a protocol based on BLS signatures and multiparty security is designed to verify the integrity of parameters and the correctness of results. However, this approach requires significant computational resources and communication costs.
Several solutions address FL security by employing various techniques and analyses to detect attackers. Reference [32] designed a model based on convolutional neural networks to classify normal and abnormal users, and uses a blockchain-integrated cryptography-based FL technique to remove anomalous users from the database. Training such a classification model requires a large amount of labelled data, and the way anomalous users are removed may cause the system to lose its ability to trace and diagnose problems. Reference [33] proposed covert-communication-based FL to defend against attacks; however, transmissions with this method are unstable and susceptible to sniffing. Reference [34] introduced a temporal convolutional generative network for semisupervised learning on partially labelled data to achieve network attack detection. In [35], an intrusion detection method is devised using a semisupervised FL scheme with knowledge distillation, which exploits unlabelled data to accomplish intrusion detection. These methods may be limited by device resources.
Several researchers have focused on allowing servers to fairly select clients to participate in the federated learning process. In [36], the researchers proposed a weight selection algorithm that integrates training accuracy and frequency to measure weights, ensuring fair client aggregation on the server; however, the method is limited to horizontal FL. In [37], the central server combines the local loss, data size, computation power, resource demand and last update time of local clients into an overall index and then selects a group of workers in each round to participate in the FL task; however, how to update the weight parameters is not considered. Reference [38] presents a novel optimisation objective function that makes each local model contribute fairly (though not necessarily equally) to the training of the global model; the objective consists of a fairness term and a training-loss reduction term. In [39], the authors suggested transforming the training of an individually fair FL model into an adversarial training approach, which ultimately improves both individual fairness and group fairness. In [40], the researchers quantified fairness in federated learning using the Gini coefficient and proposed fairness interventions in the data-fitting phase; in addition, they integrated a penalty term into the FL objective function to balance model performance and fairness. Nevertheless, these studies do not consider attackers and assume by default that the FL process is safe.
In [41], the authors introduced a registry based on smart contracts for tracking and recording local data. In addition, they proposed an algorithm for sampling a weighted fair training dataset, aiming to improve the fairness of the model. Reference [42] addressed fairness issues by controlling the 3D trajectory, transmission power and scheduling time of unmanned aerial vehicles (UAVs) for task offloading by mobile ground users; one UAV in each UAV pair acts as a jammer that suppresses eavesdroppers. These two studies propose separate components that address the security and the fairness of FL, respectively.
3. System Model
To address the security and fairness challenges in FL, a model based on a DDPG client selection strategy is proposed to overcome the influence of unreliable clients and fairly select reliable clients to participate in global model training. First, the FL client downloads the global model from the server. Then, the client trains and updates the model using its local dataset. Next, the client uploads the trained local model to the server. After that, the server determines the reliability of the local model using an attack detection method and calculates the security score for each client. The server then calculates the fairness score based on the clients' past participation in the global aggregation. Subsequently, the server uses the DRL algorithm to select a weighting strategy for the security-fairness value and computes the security-fairness value for each client. Finally, the server selects the local models uploaded by the clients with higher security-fairness values for aggregation to complete one global model update. The following subsections discuss the proposed DRL-based weighting strategy for the security-fairness value. The model for executing the DDPG-based client selection strategy is shown in Figure 1.
[figure(s) omitted; refer to PDF]
3.1. FL Model
In this section, FL is briefly introduced. FL is a distributed machine learning paradigm that trains a global model by aggregating locally trained models from multiple clients. Clients are not required to share their local data, only their trained models, which protects their privacy. In each FL round, a client downloads the global model from the parameter server and trains a local model on its own data. Then, the client uploads the trained model to the parameter server. After that, the server selects uploaded local models randomly or according to specific requirements, aggregates these local models and sends the aggregated global model back to the clients to start the next round of training. This training process continues until the global model satisfies the desired conditions.
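As a minimal sketch of one such round, the snippet below assumes simple FedAvg-style averaging over the selected local models; the helper functions local_train and select and their signatures are illustrative placeholders, not taken from the paper.

```python
import numpy as np

def federated_round(global_weights, clients, local_train, select):
    """One FL round: broadcast the global model, train locally, select, aggregate."""
    local_updates = []
    for c in clients:
        # Each client starts from the current global model and trains on its own data.
        local_updates.append(local_train(c, [w.copy() for w in global_weights]))
    # The server picks a subset of the uploaded models (randomly or by some criterion).
    chosen = select(clients, local_updates)
    # FedAvg-style aggregation: element-wise average of the chosen local models.
    return [np.mean([local_updates[i][k] for i in chosen], axis=0)
            for k in range(len(global_weights))]
```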
This work considers
Correspondingly, the client will download the global model parameters
The stochastic gradient descent (SGD) algorithm is employed as the training algorithm for FL in this paper. In each iteration, SGD randomly selects a batch of training instances, calculates the gradient of the batch concerning the current model parameters
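The notation above is not fully reproduced, but the local update itself can be sketched as a plain mini-batch SGD loop; the learning rate, batch size and grad_fn interface below are illustrative assumptions.

```python
import numpy as np

def local_sgd(w, data, grad_fn, lr=0.01, batch_size=32, epochs=1, seed=0):
    """Mini-batch SGD: w <- w - lr * gradient of the loss on each sampled batch."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        idx = rng.permutation(len(data))
        for start in range(0, len(data), batch_size):
            batch = [data[i] for i in idx[start:start + batch_size]]
            w = w - lr * grad_fn(w, batch)  # gradient w.r.t. the current parameters
    return w
```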
3.2. Adversary Model
In this work, we investigate the possibility that participating clients may intentionally or unintentionally submit malicious or unreliable models, thus undermining the integrity and effectiveness of federated learning. The reasons for this phenomenon may stem from limitations in the computational resources or expertise of the clients, resulting in models that exhibit poor generalization and even contain biases. In addition, unreliable clients may deliberately inject corrupted or manipulated models to undermine the trustworthiness and dependability of the federated learning system. The parameters of these unreliable models, denoted as
This study specifically addresses the threat of data poisoning attacks in federated learning. Data poisoning refers to the injection of maliciously forged data points into the training dataset to corrupt the integrity and quality of the training model. The common data poisoning attacks are label-flipping attack, clean-label attack and backdoor attack. In this work, we consider label-flipping attacks to undermine the integrity of model predictions. Unlike traditional adversarial attacks that modify the input data, label flipping focuses on subverting the model by tampering with the ground truth labels during the training phase. In this adversarial approach, a malicious attacker strategically changes the labels associated with the training instances to inject misinformation during the learning process. This intentional mislabelling causes the model to learn incorrect patterns and associations, thus corrupting the model, which exhibits poor generalisation and vulnerability to misclassification of unseen data.
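To make the label-flipping attack concrete, the following sketch flips a fraction of training labels to an incorrect class; the flip rule (y → y + 1 mod C) and the flip fraction are illustrative assumptions, with the fraction loosely corresponding to attack intensity.

```python
import numpy as np

def flip_labels(labels, num_classes=10, flip_fraction=1.0, seed=0):
    """Label-flipping poisoning: replace a fraction of true labels with wrong ones."""
    rng = np.random.default_rng(seed)
    poisoned = labels.copy()
    idx = rng.choice(len(labels), size=int(flip_fraction * len(labels)), replace=False)
    # Map label y to (y + 1) mod num_classes so the new label is always incorrect.
    poisoned[idx] = (poisoned[idx] + 1) % num_classes
    return poisoned
```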
3.3. Adaptive Attacks
Adaptive attacks involve an attacker dynamically adjusting their strategy based on system feedback to enhance both the effectiveness and stealth of the attack. In FL, the risk of such attacks is particularly high due to the distributed nature of model training across multiple clients. Attackers can monitor model performance during iterations and identify the optimal moment to upload a malicious model, thereby biasing the global model towards incorrect decisions. These adaptive attacks jeopardize the robustness and security of the FL system, rendering the entire training process vulnerable and degrading overall model performance, especially if client selection is poor. Moreover, attackers may further compromise user privacy by analysing model updates to infer sensitive client data. Therefore, effective strategies are essential to ensure the security of FL systems.
This paper proposes a DDPG-based dynamic weighting strategy that adjusts the influence of each client during global model training by evaluating their current performance. The goal is to effectively mitigate the impact of potential attackers and ensure the model remains efficient and accurate in the face of adversarial threats. Specifically, our approach continuously monitors the performance of all participating clients in real-time to assess their reliability throughout the training process. For instance, we utilize historical performance data and real-time feedback to prioritize clients with stable and trustworthy performance, assigning them greater weight. In addition, extra weights are allocated to clients who participate in training less frequently, ensuring that more reliable clients contribute to the global model’s training. This way, even if some clients are attacked or underperform, their negative impact on the global model is significantly reduced. Through this dynamic adjustment and selection mechanism, our approach not only enhances the robustness of the model but also strengthens the overall system’s resilience against adaptive attacks.
3.4. Problem Formulation
Since the parameter server selects only a limited number of local model parameters for global aggregation in each FL round, we introduce a security-fairness value to evaluate the clients to help the server select local clients. This security-fairness value is obtained by weighting the client’s security and fairness scores, and we considered the formulae provided by [43]. In the proposed scheme, the client’s security-fairness weighting function is given by
As shown in equation (5), the client's security-fairness value is calculated from its security score and fairness score. In this article, a beta-distribution-based security scoring system is used to evaluate the reliability of each client. Using the beta distribution for the security score provides flexibility in modelling safety probabilities, allowing the confidence level of different clients to be represented and adjusted accurately within the model. The client's fairness score is calculated from the number of times the client has participated in the global model aggregation: the more often a client participates in the aggregation, the lower its fairness score.
Trade-off parameter
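Since the exact form of equation (5) is not reproduced above, the following is only a hedged reading of the weighting: a convex combination of the two scores, with the trade-off coefficient produced by the DDPG agent in each round.

```python
def security_fairness_value(security_score, fairness_score, lam):
    """Hedged sketch of the weighting in equation (5): a convex combination of
    the two scores, where lam in [0, 1] is the strategy chosen by the DDPG agent."""
    return lam * security_score + (1.0 - lam) * fairness_score
```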
4. DDPG-Based FL Framework
4.1. Security Score Update Policy Based on Beta Distribution
To facilitate the expression and updating of the security score, the beta distribution is utilized to represent the client’s security score, where the beta distributions can be expressed as
Upon each upload of a local model by a client, the global model undergoes an impact, categorized into both positive and negative effects. The specific distinction is described in the following. Suppose the client
Based on equation (10), the security score of client
In each round of FL, the central server evaluates the local models uploaded by each client. A small portion of the test set owned by the server is used to assess the local model. Based on the performance of the local and global models on the test set, we determine the impact that the local model would have. We first define the following quantity
During the training process of FL, if
On the contrary, the parameter is updated as follows:
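The update equations themselves are not shown above. As a hedged sketch of a standard beta-reputation update consistent with this description (the exact rule in the paper may differ), the server can keep two counters per client and take the Beta mean as the security score:

```python
class BetaSecurityScore:
    """Beta-distribution reputation: alpha counts positive impacts on the global
    model, beta counts negative impacts; the score is the mean of Beta(alpha, beta)."""
    def __init__(self, alpha=1.0, beta=1.0):
        self.alpha, self.beta = alpha, beta

    def update(self, positive_impact: bool):
        # Increment alpha after a helpful local model, beta after a harmful one.
        if positive_impact:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def score(self) -> float:
        return self.alpha / (self.alpha + self.beta)
```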
4.2. Fairness Score Update Policy
During FL, servers need to consider fairness scores to ensure that ordinary clients participate in the global model aggregation phase. Without such a mechanism, there is a risk of biased model updating, where some clients that are favoured by the server or have better resources dominate the learning process, while others are marginalised. This would result in a trained global model that would hardly satisfy the optimal performance of all clients but would only converge to the optimal performance of a centrally trained model. The fact that the server takes fairness into account helps to promote inclusiveness and diversity in the global model, which ultimately enhances the robustness and representativeness of FL.
In this article, the number of times a client has participated in the global model aggregation process is taken into account in the fairness score. The client's fairness score is updated every time a global model update is performed. The client
When the server selects the local model uploaded by client
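The precise fairness-score formula is likewise not reproduced above; the sketch below only illustrates the stated behaviour (the score decreases as a client is selected more often), using 1/(1 + n) as an assumed decreasing function.

```python
def fairness_score(selection_count: int) -> float:
    """Illustrative decreasing function: rarely selected clients get higher scores."""
    return 1.0 / (1.0 + selection_count)

def record_selection(counts: dict, selected_ids) -> dict:
    """After each global aggregation, increment the count of every selected client."""
    for cid in selected_ids:
        counts[cid] = counts.get(cid, 0) + 1
    return counts
```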
4.3. DDPG for Security-Fairness Value
If the server only focuses on the security of the client, it ignores the risk of biased model updates. If the server only prioritises fairness, it will allow malicious attackers to succeed in their goals. This requires the server to weigh these two aspects. Specifically, the server finds it challenging to compute the optimal coefficient
In our model, the server in FL acts as an agent of DDPG, which not only collects feedback from the environment but also interacts with the environment to determine the optimal factor
1. DRL agent: the parameter server in the FL system.
2. Environment: the FL system with the DDPG-Enhanced SC model.
3. State space: the state space consists of the previous weighting strategy, the global model and information from all clients. More specifically, the server defines the state as follows:
where
4. Action space: in each round in FL, the server selects an action
5. Reward: the essence of the reward function
The objective of the agent, i.e. the server, is to find an optimal strategy
Algorithm 1: DDPG-based security-fairness value update for FL.
Input: current state
Output: weighting strategy
1. Initialize the global model parameters
2. Initialize the parameters of the actor network
3. Initialize the parameters of the target actor network
4. for
5. Initialize exploration noise
6. With probability
7. Otherwise select an action
8. Execute action
9. Store transition
10. Sample a random minibatch of transition from
11. Update
12. Update
13. Every certain steps, update
14. end for
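Putting Algorithm 1 into code form, the loop below is a hedged sketch of the interaction between the server (DDPG agent) and the FL environment; the env, agent and buffer interfaces, the noise level and the reward semantics are assumptions rather than the paper's exact implementation.

```python
import numpy as np

def ddpg_fl_training(env, agent, buffer, rounds=200, batch_size=32, noise_std=0.1):
    """Each FL round: pick a weighting strategy with exploration noise, let the FL
    environment select clients and aggregate, then learn from stored transitions."""
    state = env.reset()
    for t in range(rounds):
        # Deterministic policy output plus Gaussian exploration noise, clipped to [0, 1].
        action = np.clip(agent.act(state) + np.random.normal(0.0, noise_std), 0.0, 1.0)
        # The environment computes security-fairness values with this weighting,
        # selects clients, aggregates the global model and returns a reward.
        next_state, reward = env.step(action)
        buffer.store(state, action, reward, next_state)
        if len(buffer) >= batch_size:
            agent.learn(buffer.sample(batch_size))  # update actor, critic and targets
        state = next_state
```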
The DDPG algorithm consists of four neural networks, the actor network
The update process of the DDPG algorithm focuses on updating the parameters of the actor network and critic network. During the training phase, the agent samples a batch of data from the replay buffer. Suppose a piece of data is
After that, the
Compared with the critic network, the actor network parameters are updated more simply. After using the actor network to compute the action
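For completeness, the following TensorFlow sketch shows a standard DDPG update step: a TD-target loss for the critic, the deterministic policy gradient for the actor and soft (Polyak) updates of the target networks. It assumes Keras models with the input layouts described above, and the soft update shown here is a common variant of the periodic target update mentioned in Algorithm 1.

```python
import tensorflow as tf

def ddpg_update(actor, critic, target_actor, target_critic,
                batch, actor_opt, critic_opt, gamma=0.99, tau=0.005):
    states, actions, rewards, next_states = batch
    rewards = tf.reshape(rewards, (-1, 1))

    # Critic: minimise the MSE between Q(s, a) and the bootstrapped TD target.
    with tf.GradientTape() as tape:
        target_q = rewards + gamma * target_critic(
            [next_states, target_actor(next_states)])
        critic_loss = tf.reduce_mean(tf.square(critic([states, actions]) - target_q))
    critic_opt.apply_gradients(zip(
        tape.gradient(critic_loss, critic.trainable_variables),
        critic.trainable_variables))

    # Actor: ascend the critic's estimate of Q(s, actor(s)).
    with tf.GradientTape() as tape:
        actor_loss = -tf.reduce_mean(critic([states, actor(states)]))
    actor_opt.apply_gradients(zip(
        tape.gradient(actor_loss, actor.trainable_variables),
        actor.trainable_variables))

    # Soft (Polyak) update of the target networks.
    for target, source in ((target_actor, actor), (target_critic, critic)):
        for tv, sv in zip(target.variables, source.variables):
            tv.assign(tau * sv + (1.0 - tau) * tv)
```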
5. Performance Evaluation
5.1. Simulation Settings
In this section, the performance and reliability of the DDPG-based client selection strategy are evaluated through simulation. The simulation environment is an Nvidia GeForce RTX 3060 GPU running on Windows 11. The framework presented in this paper is developed in Python 3 with TensorFlow. The experiments were conducted on the MNIST and Fashion-MNIST datasets, each of which consists of 6000 training examples and 1000 test examples per class for the 10 classes labelled 0–9, where each example is a 28 × 28 grayscale image.
In this simulation experiment, the model iteratively trained on the MNIST dataset is used as the FL environment. For each local client, a convolutional neural network (CNN) is used for simulation training in this paper. In each training iteration, the learning rate is set to 0.01 and the batch size to 32. In this work, the FL model runs for 200 global training iterations and each client has 600 samples. In addition, the attacker's malicious data are generated in advance by corrupting the correspondence between training samples and their labels. In particular, we consider both low-intensity and high-intensity data-poisoning attacks.
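The paper does not spell out the CNN architecture, so the model below is only an assumed small CNN for 28 × 28 grayscale inputs, compiled with the stated SGD optimizer and learning rate of 0.01. Each simulated client would train such a model on its 600 local samples with a batch size of 32 before uploading the weights.

```python
import tensorflow as tf

def build_local_cnn(num_classes=10):
    """Illustrative small CNN for 28x28 grayscale inputs; the exact architecture
    used in the paper is not specified, so this layout is an assumption."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```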
For the final global model of FL to take all reliable clients into account and to reduce the impact of any test-set preference for a particular client, the proposed framework considers both the reputation value of a client (i.e., the performance of the models that the client has uploaded) and the number of times the client has been selected to participate in the global aggregation process. The trade-off parameter
The DDPG algorithm used in the proposed framework involves four neural networks. The critic network evaluates and provides feedback on the consequences of the action taken by the actor network. The target actor network and the target critic network ensure the stability of the agent during training. The actor network consists of an input layer, two hidden layers and an output layer, where the number of units in the input layer equals the number of factors contained in each state. In the critic network, the numbers of units of the two hidden layers are equal to the numbers of states and actions, respectively. The ReLU function is used as the activation function for the hidden layers in all networks. The output layer of the actor network is a sigmoid layer, which limits the action space to between 0 and 1, whereas the critic output layer has no activation function. The specific parameters of the DDPG algorithm are shown in Table 1, and a sketch of these networks is given after the table.
Table 1
Simulation parameters for the weighting strategy for security-fairness value based on DDPG for FL.
Parameter | Value |
Replay memory size | 10,000 |
Batch size | 32 |
Optimizer | SGD |
Activation function | ReLU |
Learning rate of actor | 0.001 |
Learning rate of critic | 0.001 |
Discount factor | 0.99 |
Security score to fairness score mapping (a) | 100 |
Number of clients (K) | 5, 10 |
Number of attackers (M) | 2, 4 |
Number of FL iterations (T) | 200 |
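Following the architecture described above and the hyperparameters in Table 1, the actor and critic networks might be built as below; the hidden-layer widths are illustrative assumptions, since only their relationship to the state and action dimensions is stated in the text.

```python
import tensorflow as tf

def build_actor(state_dim, action_dim=1):
    """Actor: two ReLU hidden layers, sigmoid output so the weighting
    strategy lies in [0, 1]; hidden sizes are illustrative."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(state_dim,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(action_dim, activation="sigmoid"),
    ])

def build_critic(state_dim, action_dim=1):
    """Critic: takes the state and action, two ReLU hidden layers,
    linear output for the Q-value estimate."""
    state_in = tf.keras.layers.Input(shape=(state_dim,))
    action_in = tf.keras.layers.Input(shape=(action_dim,))
    x = tf.keras.layers.Concatenate()([state_in, action_in])
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    q = tf.keras.layers.Dense(1)(x)
    return tf.keras.Model([state_in, action_in], q)
```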
5.2. Performance Analysis
In this section, we analyse in detail the performance of the proposed DDPG-based client selection strategy. Classifying handwritten digits in the MNIST dataset is considered as the FL task, and performance is evaluated by observing how well this task is completed.
Figure 2 compares the global model performance of the client selection strategy proposed in this paper, i.e., relying on DDPG to compute the trade-off parameter
[figure(s) omitted; refer to PDF]
Next, the number of clients
Moreover, Figure 3 exhibits the completion of the FL tasks with high-intensity poisoning attacks by the three trade-off parameter
[figure(s) omitted; refer to PDF]
Figure 4 compares the global model performance of different trade-off parameters
[figure(s) omitted; refer to PDF]
Since our proposed client selection strategy based on DDPG performs similarly in low-intensity and high-intensity data-poisoning attacks, we only explored the case of low-intensity data poisoning attacks in the experiment of changing the proportion of attackers. Figure 5 illustrates the effect of increasing the number of attackers
[figure(s) omitted; refer to PDF]
Figure 6 shows which clients the server selected to participate in the aggregation using the proposed DDPG-based client selection strategy. Figure 6(a) shows the case where the worker
[figure(s) omitted; refer to PDF]
6. Conclusion
In this paper, we introduce a novel client selection strategy for FL, aiming to reduce the risk that clients submit unreliable or malicious models that cause the FL task to fail. In addition, the proposed strategy can address the fairness issues arising from disparities in server preferences and client resources during global model aggregation. The core of this client selection strategy is to introduce a security-fairness value to comprehensively evaluate client reliability and participation. The security-fairness value is calculated from the current weighting strategy, the security score and the fairness score. The security score is computed from the historical performance dynamics captured by a beta distribution, while the fairness score quantifies the frequency of client involvement in the aggregation process. Utilizing a DDPG-based weighting strategy, the proposed scheme dynamically weighs these two scores to defend against malicious attacks and promote the fair participation of reliable clients. The experimental results validate the effectiveness of our method and establish a new standard for secure and fair client selection in FL systems.
Funding
This work was supported by the Natural Science Foundation of Jiangxi Province of China (No. 20242BAB25066), and the National Nature Science Foundation of China (No. 61962022, 62062034 and 62172160).
[1] Y. Xie, H. Wang, B. Yu, C. Zhang, "Secure Collaborative Few-Shot Learning," Knowledge-Based Systems, vol. 203,DOI: 10.1016/j.knosys.2020.106157, 2020.
[2] B. McMahan, E. Moore, D. Ramage, S. Hampson, B. A. Y. Arcas, "Communication-efficient Learning of Deep Networks From Decentralized Data," Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, vol. 54, pp. 1273-1282, 2017.
[3] P. Gupta, K. Yadav, B. B. Gupta, M. Alazab, T. R. Gadekallu, "A Novel Data Poisoning Attack in Federated Learning Based on Inverted Loss Function," Computers & Security, vol. 130,DOI: 10.1016/j.cose.2023.103270, 2023.
[4] S. Wang, Q. Li, Z. Cui, J. Hou, C. Huang, "Bandit-Based Data Poisoning Attack Against Federated Learning for Autonomous Driving Models," Expert Systems with Applications, vol. 227,DOI: 10.1016/j.eswa.2023.120295, 2023.
[5] X. Tan, W. C. Ng, W. Y. B. Lim, Z. Xiong, D. Niyato, H. Yu, "Reputation-Aware Federated Learning Client Selection Based on Stochastic Integer Programming," IEEE Transactions on Big Data,DOI: 10.1109/tbdata.2022.3191332, 2024.
[6] S. Ben Saad, B. Brik, A. Ksentini, "Toward Securing Federated Learning against Poisoning Attacks in Zero Touch B5G Networks," IEEE Transactions on Network and Service Management, vol. 20 no. 2, pp. 1612-1624, DOI: 10.1109/tnsm.2023.3278838, 2023.
[7] N. M. Jebreel, J. Domingo-Ferrer, "FL-defender: Combating Targeted Attacks in Federated Learning," Knowledge-Based Systems, vol. 260,DOI: 10.1016/j.knosys.2022.110178, 2023.
[8] X. Xiao, Z. Tang, L. Yang, Y. Song, J. Tan, K. Li, "FDSFL: Filtering Defense Strategies toward Targeted Poisoning Attacks in IIoT-Based Federated Learning Networking System," IEEE Network, vol. 37 no. 4, pp. 153-160, DOI: 10.1109/mnet.004.2200645, 2023.
[9] J. Le, D. Zhang, X. Lei, L. Jiao, K. Zeng, X. Liao, "Privacy-Preserving Federated Learning With Malicious Clients and Honest-But-Curious Servers," IEEE Transactions on Information Forensics and Security, vol. 18, pp. 4329-4344, DOI: 10.1109/tifs.2023.3295949, 2023.
[10] Z. Shi, L. Zhang, Z. Yao, "FedFAIM: A Model Performance-Based Fair Incentive Mechanism for Federated Learning," IEEE Transactions on Big Data,DOI: 10.1109/tbdata.2022.3183614, 2024.
[11] L. Gao, L. Li, Y. Chen, W. Zheng, C. Xu, M. Xu, "FIFL: A Fair Incentive Mechanism for Federated Learning," Proceedings of the 50th International Conference on Parallel Processing, 2021.
[12] T. Wan, X. Deng, W. Liao, N. Jiang, "Enhancing Fairness in Federated Learning: A Contribution-Based Differentiated Model Approach," International Journal of Intelligent Systems, vol. 2023,DOI: 10.1155/2023/6692995, 2023.
[13] H. Zhu, Y. Zhou, H. Qian, Y. Shi, X. Chen, Y. Yang, "Online Client Selection for Asynchronous Federated Learning With Fairness Consideration," IEEE Transactions on Wireless Communications, vol. 22 no. 4, pp. 2493-2506, DOI: 10.1109/twc.2022.3211998, 2023.
[14] Y. Cong, J. Qiu, K. Zhang, "Ada-FFL: Adaptive Computing Fairness Federated Learning," CAAI Transactions on Intelligence Technology, vol. 9 no. 3, pp. 573-584, DOI: 10.1049/cit2.12232, 2024.
[15] R. S. Sutton, A. G. Barto, Reinforcement Learning: An Introduction, 2018.
[16] B. Jamil, H. Ijaz, M. Shojafar, K. Munir, "IRATS: A DRL-Based Intelligent Priority and Deadline-Aware Online Resource Allocation and Task Scheduling Algorithm in a Vehicular Fog Network," Ad Hoc Networks, vol. 141,DOI: 10.1016/j.adhoc.2023.103090, 2023.
[17] Y. Lin, Z. Gao, H. Du, "Drl-Based Adaptive Sharding for Blockchain-Based Federated Learning," IEEE Transactions on Communications, vol. 71 no. 10, pp. 5992-6004, DOI: 10.1109/tcomm.2023.3288591, 2023.
[18] L. Wu, S. Guo, Z. Hong, Y. Liu, W. Xu, Y. Zhan, "Long-Term Adaptive VCG Auction Mechanism for Sustainable Federated Learning with Periodical Client Shifting," IEEE Transactions on Mobile Computing, vol. 23 no. 5, pp. 6060-6073, DOI: 10.1109/tmc.2023.3317063, 2024.
[19] T. P. Lillicrap, J. J. Hunt, A. Pritzel, "Continuous Control with Deep Reinforcement Learning," Proceedings of the International Conference on Learning Representations (ICLR), 2016.
[20] Z. Jiang, Y. Xu, H. Xu, Z. Wang, C. Qian, "Heterogeneity-Aware Federated Learning with Adaptive Client Selection and Gradient Compression," IEEE INFOCOM 2023-IEEE Conference on Computer Communications, 2023.
[21] M. Panigrahi, S. Bharti, A. Sharma, "FedDCS: A Distributed Client Selection Framework for Cross Device Federated Learning," Future Generation Computer Systems, vol. 144, pp. 24-36, DOI: 10.1016/j.future.2023.02.001, 2023.
[22] M. A. P. Putra, A. R. Putri, A. Zainudin, D.-S. Kim, J.-M. Lee, "ACS: Accuracy-Based Client Selection Mechanism for Federated Industrial IoT," Internet of Things, vol. 21,DOI: 10.1016/j.iot.2022.100657, 2023.
[23] J. Zhang, X. Li, W. Liang, P. Vijayakumar, F. Alqahtani, A. Tolba, "Two-Phase Sparsification With Secure Aggregation for Privacy-Aware Federated Learning," IEEE Internet of Things Journal, vol. 11 no. 16, pp. 27112-27125, DOI: 10.1109/jiot.2024.3400389, 2024.
[24] T. Chang, L. Li, M. Wu, W. Yu, X. Wang, C. Xu, "GraphCS: Graph-Based Client Selection for Heterogeneity in Federated Learning," Journal of Parallel and Distributed Computing, vol. 177, pp. 131-143, DOI: 10.1016/j.jpdc.2023.03.003, 2023.
[25] B. Chen, N. Ivanov, G. Wang, Q. Yan, "Dynamicfl: Balancing Communication Dynamics and Client Manipulation for Federated Learning," 2023 20th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON), pp. 312-320, 2023.
[26] J. Guo, H. Li, F. Huang, "ADFL: A Poisoning Attack Defense Framework for Horizontal Federated Learning," IEEE Transactions on Industrial Informatics, vol. 18 no. 10, pp. 6526-6536, DOI: 10.1109/tii.2022.3156645, 2022.
[27] R. Wang, J. Lai, Z. Zhang, X. Li, P. Vijayakumar, M. Karuppiah, "Privacy-Preserving Federated Learning for Internet of Medical Things Under Edge Computing," IEEE Journal of Biomedical and Health Informatics, vol. 27 no. 2, pp. 854-865, DOI: 10.1109/jbhi.2022.3157725, 2023.
[28] L. Zhang, J. Xu, P. Vijayakumar, P. K. Sharma, U. Ghosh, "Homomorphic Encryption-Based Privacy-Preserving Federated Learning in IoT-Enabled Healthcare System," IEEE Transactions on Network Science and Engineering, vol. 10 no. 5, pp. 2864-2880, DOI: 10.1109/tnse.2022.3185327, 2023.
[29] Y. Miao, R. Xie, X. Li, Z. Liu, K.-K. R. Choo, R. H. Deng, "Efficient and Secure Federated Learning Against Backdoor Attacks," IEEE Transactions on Dependable and Secure Computing, vol. 21 no. 5, pp. 4619-4636, DOI: 10.1109/tdsc.2024.3354736, 2024.
[30] J. Guo, Z. Liu, S. Tian, "TFL-DT: A Trust Evaluation Scheme for Federated Learning in Digital Twin for Mobile Networks," IEEE Journal on Selected Areas in Communications, vol. 41 no. 11, pp. 3548-3560, DOI: 10.1109/jsac.2023.3310094, 2023.
[31] H. Gao, N. He, T. Gao, "SVerifl: Successive Verifiable Federated Learning With Privacy-Preserving," Information Sciences, vol. 622, pp. 98-114, DOI: 10.1016/j.ins.2022.11.124, 2023.
[32] J. A. Alzubi, O. A. Alzubi, A. Singh, M. Ramachandran, "Cloud-IIoT-Based Electronic Health Record Privacy-Preserving by CNN and Blockchain-Enabled Federated Learning," IEEE Transactions on Industrial Informatics, vol. 19 no. 1, pp. 1080-1087, DOI: 10.1109/tii.2022.3189170, 2023.
[33] Y.-A. Xie, J. Kang, D. Niyato, "Securing Federated Learning: A Covert Communication-Based Approach," IEEE Network, vol. 37 no. 1, pp. 118-124, DOI: 10.1109/mnet.117.2200065, 2023.
[34] M. Abdel-Basset, N. Moustafa, H. Hawash, "Privacy-Preserved Cyberattack Detection in Industrial Edge of Things (IEoT): A Blockchain-Orchestrated Federated Learning Approach," IEEE Transactions on Industrial Informatics, vol. 18 no. 11, pp. 7920-7934, DOI: 10.1109/tii.2022.3167663, 2022.
[35] R. Zhao, Y. Wang, Z. Xue, T. Ohtsuki, B. Adebisi, G. Gui, "Semisupervised Federated-Learning-Based Intrusion Detection Method for Internet of Things," IEEE Internet of Things Journal, vol. 10 no. 10, pp. 8645-8657, DOI: 10.1109/jiot.2022.3175918, 2023.
[36] W. Huang, T. Li, D. Wang, S. Du, J. Zhang, T. Huang, "Fairness and Accuracy in Horizontal Federated Learning," Information Sciences, vol. 589, pp. 170-185, DOI: 10.1016/j.ins.2021.12.102, 2022.
[37] A. Sultana, M. M. Haque, L. Chen, F. Xu, X. Yuan, "Eiffel: Efficient and Fair Scheduling in Adaptive Federated Learning," IEEE Transactions on Parallel and Distributed Systems, vol. 33 no. 12, pp. 4282-4294, DOI: 10.1109/tpds.2022.3187365, 2022.
[38] S. M. Hosseini, M. Sikaroudi, M. Babaie, H. R. Tizhoosh, "Proportionally Fair Hospital Collaborations in Federated Learning of Histopathology Images," IEEE Transactions on Medical Imaging, vol. 42 no. 7, pp. 1982-1995, DOI: 10.1109/tmi.2023.3234450, 2023.
[39] J. Li, T. Zhu, W. Ren, K.-K. Raymond, "Improve Individual Fairness in Federated Learning via Adversarial Training," Computers & Security, vol. 132,DOI: 10.1016/j.cose.2023.103336, 2023.
[40] X. Li, S. Zhao, C. Chen, Z. Zheng, "Heterogeneity-Aware Fair Federated Learning," Information Sciences, vol. 619, pp. 968-986, DOI: 10.1016/j.ins.2022.11.031, 2023.
[41] S. K. Lo, Y. Liu, Q. Lu, "Toward Trustworthy AI: Blockchain-Based Architecture Design for Accountability and Fairness of Federated Learning Systems," IEEE Internet of Things Journal, vol. 10 no. 4, pp. 3276-3284, DOI: 10.1109/jiot.2022.3144450, 2023.
[42] R. Karmakar, G. Kaddoum, O. Akhrif, "A Novel Federated Learning-Based Smart Power and 3D Trajectory Control for Fairness Optimization in Secure UAV-Assisted MEC Services," IEEE Transactions on Mobile Computing, vol. 23 no. 5, pp. 4832-4848, DOI: 10.1109/tmc.2023.3298935, 2024.
[43] Z. Song, H. Sun, H. H. Yang, X. Wang, Y. Zhang, T. Q. S. Quek, "Reputation-Based Federated Learning for Secure Wireless Networks," IEEE Internet of Things Journal, vol. 9 no. 2, pp. 1212-1226, DOI: 10.1109/jiot.2021.3079104, 2022.
Copyright © 2024 Tao Wan et al. This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Federated learning (FL) is a machine learning technique in which a large number of clients collaborate to train models without sharing private data. However, FL's integrity is vulnerable to unreliable models; for instance, data-poisoning attacks can compromise the system. In addition, system preferences and resource disparities preclude fair participation by reliable clients. To address these challenges, we propose a novel client selection strategy that introduces a security-fairness value to measure client performance in FL. This value is a composite metric that combines a security score and a fairness score. The former is dynamically calculated from a beta distribution reflecting past performance, while the latter considers the client's participation frequency in the aggregation process. A weighting strategy based on the deep deterministic policy gradient (DDPG) determines how these two scores are combined. Experimental results confirm that our method effectively selects reliable clients in a fair manner and maintains the security and fairness of the FL system.