Full text

Turn on search term navigation

INTRODUCTION

As the Internet of Things evolves, more and more data is created at the edge nodes. This data may be gathered to construct various AI models and implement them in the actual world. Due to the fact that the classic distributed learning paradigm necessitates the collection of users' personal information for centralised computing, it might result in some privacy leakage issues. Communication is a difficulty that distributed learning must always overcome, with an excessive volume of data in transmission and a large number of communications being the most frequent issues.

To address the aforementioned issues, Google presented the federated learning (FL) framework with promising growth potential in 2016. A typical FL system consists of two components: a central server and training participants. Specifically, the users participating in FL and the model of the central service are the same models, and the central server will first broadcast the model parameters globally at the start of learning. When the communication time point is reached, all or part of the users transfer the model parameters (or gradients) to the central server for aggregation, then the updated model parameters are globally broadcasted again. The FL framework provides some protection for users' sensitive data.

Recent research has revealed, however, that sensitive user data can still be compromised through inference attacks on uploaded parameters [1]. Researchers have employed a number of privacy-preserving techniques, such as multiparty secure computation [2], homomorphic encryption [3], and differential privacy [4], to prevent hostile analysts from gaining access to sensitive user information. Differential privacy approaches are utilised more frequently in FL due to their lower computational and time requirements. Therefore, the analysis and scheme design in this paper are also based on federated learning with differential privacy.

Recent publications [5, 6] have utilised differential privacy to preserve FL and parallel SGD approaches for generating FL models. However, a more practical difficulty, the communication problem, is overlooked. Local SGD is a method described in refs. [7, 8] for reducing the number of communications, and it is theoretically proven that Local SGD has the same convergence speed as parallel SGD. In addition, the study in ref. [8] is based on IID data, whereas the analysis in this paper is based on Non-IID data. In addition, the FedAVG model is the central aggregation model employed in this paper, and it is frequently used because of its superior aggregation outcomes.

The data of each FL user can be regarded as a sample of the overall data, and the assured involvement of users can be regarded as a sufficient quantity of data. However, the aggregation effect of FL can make the model parameters different from local ones, resulting in the generalisation ability of the global model that is not necessarily suitable for local users, which can reduce the idea of user participation in federated learning. Moreover, users themselves incur costs by participating in federated learning (e.g. computation costs, communication costs, data costs etc. [9]) therefore, it is a reasonable idea to compensate users for the costs. To this goal, academics have created a variety of incentive systems, including game theory [10], auction theory [11], contract theory [12], and prospect theory, although the majority of these compensate users for the losses incurred by local processing and communication uploads. However, few studies discuss the costs involved with the potential privacy leaks that may result from inference attacks. Users have varying sensitivities to privacy leakage, which can be reflected in incentives as individual preferences for privacy protection vigour. Since the level of privacy protection is inversely related to the quantity of privacy leakage ɛ, users with more privacy leakage should be compensated more.

In this paper, we use a reverse auction model (also known as a purchasing model) to achieve incentives for users. When FL is viewed as a procurement problem, that is, when users are viewed as suppliers and central servers as purchasers, reverse auctions can effectively solve the problem of selecting the winner. In the context of reverse auctions, [13] used single-attribute auctions, but numerous theoretical and experimental studies [14–16] have demonstrated that multi-attribute auctions generate greater utility for purchasers than single-attribute auctions. Multi-attribute auctions allow suppliers to place bids based on attributes other than price, allowing purchasers to obtain high-quality goods or services at reduced costs from participating suppliers.

Negotiation is another answer to the procurement issue. In previous studies, negotiations and auctions have always been studied separately, for example, comparing economic efficiency, allocating efficiency and pareto optimality. Table 1 briefly describes each negotiation and auction have their own advantages and disadvantages. However, recent research [17] shows that combining auctions and negotiations will improve social welfare. Also, this is in line with the idea of the privacy paradox, where more private information can be obtained by providing a small incentive.

TABLE 1 Advantages and disadvantages of reverse auctions and negotiations.

Auction	Advantages	Disadvantages
Auction	Greater selection of suppliers;	Not conducive to communication
Effective avoidance of human	Between supply and demand; complicated
Factors affecting transactions	Design and difficult to implement.
Negotiations	Interactivity and efficiency;	More subject to human factors.
Facilitates more knowledge
About suppliers

In summary, this paper presents the federated learning reverse auction and negotiation model that satisfies differential privacy requirements (FLRNDP).

This article contributes the following:

Theoretical analysis: This paper provides the privacy analysis, convergence analysis, and local iteration count analysis of federated learning in the context of combining differential privacy and Local SGD. The analysis provides a more intuitive demonstration of the impact of differential privacy on the performance of federated learning.
Auction model: In this paper, we propose a reverse auction model to motivate and select users considering the amount of data and the strength of privacy protection. Experiments show that the model we design has higher accuracy and model quality as well as lower loss compared to state-of-the-art algorithms.
Post-auction negotiations: We have conducted some analysis of post-auction negotiations, and the theory demonstrates that post-auction negotiations can improve social welfare, disguised as increased revenue for the central server from the central party's perspective.

The rest of the paper is organised as follows. Section 2 contains the literature review. Section 3 contains the basic theory. Section 4 contains the system framework. Section 5 contains some theoretical analysis to introduce Section 6. Section 6 contains our reverse auction. Section 7 contains post-auction negotiation model. Section 8 contains experimental analysis. Section 9 contains the conclusion.

RELATED WORK

Federated learning and differential privacy

Since the 2016 launch of the FL framework, scholars have done extensive FL-related research. McMahan et al. [18] proposed two aggregation methods for federated learning: FedAVG and FedSGD, among which FedAVG is widely used due to its simple expressions and better robustness to non-IID data. On this basis, Yu et al. [19] give a proof of convergence of federated learning in FedAVG as well as in the parallel SGD framework as a way to show why FedAVG is effective. However, since parallel SGD can impose excessive communication overhead on federated learning, Stich [8] proposed the use of Local SGD as an alternative to parallel SGD and theoretically proved that both have a convergence rate of $\mathcal{O}(1/T)$ , but their theory is only applicable under IID datasets. Li et al. [20] give a proof of Local SGD convergence in the case of non-IID dataset and shows that Local SGD still performs well in terms of convergence rate even in the non-IID case.

However, none of these articles consider the issue of privacy leakage during data transmission [1]. In 2020, Wei et al. [4] introduced differential privacy to federal learning to combat advanced inference attacks for the first time and performed a theoretical as well as numerical analysis of its performance. However, due to its lack of generalisability by assuming that all users use the same strength of privacy protection, Hu et al. [6] and Pandey et al. [10] used federated learning for personalised privacy protection. However, both of them are too lenient for privacy leakage measurements, which leads to over-measurement of privacy leakage. At the same time, they all use parallel SGD, which imposes a significant burden on communication.

In order to measure privacy leakage accurately, [21] proposed a generalised method for measuring privacy leakage caused during machine learning, MA, which measures privacy leakage during training by means of higher-order moments. Mironov [22] implemented a tighter upper bound than MA, Rényi differential privacy, by introducing Rényi entropy on the idea of Abadi [21]. Currently, RDP has generally replaced MA with more machine learning privacy preserving.

The federated learning framework in this paper uses a combination of Local SGD and RDP to both reduce the communication overhead and provide a more accurate measure of privacy leakage.

Federated learning and incentives

Inside FL, although users can get better models by participating in FL, we believe that users are rational and may be reluctant to participate in FL in the face of cost and resource consumption. Thus, the study of FL and incentive mechanism arises. Shapley value [23] is a measure of user contribution in cooperative games, and it is widely used in FL to calculate contribution, such as in refs. [24–26]. However, its excessive time complexity $\mathcal{O}\left({2}^{n}\right)$ leads to the need to choose a certain approximation in practice, Zelei et al. [27] heuristically eliminate the lower Shapley values, thus reducing the computing time, but this method is too simple and brutal. Zhenan et al. [28] can effectively estimate the Shapley value by using a compressed perception method. Due to the high time complexity of the direct calculation method, [29, 30] used the reputation value method to indirectly measure user contributions and combined the reputation value with the blockchain to effectively achieve reputation value protection.

Since users are always assumed to be rational in the incentive mechanism, it is possible for them to choose to misrepresent costs or parameters as a way to increase their revenue. Pejó et al. [31] used Nash equilibrium by modelling FL as a noncooperative game to ensure that the data submitted by users do not shift NE points. However, the actual scenario is always in a non-complete information environment and Weng et al. [32] solves the non-cooperative game in FL by using BNE. In addition to using traditional game theory, Wei et al. [33] also used contract theory to address the selection of users involved in FL.

However, none of the aforementioned articles have gone into the discussion of compensating for privacy breaches (due to differential privacy protection), Sun et al. [34] using contract theory for privacy breaches, which not only ensures that users do not misreport, but also ensures that higher gains are obtained for larger degrees of privacy breaches. However, not all users have individualised privacy protection strengths. Zhang et al. [35] used the VCG mechanism to select users with different privacy-preserving strengths, and also explored the difference between local and global models in a non-IID environment. However, both assume the same amount of user data, our framework relaxes this assumption and the privacy protection strength of users is personalised, and more importantly, we use multiple attributes. In addition, we propose post-auction negotiation and perform a theoretical analysis.

PRELIMINARIES

In this section, we introduce FL, DP, Multi-attitude Reverse Auction, and Negotiation basics.

Federated learning

Suppose there exists a set $\mathcal{K}=\left\{1,2,\text{\ldots },K\right\}$ , where $\vert \mathcal{K}\vert$ is the total number of users in the FL system. Where each user $k\in \mathcal{K}$ has his own private training dataset ${D}_{k}=\left\{\left({x}_{1}^{k},{y}_{1}^{k}\right),\left({x}_{2}^{k},{y}_{2}^{k}\right),\text{\ldots },\left({x}_{{N}_{k}}^{k},{y}_{{N}_{k}}^{k}\right)\right\}$ containing N _k samples, with ${x}_{n}^{k}$ and ${y}_{n}^{k}$ respectively denoting the feature vector and the corresponding label of the n − th training sample at worker k. In theory, the FL problem is usually described as the problem of how to minimise the expected risk, that is, minimising the following expression. 1 $\min \,E(\omega )=\sum\limits _{i=1}^{k}{p}_{k}{E}_{k}(\omega )$ where p _k is the weight of each user in the aggregation process, and this weight is determined by the amount of data, that is, ${p}_{k}=\frac{\left\vert {D}_{k}\right\vert }{{\sum }_{i=1}^{i=k}\left\vert {D}_{i}\right\vert }$ , and satisfies ∑p _k = 1, and E _k(ω) is the local expected risk of the user, that is, 2 ${E}_{k}(\omega )=\int l(\mathbf{w},x,y)d{P}_{K}(x,y),\forall k\in \mathcal{K}$ where $l(\text{\ldots })$ is the sample loss function, which depends on the model parameters w, the feature vector x, and the label y.

However, the actual situation is predominantly characterised by Non-IID user data, and this data heterogeneity causes a number of problems, such as users having different P _K(.) and having different prediction goals and result preferences [35]. This renders the exact calculation of expected risk minimisation impossible, and because the data distribution of each user is unknown, as well as for computational convenience, empirical risk minimisation is now the most common method for risk minimisation.

Empirical risk minimisation

Given that P _K(.) is unknown for each user, Equation (1) is approximated by summing and averaging the losses obtained after each mini-batch. Thus, FL seeks to solve subordinate issues in the real situations. 3 $\min \,F(\mathbf{w})=\sum\limits _{k=1}^{K}{p}_{k}{F}_{k}(\mathbf{w}),\forall k\in K$ where F _k(w) is the empirical loss function of the local user, which can be expressed as: 4 ${F}_{k}(\mathbf{w})=\frac{1}{n}\sum\limits _{i=1}^{n}l\left(\mathbf{w};{\xi }_{i}\right)$ where ξ _i is the mini-batch sample data.

To solve the problem in Equation (3), Zheng et al. [36] use the mini-batch parallel SGD method, where the user is trained locally once to communicate with the server, to achieve the desired learning accuracy. This creates a significant communication issue, which is partially resolved by the Local SGD method described in this paper. While maintaining convergence efficiency comparable to that of parallel mini-batch SGD.

Specifically, we assume that the set of communication times between the client and the server is ${\mathcal{I}}_{E}=\left\{nE\vert n=1,2,\text{\ldots },n\right\}$ , where E represents the user's local Epoch count and nE represents the nE times the user has run the Epoch at the nth round of communication, it can also be called global synchronisation steps, and all times outside of communication are Local SGD. Let ${\mathbf{w}}_{t}^{k}$ be the t − th user, and if $t+1\in {\mathcal{I}}_{E}$ , that is, this moment is the global update moment, then FedAVG will collect the model parameters of the user and aggregate them at the central server, then the FedAVG update at this point will be described as 5 ${\mathbf{v}}_{t+1}^{k}={\mathbf{w}}_{t}^{k}-{\eta }_{t}\nabla {F}_{k}\left({\mathbf{w}}_{t}^{k},{\xi }_{t}^{k}\right)$ 6 ${\mathbf{w}}_{t+1}^{k}=\left\{\begin{array}{@{}ll@{}}{\mathbf{v}}_{t+1}^{k}\quad \hfill & \,\text{if}\,t+1\notin {\mathcal{I}}_{E}\hfill \\ {\sum }_{k=1}^{N}{p}_{k}{\mathbf{v}}_{t+1}^{k}\quad \hfill & \,\text{if}\,t+1\in {\mathcal{I}}_{E}\hfill \end{array}\right.$ here, ${\mathbf{v}}_{t+1}^{k}$ is the result of ${\mathbf{w}}_{t}^{k}$ after one local SGD, which can be considered as an intermediate result, and ${\mathbf{w}}_{t+1}^{k}$ represents the final result of the model parameters at the moment t + 1. If $t+1\in {\mathcal{I}}_{E}$ , the value of ${\mathbf{w}}_{t+1}^{k}$ should be the value after the central server aggregation, and if $t+1\notin {\mathcal{I}}_{E}$ , it is equal to the intermediate result ${\mathbf{v}}_{t+1}^{k}$ .

At the same time, since the data analysed in this paper is Non-IID data, Non-IID should be measured. The measurement used in this paper is described in ref. [20]:

Quantifying the degree of non-IID (heterogeneity)

Let ${F}^{\ast }$ and ${F}_{k}^{\ast }$ be the minimum values of $F$ and ${F}_{k}$ respectively. We use the term ${\Gamma }={F}^{\ast }-{\sum }_{k=1}^{N}{p}_{k}{F}_{k}^{\ast }$ for quantifying the degree of Non-IID. If the data are IID, then Γ obviously goes to zero as the number of samples grows. If the data are Non-IID, then Γ is non-zero, and its magnitude reflects the heterogeneity of the data distribution.

Differential privacy

Differential privacy is a straightforward and effective mathematical tool that defends against advanced inference attacks by employing a unique noise addition mechanism to make the newly generated perturbed data statistically equivalent to the original dataset. In this paper's differential privacy FL model, noise is added to the gradients if $t+1\notin {\mathcal{I}}_{E}$ to ensure that the final training results satisfy differential privacy. If $t+1\in {\mathcal{I}}_{E}$ , the user uploads the model parameters to the central server for model aggregation. In this paper, we measure privacy leakage using Rényi differential privacy.

$1$

Definition

((α, ɛ)-differential privacy [22]) A randomised mechanism $f:\mathcal{D}{\mapsto}\mathcal{R}$ is said to have ɛ-Rényi differential privacy of order α, or (α, ɛ)-RDP for short, if for any adjacent $D,{D}^{\prime }\in \mathcal{D}$ it holds that ${D}_{\alpha }\left(f(D)\Vert f\left({D}^{\prime }\right)\right)\le {\epsilon}.$

RDP is a relaxed version of classic (ɛ, 0)-differential privacy. The gap between two different distributions is measured by using the Rényi divergence. When α = 1, is the Kullback–Leibler divergence, that is, ${D}_{1}(P\Vert Q)={E}_{x\sim P}\log \frac{P(x)}{Q(x)}$ .

To ensure that the process of machine learning satisfies differential privacy in general, the usual way used is to add Gaussian noise to the gradient, that is, $\tilde{\nabla F}=\nabla F+\mathbf{n}$ , where n ∼ N (0, σ ² I _d). According to Proposition 7 in ref. [22], the adjacent dataset with Gaussian noise added is measured using the Rayleigh divergence, resulting in $\alpha {\mu }^{2}/\left(2{\sigma }^{2}\right)$ , where μ is the sensitivity, It is calculated as $\mu =\underset{\xi ,{\xi }^{\prime }}{\max \,}\Vert f(\xi )-f\left({\xi }^{\prime }\right){\Vert }_{2}$ , where f (.) is the query function.

Reverse auction

Unlike single-source reverse auctions, FL should be viewed as multi-source, that is, multiple users need to be identified to participate in the training task of FL. Thus the task can be transformed into a winner determination problem. For price submission, the classical sealed submission is used and all users are assumed to be rational. Also, the auctions are designed to satisfy the following definitions to ensure that users are willing to participate in the auction and that the data submitted is real.

$2$

Definition

(Budget Feasibility [BF]) A Multi-Attribute Reverse Auction is budget feasible, if and only if the sum of payments received by each user does not exceed the total budget B, that is, $\sum\limits _{k\in \mathcal{K}}{P}_{k}\le B$

$3$

Definition

(Individual Rationality (IR)) In a reverse auction, the user receives a payment that should be non-negative after subtracting the cost, that is, ${P}_{k}-{C}_{k}\ge 0,(1\le k\le K)$

In order to ensure that the information submitted by the user is truthful, we need to introduce the Myerson Lemma.

$1$

lemma

(Myerson Lemma [13]): If the auction is truthful, the following two conditions need to be satisfied:

Selection rules are monotonous: If a candidate wins by bid price C, then it can also win when he bids C′ < C.
The payment to participants is the critical value: If the candidate's bid price is greater than this critical value, it is unlikely that it will win the bid.

Negotiation

In this paper, we convert the negotiation model to an ultimatum game, in part because of its strategic simplicity, using the Rubinstein bargaining model.

$4$

Definition

(purchaser and Supplier Fractions of Supply Chain Surplus [37]:) Suppose that player 1 and player 2 in the bargaining game have fixed discount factors δ ₁, δ ₂ and complete information is available to both purchasers and suppliers, then there is a unique Nash equilibrium. In equilibrium, the player 1 to offer a negotiated price and quality pair will gain a fraction of the supply chain surplus of $\left(1-{\delta }_{2}\right)/\left(1-{\delta }_{1}{\delta }_{2}\right)$ . The player 2 to make an offer will gain a fraction of the surplus of $\left({\delta }_{2}-{\delta }_{1}{\delta }_{2}\right)/\left(1-{\delta }_{1}{\delta }_{2}\right)$ .

SYSTEM FRAMEWORK

The FL system analysed in this paper consists of a trustworthy but inquisitive central server, which wants to collect user parameters in order to create a global model, and a set of $k\in \mathcal{K}$ users $(K=\vert \mathcal{K}\vert )$ . Assume that each worker $k\in \mathcal{K}$ possesses a private training dataset D _k consisting of N _k data samples, with each data sample containing the feature vector and its corresponding label. The server's objective is to recruit a specific number of users for FL training (where successful participants must be selected through reverse auctions to ensure high accuracy of each training) without requiring their private information. The FLRNDP framework is depicted in Figure 1. The following describes the specific workflow for each round of communication. The training concludes when a predetermined number of communication rounds or a global model has converged to a particular level.

The central server will send the parameter w ⁰ to each user who is eligible to participate in the training.
The central server provides the auctioneer with its own budget B and evaluation function S(.).
Users submit their cost C _i and non-economic attributes Q _i to the auctioneer using a seal.
The auctioneer uses a reverse auction to determine the user k who participated in the FL and the corresponding payment P _k.
The central server delegates multiple agents to negotiate with the users after identifying those who will participate in the training.
The user performs local training using the re-agreed ${Q}_{k}^{\prime }$ and uploads the training parameters ${\mathbf{w}}_{t+1}^{k}$ to the central server at the time of communication following the negotiation.
The central server aggregates the parameters of user uploads as w _t+1.
The new parameters are distributed to all users by the central server.

[IMAGE OMITTED. SEE PDF]

THEORETICAL ANALYSIS: PRIVACY AND CONVERGENCE ANALYSIS

In this section, we will analyse the convergence of FL after adding differential privacy. We will show the convergence when the data is Non-IID. When the data is IID, we only need to set Γ = 0.

Privacy analysis

In this part, we leverage the notion of RDP to quantify each user's volume of privacy leakage. To ensure that the training results are all satisfy the differential privacy protection, the current practice is to add Gaussian noise to the gradient. The form is 7 $\tilde{\nabla F}\left(\mathbf{w};{\xi }_{k}\right)=\nabla F\left(\mathbf{w};{\xi }_{k}\right)+{\mathbf{n}}_{k}$ where w ∈ R ^d is the model parameter at the nearest moment, ξ _k is the mini-batch size of D _k, and ∇F is the stochastic gradient calculated under the sample size of mini-batch, ${\mathbf{n}}_{k}\sim N\left(0,{\sigma }_{k}^{2}{I}_{d}\right)$ is the added Gaussian noise. After training with mini-batch size samples, the local parameters are represented along the gradient descent as 8 ${\mathbf{w}}_{t+1}^{k}={\mathbf{w}}_{t}^{k}-{\eta }_{t}\tilde{\nabla F}\left({\mathbf{w}}_{\boldsymbol{t}}^{\boldsymbol{k}};{\xi }_{k}\right)$

In this paper, we assume that the amount of privacy leakage is unique to each user, whereas the amount of Gaussian noise added per round is the same for all users. Since the essence of differential privacy is the privacy leakage brought by the large size of the measurement noise, a larger σ will make the Gaussian distribution tend to be uniformly distributed, making it more difficult to distinguish the noise-added distribution from the original distribution, thereby reducing the privacy leakage ɛ.

Combining Proposition 7 from ref. [22] with our description of the amount of privacy leakage, we provide a theorem for the additional noise for a given amount of privacy leakage.

$1$

Theorem

In each round of FL, user k perturbs the gradient using Gaussian noise to satisfy Rényi differential privacy with ${\sigma }_{k}^{2}\ge \alpha \frac{\mathrm{c}}{\left\Vert {\xi }_{k}\right\Vert {\varepsilon }_{\mathrm{k}}}$ , and satisfies $\left(\alpha ,\alpha \frac{\mathrm{c}}{\left\Vert {\xi }_{k}\right\Vert {\sigma }_{k}^{2}}\right)-\mathit{R}\mathit{D}\mathit{P}$ .

Where ɛ _k is the intensity of privacy protection for each user, and c is the gradient clipping threshold, we write it as such to distinguish it from the user cost of the auction phase. The proof for Theorem 1 is included in Appendix A.

From the combination of Equation (7) as well as Theorem 1, a smaller privacy leak results in a larger gradient perturbation effect, where n _k is a random variable, that is, ${\mathbf{n}}_{k}\sim N\left(0,{\sigma }_{k}^{2}{I}_{d}\right)$ . Notice that the aggregation model used for our FL is FedAVG, that is, ${\mathbf{w}}_{t+1}={\sum }_{k=1}^{K}\frac{\vert {D}_{k}\vert }{{\sum }_{k=1}^{K}\vert {D}_{k}\vert }{\mathbf{w}}_{t+1}^{k}$ , which is converted into gradient descent form as 9 ${\mathbf{w}}_{t+1}=\sum\limits _{k=1}^{K}\frac{\vert {D}_{k}\vert }{{\sum }_{k=1}^{K}\vert {D}_{k}\vert }\left.\left({\mathbf{w}}_{t}^{k}-{\eta }_{t}\nabla F\left(\mathbf{w};{\xi }_{k}\right)+{\mathbf{n}}_{k}^{t}\right)\right)$

By abbreviating $\frac{\vert {D}_{k}\vert }{{\sum }_{k=1}^{K}\vert {D}_{k}\vert }$ to p _k, the aggregated noise is 10 ${\bar{\mathbf{n}}}^{t}={p}_{k}{\mathbf{n}}_{k}^{t}\sim N\left(0,{p}_{k}^{2}{\sigma }_{k}^{2}{I}_{d}\right)$

Combining Equations (9) and (10), it is clear that excessive noise will cause the model parameters to deviate significantly from their initial values. As the training continues and the model gradually converges, the excessive noise will cause the model to deviate too far from the direction of convergence, resulting in a reduction in accuracy.

Convergence analysis

This subsection will provide convergence proofs for the differential privacy federation learning framework presented in this paper when the data is non-IID. And convergence is demonstrated when all devices participate, that is, when $\vert \mathcal{K}\vert$ devices are involved.

First, we make the following assumptions about loss function F ₁, F ₂, …, F _N for each user:

$1$

Assumption

F ₁, F ₂, …, F _N are all L-smooth: for all y and x, ${F}_{k}(y)\le {F}_{k}(x)+{(y-x)}^{T}\nabla {F}_{k}(x)+\frac{L}{2}\Vert y-x{\Vert }_{2}^{2}$ .

$2$

Assumption

F ₁, F ₂, …, F _N are all μ -strongly convex: for all y and x, ${F}_{k}(y)\ge {F}_{k}(x)+{(y-x)}^{T}\nabla {F}_{k}(x)+\frac{\mu }{2}\Vert y-x{\Vert }_{2}^{2}$ .

The following two assumptions have been made by the works [19].

$3$

Assumption

let ${\xi }_{k}^{t}$ be sampled from the t − th device's local data uniformly at random. The variance of stochastic gradients in each device is bound: $E{\left\Vert \nabla {F}_{k}\left({\mathbf{w}}_{t}^{k},{\xi }_{t}^{k}\right)-\nabla {F}_{k}\left({\mathbf{w}}_{t}^{k}\right)\right\Vert }^{2}\le {\sigma }_{k}^{2}$ , for $k=1,\text{\ldots },\vert \mathcal{K}\vert$ .

$4$

Assumption

The expected squared norm of stochastic gradients is uniformly bounded, that is, $E{\left\Vert \nabla {F}_{k}\left({w}_{t}^{k},{\xi }_{t}^{k}\right)\right\Vert }^{2}\le {G}^{2}$ , for all $k=1,\text{\ldots },\vert \mathcal{K}\vert$ and t = 0, …, T − 1.

Before proving convergence, we need some additional descriptions as well as lemmas. Since noise is added to the gradient during training, therefore, we define ${\mathbf{g}}_{t}={\sum }_{k=1}^{N}{p}_{k}\left(\nabla {F}_{k}\left({\mathbf{w}}_{t}^{k},{\xi }_{t}^{k}\right)+{\mathbf{n}}^{k}\right)$ and ${\overline{\mathbf{g}}}_{t}={\sum }_{k=1}^{N}{p}_{k}\left(\nabla {F}_{k}\left({\mathbf{w}}_{t}^{k}\right)+{\overline{\mathbf{n}}}^{k}\right)$ , where ${\overline{\mathbf{g}}}_{t}=E{\mathbf{g}}_{t}$ . In the meantime, we make ${\overline{\mathbf{v}}}_{t}={\sum }_{k=1}^{K}{p}_{k}{\mathbf{v}}_{t}^{k}$ and ${\overline{\mathbf{w}}}_{t}={\sum }_{k=1}^{N}{p}_{k}{\mathbf{w}}_{t}^{k}$ , where ${\overline{\mathbf{v}}}_{t+1}={\overline{\mathbf{w}}}_{t}-{\eta }_{t}{\mathbf{g}}_{t}$ .

$1$

lemma

(Bound of one step SGD). If ${\eta }_{t}\le \frac{1}{6L}$ we have 11 $\begin{align*}\hfill \mathbb{E}{\left\Vert {\overline{\mathbf{v}}}_{t+1}-{\mathbf{w}}^{\star }\right\Vert }^{2}\le & \left(1-\mu {\eta }_{t}\right)\mathbb{E}{\left\Vert {\overline{\mathbf{w}}}_{t}-{\mathbf{w}}^{\star }\right\Vert }^{2}+\left(2-\frac{2}{3}{\eta }_{t}\right)\sum\limits _{k=1}^{N}{p}_{k}{\left\Vert {\mathbf{w}}_{t}^{k}-{\overline{\mathbf{w}}}_{t}\right\Vert }^{2}\hfill \\ \hfill & +2{\eta }_{t}\left({\eta }_{t}\sum\limits _{k=1}^{K}{p}_{k}{\left\Vert {\mathbf{n}}_{t}^{k}\right\Vert }^{2}\right)+2{\eta }_{t}^{2}L{\Gamma }+{\eta }_{t}^{2}\mathbb{E}{\left\Vert {\mathbf{g}}_{t}-{\overline{\mathbf{g}}}_{t}\right\Vert }^{2}\hfill \end{align*}$

It is clear from Lemma 1 that the inclusion of differential privacy increases the upper bound.

$2$

lemma

(Bounding the variance). The variance of the aggregated gradient obtained by random sampling of the data and full sampling are as follows: 12 $\mathbb{E}{\left\Vert {\mathbf{g}}_{t}-{\overline{\mathbf{g}}}_{t}\right\Vert }^{2}\le \sum\limits _{k=1}^{K}{p}_{k}^{2}{\sigma }_{k}^{2}$

$3$

lemma

(Bounding the divergence of $\left\{{\mathbf{w}}_{t}^{k}\right\}$ ) In the case where η _t is non-increasing and η _t ≤ 2η _t + E for all t ≥ 0.

It follows that 13 $E\sum\limits _{k=1}^{N}{p}_{k}{\left\Vert {\overline{\mathbf{w}}}_{t}-{\mathbf{w}}_{t}^{k}\right\Vert }^{2}\le 8{\eta }_{t}^{2}{(E-1)}^{2}\left({G}^{2}+{\left\Vert {\mathbf{n}}_{t}^{k}\right\Vert }^{2}\right)$

It can be seen from lemma 3, if E is large, the difference between the local model parameters and the aggregated model parameters will become excessively large, resulting in a poorer fit of the model parameters to the local data after aggregation. But E is small, the aggregated model parameter $\overline{\mathbf{w}}$ will be skewed like a user with a larger amount of data |D _k|, eventually forming a local optimal solution. This is illogical in the case of Non-IID data, as F ₁, …, F _N may be vastly distinct from F.

With lemma 1, 2, and 3, we can describe the convergence of the differential privacy FL framework used in this paper.

$2$

Theorem

Let Assumptions 1 to 4 hold and choose $\kappa =\frac{L}{\mu }$ , γ = max{8 κ, E} and the learning rate ${\eta }_{t}=\frac{2}{\mu (\gamma +t)}$ . The expected upper bound of the difference between the loss after T-wheel and the optimal loss is: 14 $\mathbb{E}\left[F\left({\mathbf{w}}_{T}\right)\right]-{F}^{\ast }\le \frac{2\kappa }{\gamma +T}\left(\frac{B}{\mu }+2L{\left\Vert {\mathbf{w}}_{0}-{\mathbf{w}}^{\ast }\right\Vert }^{2}\right)$ where 15 $B=8{(E-1)}^{2}\left({G}^{2}+{\left\Vert {\mathbf{n}}_{t}^{k}\right\Vert }^{2}\right)+\sum\limits _{k=1}^{K}{p}_{k}^{2}{\sigma }_{k}^{2}+2L{\Gamma }+2\left(\sum\limits _{k=1}^{K}{p}_{k}{\left\Vert {\mathbf{n}}_{t}^{k}\right\Vert }^{2}\right)$

From Theorem 2, first of all, we can see that the convergence rate of Local SGD after adding differential privacy is the same as that of parallel SGD, both being $\mathcal{O}\left(\frac{1}{T}\right)$ . Secondly, since each user adds different noise to the gradient during training, the aggregated noise ${\overline{\mathbf{n}}}_{t}\sim N\left(0,{\sum }_{k=1}^{K}{p}_{k}^{2}{\sigma }_{k}^{2}{I}_{d}\right)$ . From the analysis, it can be seen that reducing noise will effectively increase the convergence rate as well as the accuracy.

Since the exact size of E cannot be given, we try to give the relationship between T and E to illustrate the theoretical value of E. Assume that FL stops when $\mathbb{E}\left[F\left({\mathbf{w}}_{T}\right)\right]-{F}^{\ast }\le \varepsilon \ast$ , let T _ɛ* be the moment when convergence is satisfied. Then 16 $\frac{{T}_{\varepsilon \ast }}{E}\propto \left(1+\frac{1}{K}\right)E{G}^{2}+\frac{{\sum }_{k=1}^{N}{p}_{k}^{2}{\sigma }_{k}^{2}+L{\Gamma }+\left({\sum }_{k=1}^{K}{p}_{k}{\left\Vert {\mathbf{n}}_{t}^{k}\right\Vert }^{2}-\frac{{p}_{k}G}{\mu {\eta }_{t}}\left\Vert {\mathbf{n}}_{t}^{k}\right\Vert \right)}{E}$ it is also clear from the above equation that the determination of the optimal value of E is a difficult matter as described in ref. [36].

In the next section, we will improve the model accuracy by reducing the aggregation noise with the incentive model we designed. At the same time, users will be willing to participate in the training of the FL when they get the benefit, so as to ensure the participation.

MULTI-ATTRIBUTE REVERSE AUCTION DESIGN

In this section, we describe in detail the FLRNDP framework's multi-attribute reverse auction model. First, we will describe the design goals, followed by a formal description of the user selection process. As this section represents the auction portion of the overall model, we abbreviate it as FLRNDP-A.

Design goals

According to the summary in the previous section, in order to encourage users to participate in learning, it is necessary to select a privacy budget of ɛ and a data volume of |D| in order to improve the accuracy of the final training. We view the FL issue as a procurement issue, for which reverse auctions are currently more prevalent.

To minimise the variance of the distribution of the random variable $\overline{\mathbf{n}}$ while selecting the maximum number of users. We can transform the procurement problem into an optimisation problem and solve it using mathematical programming, as outlined below, 17a $\underset{{\mathcal{K}}^{\prime }\subseteq \mathcal{K}}{\min \,}\sum\limits _{k=1}^{\vert {\mathcal{K}}^{\prime }\vert }\frac{\vert {D}_{k}\vert }{{\sum }_{i=1}^{{\mathcal{K}}^{\prime }}\vert {D}_{k}\vert }{\sigma }_{k}^{2}$ 17b $s.t\sum\limits _{k\in \vert {\mathcal{K}}^{\prime }\vert }\left({P}_{k}\right){x}_{k}\le B$ 17c $\left({P}_{k}-{C}_{k}\right){x}_{k}\ge 0,\forall k$ 17d ${x}_{k}\in \left\{0,1\right\}$

Of which Equation (17b) is to satisfy Definition 2, where to select and compensate users who participate in federal learning on a limited budget Equation (17c) is to satisfy Definition 3, and Equation (17d) is to guarantee the assignment. When x _k = 1, this user is selected, and vice versa, it is not selected. After we use this function to solve the optimisation problem, we can find that maybe more than one minimum value point can reach the minimum value: 18 $\mathcal{S}=argmi{n}_{{\mathcal{K}}^{\prime }\subseteq \mathcal{K}}\sum\limits _{k=1}^{\vert {\mathcal{K}}^{\prime }\vert }\frac{\vert {D}_{k}\vert }{{\sum }_{i=1}^{{\mathcal{K}}^{\prime }}\vert {D}_{k}\vert }{\sigma }_{k}^{2}$

In this function, $\mathcal{S}$ is a set when there are more than one minimum value point and a minimum value point when there is only one point. As this function indicate, we can get the set of minimum value point which consist of ${\mathcal{K}}^{\prime }s$ that satisfy the object function to reach the minimum value.

However, since FL still requires a large amount of data to prevent overfitting, we need to maximise the cardinality of ${\mathcal{K}}^{\prime }$ , so we use the next minimum problem to get the optimisation 19 $mi{n}_{{\mathcal{K}}^{\prime }\subseteq \mathcal{S}}{\Delta }K=\Vert {\mathcal{K}}^{\prime }\vert -\vert \mathcal{K}\Vert$

But so far, this programming expression is still not concise enough. First, the expression of ${\sigma }_{k}^{2}$ is ${\sigma }_{k}^{2}=\frac{\alpha {\mu }_{k}^{2}}{2{\varepsilon }_{k}}$ . It can be seen that ${\sigma }^{2}\propto \frac{1}{\varepsilon }$ and ɛ is the most straightforward to obtain. Secondly, it can be seen from Equation (17a) that ${\sum }_{i=1}^{K}\left\vert {D}_{k}\right\vert$ is a fixed value for the selected user in the case of user selection, and the magnitude of the user's share in the overall aggregated noise is only determined by the amount of the user's own data. So, we can abbreviate $\frac{\left\vert {D}_{k}\right\vert }{{\sum }_{i=1}^{K}\left\vert {D}_{k}\right\vert }$ as $\left\vert {D}_{k}\right\vert$ . Therefore, $\frac{\left\vert {D}_{k}\right\vert }{{\varepsilon }_{k}}\propto \frac{\left\vert {D}_{k}\right\vert }{{\sum }_{i=1}^{K}\left\vert {D}_{k}\right\vert }{\sigma }_{k}^{2}$ .

After the above simplification, the expressions for aggregation noise of Equation (17a) can be transformed to 20 $\underset{{\mathcal{K}}^{\prime }\subseteq \mathcal{K}}{\min \,}\sum\limits _{k=1}^{\vert {\mathcal{K}}^{\prime }\vert }\frac{\left\vert {D}_{k}\right\vert }{{\varepsilon }_{k}}$

In this subsection, we have described the goals by going through the mathematical language, but the solution to this programming is difficult to obtain, and we will make some relaxations to the original problem in the next subsection and make some changes in order to satisfy the truthfulness.

Multi-attribute reverse auction model

In the previous section, we described design goals, but this is only the ideal state. Since our auction is based on FL, we also need to consider the amount of data, and the only way to increase the amount of data is to increase the number of users. Therefore, it is a contradictory problem and we cannot get an optimal solution. And to solve this problem, we put Equation (19) into the original problem as a penalty term, that is, 21 $\underset{{\mathcal{K}}^{\prime }\subseteq \mathcal{K}}{\min \,}\sum\limits _{k=1}^{\vert {\mathcal{K}}^{\prime }\vert }\frac{\left\vert {D}_{k}\right\vert }{{\varepsilon }_{k}}+\lambda \Vert {\mathcal{K}}^{\prime }\vert -\vert \mathcal{K}\Vert$ where λ is the hyperparameter. Now, our optimisation is much simpler. But this is not enough, because such an expression does not guarantee that the auction is authentic and does not determine the payment to each selected user.

Think of the central server as the purchaser and the users involved in the learning as the suppliers. He faces K bidders, expressed as a set $\mathcal{K}=\left\{1,2,\text{\ldots },K\right\}$ . At the beginning of the auction, the purchaser will specify the attributes to be submitted by the bidder, which include cost C, attribute q _i (i = 1, 2, …, m). The purchaser will convert the attributes into a scoring function S(.): R ^m → R. From the analysis in the previous subsection, our reverse auction model has two attributes, so the supplier submits the attribute $\left({C}_{k},{\varepsilon }_{k},\left\vert {D}_{k}\right\vert \right)$ they need to submit to the purchaser by submitting a sealed bid. Let $\mathcal{A}=\left\{\left({C}_{1},{\varepsilon }_{1},\left\vert {D}_{1}\right\vert \right),\left({C}_{2},{\varepsilon }_{2},\left\vert {D}_{2}\right\vert \right),\text{\ldots },\left({C}_{K},{\varepsilon }_{K},\left\vert {D}_{K}\right\vert \right)\right\}$ , denoting the set of all supplier submitted attribute tuples received by the purchaser.

For the winner determination problem (WDP) in multi-attribute auctions, by converting multiple attributes into weighted product (WP) [38]. Therefore our first idea is to treat Equation (18) as the evaluation function needed for the WDP solution, that is, 22 ${S}_{k}\left({\varepsilon }_{k},\left\vert {D}_{k}\right\vert \right)=\frac{\left\vert {D}_{k}\right\vert }{{\varepsilon }_{k}}$

However, this evaluation function design solution is problematic. If there is a user k whose |D _k| and ɛ _k are greater, it is not guaranteed to be chosen because S(.) is not monotonic. In addition, since both data size |D| and privacy leakage ɛ affect training accuracy, it is a less correct idea to simply consider the ratio size, and it is a better solution to adjust the weight of data size and privacy leakage and to achieve monotonicity of S(.) by using WP. Following is an explanation of how to construct the evaluation function S(.).

First, the tuple of attributes uploaded by the user is constructed as a matrix, that is, 23 $\left(\begin{array}{@{}ccc@{}}\hfill {C}_{1}\hfill & \hfill {\varepsilon }_{1}\hfill & \hfill {D}_{1}\hfill \\ \hfill {C}_{2}\hfill & \hfill {\varepsilon }_{2}\hfill & \hfill {D}_{2}\hfill \\ \hfill {\vdots}\hfill & \hfill {\vdots}\hfill & \hfill {\vdots}\hfill \\ \hfill {C}_{K}\hfill & \hfill {\varepsilon }_{K}\hfill & \hfill {D}_{K}\hfill \end{array}\right)$ to go about minimising the likelihood of high |D| small ɛ being selected, we transformed the user's non-payment attributes as follows, let ${C}_{k}^{\ast }=\frac{{C}_{k}}{{C}_{\max }}$ , ${\varepsilon }_{k}^{\ast }={e}^{{\varepsilon }_{k}-{\varepsilon }_{\max }}$ and ${D}_{k}^{\ast }=\mathrm{ln}\frac{\vert D{\vert }_{\max }}{\left\vert {D}_{k}\right\vert }$ , where C _max, ɛ _max and ∣D∣_max is the global maximum value of the corresponding property. Then, Equation (20) becomes the following matrix, 24 $\left(\begin{array}{@{}ccc@{}}\hfill {C}_{1}^{\ast }\hfill & \hfill {\varepsilon }_{1}^{\ast }\hfill & \hfill {D}_{1}^{\ast }\hfill \\ \hfill {C}_{2}^{\ast }\hfill & \hfill {\varepsilon }_{2}^{\ast }\hfill & \hfill {D}_{2}^{\ast }\hfill \\ \hfill {\vdots}\hfill & \hfill {\vdots}\hfill & \hfill {\vdots}\hfill \\ \hfill {C}_{K}^{\ast }\hfill & \hfill {\varepsilon }_{K}^{\ast }\hfill & \hfill {D}_{K}^{\ast }\cdot \hfill \end{array}\right)$ also, we express the evaluation function as 25 $S\left({\varepsilon }_{k}^{\ast },{\left\vert {D}_{k}\right\vert }^{\ast }\right)={\left({\varepsilon }_{k}^{\ast }\right)}^{\alpha }{\left({\left\vert {D}_{k}\right\vert }^{\ast }\right)}^{\beta }$ where α + β = 1.

The changes in the aforementioned attributes and the determination of the scoring function are primarily due to the fact that |D| is significantly greater than ɛ, causing the determinants of the evaluation function to be excessively skewed towards |D|. Also, α and β are weight values that indicate which of ɛ and |D|‘s attributes is somewhat more significant in FL. The optimal amounts of α and β will be determined through experimentation.

From the above description, the purpose of our reverse auction is to obtain the maximum ${\sum }_{k=1}^{{\mathcal{K}}^{\prime }}{S}_{k}$ with the minimum payment. Therefore, we can convert Equation (21) into 26 $\underset{{\mathcal{K}}^{\prime }\subseteq \mathcal{K}}{\min \,}\sum\limits _{k=1}^{\vert {\mathcal{K}}^{\prime }\vert }\frac{{P}_{k}}{{S}_{k}}+\lambda \Vert {\mathcal{K}}^{\prime }\vert -\vert \mathcal{K}\Vert$ Combining Equation (17) with Equation (26) yields 27a $\underset{{\mathcal{K}}^{\prime }\subseteq \mathcal{K}}{\min \,}\sum\limits _{k=1}^{\vert {\mathcal{K}}^{\prime }\vert }\frac{{P}_{k}}{{S}_{k}}+\lambda \Vert {\mathcal{K}}^{\prime }\vert -\vert \mathcal{K}\Vert$ 27b $\,\text{s.t.}\,\quad \sum\limits _{k\in {\mathcal{K}}^{\prime }}\left({P}_{k}\right){x}_{k}\le B$ 27c $\left({P}_{k}-{C}_{k}\right){x}_{k}\ge 0,\forall k\in \vert {\mathcal{K}}^{\prime }\vert$ 27d ${x}_{k}\in \left\{0,1\right\}$

however, up to now, we have not determined the value of P, but according to the constraint P _k − C _k ≥ 0, P must be at least greater than the cost C, so we can replace Equation (27a) to 28 $\underset{{\mathcal{K}}^{\prime }\subseteq \mathcal{K}}{\min \,}\sum\limits _{k=1}^{\vert {\mathcal{K}}^{\prime }\vert }\frac{{C}_{k}^{\ast }}{{S}_{k}}+\lambda \Vert {\mathcal{K}}^{\prime }\vert -\vert \mathcal{K}\Vert .$

The reason for using ${C}_{k}^{\ast }$ here is to prevent different cost functions from having an impact on the follow-up. Now, our optimisation becomes both simple and feasible. Let ${\rho }_{k}^{\prime }=\frac{{C}_{k}^{\ast }}{{S}_{k}}$ , we call ρ′ the bid price per unit of mass, sorting ${\rho }_{k}^{\prime }$ incrementally to get 29 ${\rho }_{1}^{\prime }\le {\rho }_{2}^{\prime }\le {\cdots}\le {\rho }_{\vert \mathcal{K}\vert }^{\prime },$

After sorting, we will select ${\mathcal{K}}^{\prime }$ candidates from $\mathcal{K}\left({\mathcal{K}}^{\prime }\le \mathcal{K}\right)$ users to participate in the FL, where ${\mathcal{K}}^{\prime }$ satisfies, 30 ${\mathcal{K}}^{\prime }=\underset{{\mathcal{K}}^{\prime }}{\mathrm{arg}\,\max \,}\left\{{\rho }_{\vert {\mathcal{K}}^{\prime }\vert +1}^{\prime }{C}_{\mathcal{K}+1}\sum\limits _{k=1}^{{\mathcal{K}}^{\prime }}{S}_{k}\le B\right\}$

Algorithm

FLRNDP-A model algorithm

Input: User candidate set K , Budget B , User attributes tuple set $\mathcal{A}$ Output: Winners K ′, Payment P 1: $P{\leftarrow}\left\{{p}_{1},{p}_{2},\text{\ldots },{p}_{K}\right\}$ , K ′ ← ϕ , $\rho {\leftarrow}\left\{{\rho }_{1},{\rho }_{2},\text{\ldots },{\rho }_{K}\right\}$ , j ← 1 , ${\varepsilon }_{\max }{\leftarrow}\max \left({\varepsilon }_{k}\in \mathcal{A}\right)$ , $\vert D{\vert }_{\max }{\leftarrow}\max \left(\vert D{\vert }_{k}\in \mathcal{A}\right)$ , $S{\leftarrow}\left\{{S}_{1},{S}_{2},\text{\ldots },{S}_{K}\right\}$ . 2: for $\left({\varepsilon }_{k},\vert D{\vert }_{k}\right)$ in $\mathcal{A}$ do 3: ${\varepsilon }_{k}^{\ast }={e}^{{\varepsilon }_{k}-{\varepsilon }_{\max }}$ 4: $\vert D{\vert }_{k}^{\ast }={\mathrm{ln}}^{\tfrac{\vert D\vert \max \,}{\left\vert {D}_{k}\right\vert }}$ 5: ${S}_{k}={\left({\varepsilon }_{k}^{\ast }\right)}^{\alpha }{\left(\vert D{\vert }_{k}^{\ast }\right)}^{\beta }$ 6: end for 7: User selection: 8: for C _k in $\mathcal{A},{S}_{k}\in S$ do 9: ${C}_{k}^{\ast }=\frac{{C}_{k}}{{C}_{\max }}$ 10: ${\rho }_{k}=\frac{{C}_{k}}{\left\vert {S}_{k}\right\vert }$ 11: end for 12: sort ρ , s.t. ρ _k
−1 ≤ ρ _k 13: while ${\rho }_{j}\left({S}_{j}+{\sum }_{i\in {K}^{\prime }}\right)\le B$ do 14: K ′ ← K ′⋃ j 15: j ← j + 1 16: end while 17: Payment calculation: 18: for each k ∈ K do 19: if k ∈ K ′ then 20: p _k = ρ _j C _j S _k 21: else 22: p _k = 0 23: end if 24: end forreturn K ′, P

Therefore, the final selected supplier receives payment of 31 ${P}_{k}={\rho }_{{K}^{\prime }+1}^{\prime }{C}_{{K}^{\prime }+1}{S}_{k},\quad 1\le k\le {K}^{\prime }.$

After giving the user selection methods described above, comparing Equations (27a) and (30) to see that there are cases where the number of users selected exceeds the optimal value, for which we still recommend selecting more users as a way to reduce the possibility of underfitting due to sample imbalance.

After giving a full description of the auction, we'll use a few theorems to show that our method for choosing the winner is IR, CR, and truthful.

$3$

Theorem

FLRNDP-A is individually rational.

$4$

Theorem

FLRNDP-A has budget feasibility.

$5$

Theorem

FLRNDP-A is truthful.

The proof procedure of the above theorem is given in Appendix C.

Algorithm 1 describes the process of user selection, while, for a more visual representation of the flow of our framework, we also give Figure 2.

In the next section, we negotiate the users selected from the auction with an incentive to improve the data quality as a way to further improve the accuracy of the aggregation model.

[IMAGE OMITTED. SEE PDF]

POST-AUCTION NEGOTIATION MODEL

As shown in Table 1, since the auction model does not permit purchasers and suppliers to communicate effectively as in negotiations, and with a predetermined amount of user data, we believe that the quality of the data that can be negotiated is related to the endogenous factor ɛ, which is proportional to the number of bonuses that the purchaser can offer. This paper's post-auction negotiation model focuses on reducing the privacy protection intensity of users through additional incentives, that is, by increasing ɛ _k. Figure 3 represents the general process of Post-Auction Negotiation Model.

[IMAGE OMITTED. SEE PDF]

We believe the amount of change in privacy leakage as a result of bonuses should be as follows: 32 ${\varepsilon }^{\prime }\sim P(\varepsilon \vert bonus).$

In this section, we treat the purchaser as having ${\mathcal{K}}^{\prime }$ agents to negotiate with the purchaser separately. Thus the negotiation can be considered as a two-player game.

First, we express the utility functions of the purchaser and the supplier in the auction phase.

We assume that the utility of the purchaser in reverse auction is 33 ${U}_{p}=\sum\limits _{k=1}^{K}V\left({S}_{k}\right)-\sum\limits _{k=1}^{K}{P}_{k}$ and the utility of the supplier is 34 ${U}_{s}^{k}={P}_{k}-{C}_{k}$ where P _k represents the price paid by the purchaser to the supplier k in the auction, and the function V(.) is V: R → R ₊, representing the revenue obtained by the purchaser when the data is collected with a scoring function of S.

Since the change in the intensity of supplier privacy protection is personal private information, the bonus to be paid to the purchaser during the phase of negotiation should be an expectation bonus. After incentivising users to increase the amount of privacy leakage, the purchaser's utility changes to 35 ${U}_{p}^{k}=V\left({S}_{k}\right)-{P}_{k}+\mathrm{Pr}\left({\varepsilon }^{\prime }\vert bonus\right)\left[{\Delta }V\left({S}_{k}^{\prime }\right)-bonus\right]$ where ${\Delta }V\left({S}_{k}^{\prime }\right)=V\left({S}_{k}^{\prime }\right)-V\left({S}_{k}\right)$ , and $V\left({S}_{k}^{\prime }\right)=V\left({\left({e}^{\varepsilon k+{\varepsilon }_{k}^{\prime }-{\varepsilon }_{\max }}\right)}^{\alpha }{\left(\mathrm{ln}\frac{\vert D{\vert }_{\max }}{\left\vert {D}_{k}\right\vert }\right)}^{\beta }\right)$ .

In addition, the suppliers' compromised privacy to lessen the intensity of privacy protection can be viewed as an additional cost to the suppliers' utility. 36 ${U}_{s}^{k}={P}_{k}-{C}_{k}+\mathrm{Pr}\left({\varepsilon }^{\prime }\vert bonus\right)\left[bonus-{\Delta }{C}_{k}\right]$ where ${\Delta }{C}_{k}=C\left({\varepsilon }_{k}+{\varepsilon }_{k}^{\prime },\vert {D}_{k}\vert \right)-C\left({\varepsilon }_{k},\vert {D}_{k}\vert \right)$ .

Given the utility of purchasers and suppliers, we use social welfare to measure the overall utility 37 $SW=V\left({S}_{k}\right)-{C}_{k}+\mathrm{Pr}\left({\varepsilon }^{\prime }\vert bonus\right)\left[{\Delta }V\left({S}_{k}^{\prime }\right)-{\Delta }{C}_{k}\right]$ where P _k and bonus are cancelled out in the calculation process, known in economics as a transfer of money, which does not play any role in social welfare.

Bargaining phase

After describing some of the basics, the negotiations are described below.

During this phase, the purchaser and the winning supplier k can negotiate the quality requirement, ${\varepsilon }_{k}^{\mathit{Fin}}={\varepsilon }_{k}+{\varepsilon }_{k}^{\prime }$ , and the corresponding payment to the supplier, ${P}_{k}^{\prime }$ . Consider δ ₁ and δ ₂ to be the discount factors for the purchaser and the suppliers respectively. The longer the duration of negotiations, the more likely it is that the discount rate between the parties will diminish the value of the deal. According to ref. [17], the profit following multiple rounds of negotiation can be expressed as 38 $\begin{align*}\hfill & {U}_{p}^{k}={\delta }_{m,1}\left(V\left({S}^{\mathit{Fin}}\right)-{P}_{k}^{\prime }\right)\hfill \\ \hfill & {U}_{s}^{k}={\delta }_{m,2}\left({P}_{k}^{\prime }-C\left({\varepsilon }_{k}^{\mathit{Fin}},\vert {D}_{k}\vert \right)\right)\hfill \end{align*}$ where m represents the number of rounds of negotiation and ${S}^{Fin}$ and ${P}_{k}^{\prime }$ represent the evaluation function and payment involved in supplier-purchaser negotiations.

With the above equation, purchaser-supplier negotiations eventually reach a unique Nash equilibrium [37], and Theorem 6 describes the benefits to both parties as well as the payments after the Nash equilibrium is reached:

$6$

Theorem

(Unique Equilibrium for Price after Negotiation) If the purchaser and suppliers decide to negotiate after the auction, the purchaser will first make a new quality demand, then the parties will reach a unique transaction price P Nash equilibrium in a round of negotiation with the value: 39 ${P}_{k}^{\prime }=V\left({S}_{k}^{\mathit{Fin}}\right)-\left(1-{\delta }_{2}\right)/\left(1-{\delta }_{1}{\delta }_{2}\right)SW$ and the purchasers and suppliers revenue is 40 $\begin{align*}\hfill & {U}_{p}^{k}=\left(1-{\delta }_{2}\right)/\left(1-{\delta }_{1}{\delta }_{2}\right)SW\hfill \\ \hfill & {U}_{s}^{k}=\left({\delta }_{2}-{\delta }_{1}{\delta }_{2}\right)/\left(1-{\delta }_{1}{\delta }_{2}\right)SW.\hfill \end{align*}$

In the next subsection, we will describe the bonuses that the purchaser will pay to the supplier.

Bonus to supplier

The after-negotiation bonus may be viewed as a cost to the purchaser in order to incentivise the supplier to alter quality. In a manner resembling the auction phase, purchasers always seek to minimise expenditures in order to maximise return. In contrast, an expectation-related expression is obtained when ɛ′ ∼ P (ɛ∣bonus) holds true. Still using user k as an example, we can express the bonuses as follows, 41 $\begin{align*}\hfill & min\quad bonus\hfill \\ \hfill & s.t.\hfill \\ \hfill & P\left({\varepsilon }_{k}^{\prime }\vert bonus\right).bonus-{\Delta }{C}_{i}\left({\varepsilon }_{k}^{\prime }\right)-\left(P\left({\varepsilon }_{k}^{\prime \prime }\vert bonus\right)-{\Delta }{C}_{i}\left({\varepsilon }_{k}^{\prime \prime }\right)\right)\ge 0\hfill \\ \hfill & P\left({\varepsilon }_{k}^{\prime }\vert bonus\right).bonus-{\Delta }{C}_{i}\left({\varepsilon }_{k}^{\prime }\right)+{P}_{k}^{\prime }-{C}_{i}\left({\varepsilon }_{k}+{\varepsilon }_{k}^{\prime }\right)\ge 0\hfill \end{align*}$

The first formula in the constraint conditions employs the incentive compatibility theorem to validate the user's relaxed privacy protection intensity value. The second formulation ensures the user's rationality so that their participation in the negotiation is profitable.

As you can see from Equation (38), bonus should be: 42 $bonus=\min \left\{\frac{{\Delta }{C}_{k}\left({\varepsilon }_{k}^{\prime }\right)-{\Delta }{C}_{k}\left({\varepsilon }_{k}^{\prime \prime }\right)}{\mathrm{Pr}\left({\varepsilon }_{k}^{\prime }\vert bonus\right)-p\left({\varepsilon }_{k}^{\prime \prime }\vert bonus\right)},\frac{{C}_{k}\left({\varepsilon }_{k}+{\varepsilon }_{k}^{\prime }\right)+{\Delta }{C}_{k}-{P}_{{k}^{\prime }}}{\mathrm{Pr}\left({\varepsilon }_{k}^{\prime }\vert bonus\right)}\right\}$ since the first item on the right is always less than 0, regardless of whether ${\varepsilon }_{k}^{\prime \prime }$ gets larger or smaller, therefore, 43 $bonus=\frac{{C}_{k}\left({\varepsilon }_{k}+{\varepsilon }_{k}^{\prime }\right)+{\Delta }{C}_{k}-{P}_{k}^{\prime }}{\mathrm{Pr}\left({\varepsilon }_{k}^{\prime }\vert bonus\right)}$

At the end of this section, we propose Theorem 7. And the proof of this theorem is presented in Appendix C.

$7$

Theorem

(Social Welfare Improvement) Negotiations between the two players after an auction improve social welfare relative to auction only.

EXPERIMENT

Our experiments consider an FL system consisting of a central server and K = 100 users. We assume a random selection of privacy leakages for K = 100 users from (3, 10). Before starting FL, we assume that the server has a fixed budget B to motivate workers to perform local model updates. Our budget is B = 100,000.

For the learning configuration, unless otherwise stated, a simultaneous FL scheme is used in this paper. The mini-batch for each user is set to 32 mini-batch data samples. For privacy considerations, the noise variance is determined by the degree of personal privacy disclosure. We use SGD as the optimiser and set the initial learning rate η = 0.1, while setting a simple learning rate decrease (10 rounds of communication). Also, our cost function is set to C(θ; |D|, ɛ) = θ(|D| + ɛ), which θ is users' private information. In the experiment, we randomly draw the values of user θ from a uniform distribution U (0.5, 1). We do not restrict the choice of the cost function, because our algorithm already normalises the cost. Next, we describe the dataset and model used in the experiments, and the baseline for performance comparison.

1.
Dataset: We utilise the MNIST and Fashion-MNIST datasets. In this paper, only the non-IID case is considered. Specifically, for the non-IID scenario, we use Fedlab [39] to perform an unbalanced partitioning of the data, resulting in a different amount of training data for each user, whose classes are not necessarily identical. Figure 4 demonstrates that we used Fedlab to divide the dataset for 100 users and used the data of 20 users to create the image, demonstrating that unbalanced division in Fedlab can ensure that the amount and type of data for each user is distinct.
2.
Model: MLP: We used the most basic MLP model, that is, a three-layer structure with only one hidden layer, where the genus-out function of the hidden layer uses the sigmoid function.

CNN: It consists of two 5 × 5 convolutional layers (the first 16 channels and the second 32 channels, each 2 × 2 max pooling), both Re-Lu activation layers, and a 10-cell fully connected layer.

Comparison algorithm and performance metrics

We compared the algorithm with two 21-year, [34, 35]. Where Sun et al. [34] use contract theory for user selection and Zhang et al. [35] use VCG auctions for user selection while ensuring truthfulness.

In the performance metrics, we give the accuracy comparison, loss comparison, F1-score of our algorithm FLRNDP-A with DP-FFL [35], Pain-FL-C [34]. And additionally, we give the confusion matrix of our algorithm to visualise the accuracy of our FLRNDP-A (Figures 4 and 5).

[IMAGE OMITTED. SEE PDF]

Choice of E

As seen in Section 5.2, the magnitude of the E value affects the FL's precision, and exact values cannot be obtained. In this subsection, we will determine experimentally the optimal value of E. We conducted federation learning with the same configuration at E = 20, 30, 40, and 50, where the privacy leakage of user k was controlled under identical conditions to the initial random selection, and is involved in training at the same time. Figure 5 demonstrates that the model accuracy is greatest when E = 30 for two distinct models and data sets respectively. In subsequent comparison experiments, E is fixed at 30.

Choice of α

As can be seen from Section 6.2, S(.) corresponding to each user is controlled by ɛ and |D|. These two parameters control the amount of privacy leakage and the importance of the amount of data. Since FL with differential privacy requires both higher privacy leakage and higher data volume, the value of α cannot be obtained as an explicit expression, so we use an experimental approach to try to find a parameter value that makes the learning model more accurate, in which we use a budget B = 100,000 while fixing selected users, and the results show that when α = 0.6 and β = 0.4, the highest accuracy is achieved (Figure 6).

Comparison

As can be seen from Figure 7, our algorithm always fluctuates more than FP-FFL and Pain-FL-C. We think this is a normal situation because FP-FFL and Pain-FL-C are mainly selected for ɛ, because the magnitude of C is proportional to the perturbation effect on the model, and our algorithm considers not only ɛ, but also |D|. It can be seen that FLRNDP-A often fluctuates in the early stage, which we consider as a good performance because a slightly higher ɛ will help the model jump out of the local optimum and thus bring better accuracy performance, which, of course, brings fluctuations in the late stage. Although there were fluctuations in the early period, it can be seen that our algorithm finally outperformed the comparison algorithm due to our relatively higher amount of data in the selection of users as a way to reduce the risk of overfitting and to learn more features in the case of the non-IID dataset.

[IMAGE OMITTED. SEE PDF]

It can also be seen from Figure 8 that our algorithm has some jitteriness in the pre-training period, which we attribute to the inclusion of differential privacy that makes the model slightly more volatile in the pre-training period. In the late training period, the loss of our algorithm is generally lower than that of the comparison algorithm.

[IMAGE OMITTED. SEE PDF]

In addition to accuracy and loss, we introduced F1-score to evaluate the model quality. Although accuracy is often used as a measure of the model, it has the limitation that it does not reasonably reflect the predictive power of the model when the sample is unbalanced. Therefore, we introduced the F1-score. The F1-score is a weighted summed average of Precision and Recall, which is written in the binary classification problem as ${F}_{1}=2\frac{P\times R}{P+R}$ , where P is Precision, R is Recall. As can be seen in Figure 9, the F1-score values of our model are slightly higher than those of the two comparison algorithms. And we find that our model has fewer fluctuations in F1-score, which indicates that the quality of our model is gradually improved during the training process.

[IMAGE OMITTED. SEE PDF]

Finally, we give the multiclassification confusion matrix obtained by our algorithm using the MLP model under MNIST and FashionMNIST datasets to visually represent the accuracy of our model, Figure 10, where the main diagonal line represents the correct classification.

[IMAGE OMITTED. SEE PDF]

CONCLUSION

In this paper, we analyse the privacy and convergence of FL with differential privacy using Local SGD, concluding that its convergence is related to aggregation noise. For user selection, we employ a multi-attribute reverse auction method to reduce aggregation noise. After the user is selected, further privacy leakage is negotiated with the user, and it is demonstrated that post-auction negotiation increases social welfare. We conducted experiments on the number of local epochs E and selected the better E, followed by experiments on the selection of α. Experiments reveal that the model is most accurate when E = 30 and α = 0.6. On this basis, we compare it with two state-of-the-art algorithms. Experiments show that our method outperforms the other two algorithms in terms of accuracy and model quality.

Our future work intends to focus on the combination of fairness and incentives, because FL is easily brought to another local optimum by users with high contributions without adding fairness. The research on combining fairness and incentives is currently in its infancy, and there will be a lot of interesting work to investigate.

ACKNOWLEDGEMENTS

National Natural Science Foundation of China, Grant Number: 62062020; National Natural Science Foundation of China, Grant Number: 72161005; Technology Foundation of Guizhou Province, Grant Number: QianKeHeJiChu-ZK [2022]-General184.

CONFLICT OF INTEREST STATEMENT

The author declares that there is no conflict of interest.

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are available from the corresponding author, Hongqin Lyu, upon reasonable request.

Appendix

APPENDIX - A

Assuming that ξ _k and ${\xi }_{k}^{\prime }$ are adjacent mini-batch data that differ only at the j − th sample, the global sensitivity of the gradient is A1 $\begin{align*}\hfill & {\Delta }(\nabla F)=\max \,\left\Vert \nabla F\left({\mathbf{w}}_{\mathbf{k}};{\xi }_{k}\right)-\nabla F\left({\mathbf{w}}_{\mathbf{k}};{\xi }_{k}^{\prime }\right)\right\Vert \hfill \\ \hfill =& \,\max \frac{1}{\left\Vert {\xi }_{k}\right\Vert }\left\Vert \sum\limits _{{x}_{i}\in {\xi }_{k}}l\left({\mathbf{w}}_{k};{x}_{i}\right)-\sum\limits _{{x}_{i}^{\prime }\in {\xi }_{k}^{\prime }}l\left({\mathbf{w}}_{k};{x}_{i}^{\prime }\right)\right\Vert \hfill \\ \hfill & \le \frac{2c}{\left\Vert {\xi }_{k}\right\Vert }\hfill \end{align*}$ also due to Proposition seven in ref. [22], can get A2 ${\sigma }^{2}\ge \alpha \frac{c}{\left\Vert {\xi }_{k}\right\Vert {\varepsilon }_{\mathrm{k}}}$

Appendix

APPENDIX - B

Proof Of Lemma 1

B1 $\begin{align*}\hfill {\left\Vert {\overline{\mathbf{v}}}_{t+1}-{\mathbf{w}}^{\star }\right\Vert }^{2}& ={\left\Vert {\overline{\mathbf{w}}}_{t}-{\eta }_{t}{\mathbf{g}}_{t}-{\mathbf{w}}^{\star }-{\eta }_{t}{\overline{\mathbf{g}}}_{t}+{\eta }_{t}{\overline{\mathbf{g}}}_{t}\right\Vert }^{2}\hfill \\ \hfill & =\underset{A}{\underbrace{{\left\Vert {\overline{\mathbf{w}}}_{t}-{\mathbf{w}}^{\star }-{\eta }_{t}{\overline{\mathbf{g}}}_{t}\right\Vert }^{2}}}+\underset{B}{\underbrace{2{\eta }_{t}\left\langle {\overline{\mathbf{w}}}_{t}-{\mathbf{w}}^{\star }-{\eta }_{t}{\overline{\mathbf{g}}}_{t},{\overline{\mathbf{g}}}_{t}-{\mathbf{g}}_{t}\right\rangle }}+{\eta }_{t}^{2}{\left\Vert {\mathbf{g}}_{t}-{\overline{\mathbf{g}}}_{t}\right\Vert }^{2}\hfill \end{align*}$

Since ${\overline{\mathrm{g}}}_{t}=E{\mathbf{g}}_{t}$ , therefore EB = 0. Then, we now shift our target to the boundary of A, expand A as follows, B2 ${\left\Vert {\overline{\mathbf{w}}}_{t}-{\mathbf{w}}^{\star }-{\eta }_{t}{\overline{\mathbf{g}}}_{t}\right\Vert }^{2}={\left\Vert {\overline{\mathbf{w}}}_{t}-{\mathbf{w}}^{\star }\right\Vert }^{2}\underset{{A}_{1}}{\underbrace{-2{\eta }_{t}\left\langle {\overline{\mathbf{w}}}_{t}-{\mathbf{w}}^{\star },{\overline{\mathbf{g}}}_{t}\right\rangle }}+\underset{{A}_{2}}{\underbrace{{\eta }_{t}^{2}{\left\Vert {\overline{\mathbf{g}}}_{t}\right\Vert }^{2}}}$ where A ₂ is given by Jensen's inequality B3 ${A}_{2}={\eta }_{t}^{2}{\left\Vert {\overline{\mathrm{g}}}_{t}\right\Vert }^{2}\le {\eta }_{t}^{2}\sum\limits _{k=1}^{N}{p}_{k}{\left\Vert \nabla {F}_{k}\left({\mathbf{w}}_{t}^{k}\right)+{\mathbf{n}}_{t}^{k}\right\Vert }^{2}$

The above inequality can be transformed from ‖x + y‖² ≤ ‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖² to B4 ${\left\Vert \nabla {F}_{k}\left({\mathbf{w}}_{t}^{k}\right)+{\mathbf{n}}_{t}^{k}\right\Vert }^{2}\le 2{\left\Vert \nabla {F}_{k}\left({\mathbf{w}}_{t}^{k}\right)\right\Vert }^{2}+2{\left\Vert {\mathbf{n}}_{t}^{k}\right\Vert }^{2}$

From the L-smoothness of F _k(.), it follows that B5 ${\left\Vert \nabla {F}_{k}\left({\mathbf{w}}_{t}^{k}\right)\right\Vert }^{2}\le 2L\left({F}_{k}\left({\mathbf{w}}_{t}^{k}\right)-{F}_{k}^{\star }\right)$

Therefore, Equation (B2) is B6 ${A}_{2}\le 4L{\eta }_{t}^{2}\sum\limits _{k=1}^{N}{p}_{k}\left({F}_{k}\left({\mathbf{w}}_{t}^{k}\right)-{F}_{k}^{\ast }\right)+2{\eta }_{t}^{2}\sum\limits _{k=1}^{N}{p}_{k}{\left\Vert {\mathbf{n}}_{t}^{k}\right\Vert }^{2}.$

Now, we bound A ₁: B7 $\begin{align*}\hfill {A}_{1}& =-2{\eta }_{t}\left\langle {\overline{\mathbf{w}}}_{t}-{\mathbf{w}}^{\star },{\overline{\mathbf{g}}}_{t}\right\rangle =-2{\eta }_{t}\sum\limits _{k=1}^{N}{p}_{k}\langle {\overline{\mathbf{w}}}_{t}-{\mathbf{w}}^{\star },\nabla {F}_{k}\left({\mathbf{w}}_{t}^{k}\right)\rangle -2{\eta }_{t}{p}_{k}\left\langle {\mathbf{w}}_{t}-{\mathbf{w}}^{\star },{\mathbf{n}}_{t}^{k}\right\rangle \hfill \\ \hfill & =-2{\eta }_{t}\sum\limits _{k=1}^{N}{p}_{k}\underset{{A}_{11}}{\underbrace{\langle {\overline{\mathbf{w}}}_{t}-{\mathbf{w}}_{t}^{k},\nabla {F}_{k}\left({\mathbf{w}}_{t}^{k}\right)\rangle }}-2{\eta }_{t}\sum\limits _{k=1}^{N}{p}_{k}\underset{{A}_{12}}{\underbrace{\langle {\mathbf{w}}_{t}^{k}-{\mathbf{w}}^{\star },\nabla {F}_{k}\left({\mathbf{w}}_{t}^{k}\right)\langle }}-2{\eta }_{t}{p}_{k}\underset{{A}_{13}}{\underbrace{\langle {\overline{\mathbf{w}}}_{t}-{\mathbf{w}}^{\star },{\mathbf{n}}_{t}^{k}\langle }}\hfill \end{align*}$

By Cauchy-Schwarz inequality and AM-GM inequality to A ₁₁, we have B8 ${-}2\left\langle {\overline{\mathbf{w}}}_{t}-{\mathbf{w}}_{t}^{k},\nabla {F}_{k}\left({\mathbf{w}}_{t}^{k}\right)\right\rangle \le \frac{1}{{\eta }_{t}}{\left\Vert {\overline{\mathbf{w}}}_{t}-{\mathbf{w}}_{t}^{k}\right\Vert }^{2}+{\eta }_{t}{\left\Vert \nabla {F}_{k}\left({\mathbf{w}}_{t}^{k}\right)\right\Vert }^{2}$

Notice the μ-strong convexity of F _k(.), we bound A ₁₂ and A ₁₃ B9 ${-}\left\langle {\mathbf{w}}_{t}^{k}-{\mathbf{w}}^{\star },\nabla {F}_{k}\left({\mathbf{w}}_{t}^{k}\right)\right\rangle \le -\left({F}_{k}\left({\mathbf{w}}_{t}^{k}\right)-{F}_{k}\left({\mathbf{w}}^{\ast }\right)\right)-\frac{\mu }{2}{\left\Vert {\mathbf{w}}_{t}^{k}-{\mathbf{w}}^{\star }\right\Vert }^{2}$ and B10 $\left\langle {\overline{\mathbf{w}}}_{t}-{\mathbf{w}}^{\star },{\mathbf{n}}_{t}^{k}\right\rangle \le \left\Vert {\overline{\mathbf{w}}}_{t}-{\mathbf{w}}^{\star }\right\Vert \left\Vert {\mathbf{n}}_{t}^{k}\right\Vert \le \frac{1}{\mu }\Vert \nabla F\left(\overline{{\mathbf{w}}_{t}}\right)-\nabla F\left({\mathbf{w}}^{\ast }\right)\Vert \left\Vert {\mathbf{n}}_{t}^{k}\right\Vert$ since ∇F (w*) = 0, therefore, B11 $\frac{1}{\mu }\left\Vert \nabla F\left(\overline{{\mathbf{w}}_{t}}\right)-\nabla F\left({\mathbf{w}}^{\ast }\right)\right\Vert =\frac{1}{\mu }\left\Vert \nabla F\left(\overline{{\mathbf{w}}_{t}}\right)\right\Vert \le \frac{G}{\mu }$

Combining Equations (B3)–(B8) gives B12 $\begin{align*}\hfill A& ={\left\Vert {\overline{\mathbf{w}}}_{t}-{\mathbf{w}}^{\star }-{\eta }_{t}{\overline{\mathbf{g}}}_{t}\right\Vert }^{2}\hfill \\ \hfill & \le \left(1-\mu {\eta }_{t}\right){\left\Vert {\overline{\mathbf{w}}}_{t}-{\mathbf{w}}^{\star }\right\Vert }^{2}+\sum\limits _{k=1}^{N}{p}_{k}{\left\Vert {\overline{\mathbf{w}}}_{t}-{\mathbf{w}}_{k}^{t}\right\Vert }^{2}+2{\eta }_{t}\left({\eta }_{t}\sum\limits _{k=1}^{K}{p}_{k}\Vert {\mathbf{n}}_{t}^{k}{\Vert }^{2}-\frac{{p}_{k}G}{\mu }\Vert {\mathbf{n}}_{t}^{k}\Vert \right)\hfill \\ \hfill & +\underset{C}{\underbrace{6L{\eta }_{t}^{2}\sum\limits _{k=1}^{N}{p}_{k}\left({F}_{k}\left({\mathbf{w}}_{t}^{k}\right)-{F}_{k}^{\ast }\right)-2{\eta }_{t}\sum\limits _{k=1}^{N}{p}_{k}\left({F}_{k}\left({\mathbf{w}}_{t}^{k}\right)-{F}_{k}\left({\mathbf{w}}^{\ast }\right)\right)}}\hfill \end{align*}$

We next aim to bound C, We define ${\zeta }_{t}=2{\eta }_{t}\left(1-3L{\eta }_{t}\right)$ . Since ${\eta }_{t}\le \frac{1}{4L},{\eta }_{t}\le {\zeta }_{t}\le 3{\eta }_{t}$ . C can be expressed as B13 $\begin{align*}\hfill C& =-2{\eta }_{t}\left(1-3L{\eta }_{t}\right)\sum\limits _{k=1}^{N}{p}_{k}\left({F}_{k}\left({\mathbf{w}}_{t}^{k}\right)-{F}_{k}^{\ast }\right)+2{\eta }_{t}\sum\limits _{k=1}^{N}{p}_{k}\left({F}_{k}\left({\mathbf{w}}^{\ast }\right)-{F}_{k}^{\ast }\right)\hfill \\ \hfill & =-{\zeta }_{t}\sum\limits _{k=1}^{K}{p}_{k}\left({F}_{k}\left({\mathbf{w}}_{t}^{k}\right)-{F}^{\ast }\right)\hfill \\ \hfill & =-{\zeta }_{t}\sum\limits _{k=1}^{K}{p}_{k}\left({F}_{k}\left({\mathbf{w}}_{t}^{k}\right)-{F}^{\ast }\right)+6L{\eta }_{t}^{2}{\Gamma }\hfill \end{align*}$ where Γ in Equation (B10) is a parameter to measure heterogeneity. Finally, we need that bound is the first term of Equation (B10) right hand B14 $\sum\limits _{k=1}^{N}{p}_{k}\left({F}_{k}\left({\mathbf{w}}_{t}^{k}\right)-{F}^{\ast }\right)=\sum\limits _{k=1}^{N}{p}_{k}\left({F}_{k}\left({\mathbf{w}}_{t}^{k}\right)-{F}_{k}\left({\overline{\mathbf{w}}}_{t}\right)\right)+\sum\limits _{k=1}^{N}{p}_{k}\left({F}_{k}\left({\overline{\mathbf{w}}}_{t}\right)-{F}^{\ast }\right)$

Noting that B15 $\left({F}_{k}\left({\mathbf{w}}_{t}^{k}\right)-{F}_{k}\left({\overline{\mathbf{w}}}_{t}\right)\right)\ge \left\langle \nabla {F}_{k}\left({\overline{\mathbf{w}}}_{t}\right),{\overline{\mathbf{w}}}_{t}^{k}-{\overline{\mathbf{w}}}_{t}\right\rangle$ combining Equations (B6), (B11), (B12) and AM-GM inequality, ${\sum }_{k=1}^{N}{p}_{k}\left({F}_{k}\left({\mathbf{w}}_{t}^{k}\right)-{F}^{\ast }\right)$ follows that B16 $\begin{align*}\hfill & \sum\limits _{k=1}^{N}{p}_{k}\left({F}_{k}\left({\mathbf{w}}_{t}^{k}\right)-{F}^{\ast }\right)\hfill \\ \hfill & \ge -\sum\limits _{k=1}^{N}{p}_{k}\left[{\eta }_{t}L\left({F}_{k}\left({\overline{\mathbf{w}}}_{t}\right)-{F}_{k}^{\ast }\right)+\frac{1}{2{\eta }_{t}}{\left\Vert {\mathbf{w}}_{t}^{k}-{\overline{\mathbf{w}}}_{t}\right\Vert }^{2}\right]+\left(F\left({\overline{\mathbf{w}}}_{t}\right)-{F}^{\ast }\right)\hfill \end{align*}$

Therefore, B17 $\begin{align*}\hfill C& ={\zeta }_{t}\left({\eta }_{t}L-1\right)\sum\limits _{k=1}^{N}{p}_{k}\left({F}_{k}\left({\overline{\mathbf{w}}}_{t}\right)-{F}^{\ast }\right)+\left(6L{\eta }_{t}^{2}+{\zeta }_{t}{\eta }_{t}L\right){\Gamma }+\frac{{\zeta }_{t}}{2{\eta }_{t}}\sum\limits _{k=1}^{N}{p}_{k}{\left\Vert {\mathbf{w}}_{t}^{k}-{\overline{\mathbf{w}}}_{t}\right\Vert }^{2}\hfill \\ \hfill & \le 2{\eta }_{t}^{2}L{\Gamma }+\left(1-\frac{2}{3}{\eta }_{t}\right)\sum\limits _{k=1}^{N}{p}_{k}{\left\Vert {\mathbf{w}}_{t}^{k}-{\overline{\mathbf{w}}}_{t}\right\Vert }^{2}\hfill \end{align*}$ where in the last inequality, these facts are used:

η _t L − 1 ≤ 0
${\sum }_{k=1}^{N}{p}_{k}\left({F}_{k}\left({\overline{\mathbf{w}}}_{t}\right)-{F}^{\ast }\right)=F\left({\overline{\mathbf{w}}}_{t}\right)-{F}^{\ast }\ge 0$
$\left(4L{\eta }_{t}^{2}+{\gamma }_{t}{\eta }_{t}L\right)\le 2{\eta }_{t}^{2}L$

Combining Equations (B9) and (B14), we get that the final bound of A is B18 $\begin{align*}\hfill A& ={\left\Vert {\overline{\mathbf{w}}}_{t}-{\mathbf{w}}^{\star }-{\eta }_{t}{\overline{\mathbf{g}}}_{t}\right\Vert }^{2}\hfill \\ \hfill & \le \left(1-\mu {\eta }_{t}\right){\left\Vert {\overline{\mathbf{w}}}_{t}-{\mathbf{w}}^{\star }\right\Vert }^{2}+\left(2-\frac{2}{3}{\eta }_{t}\right)\sum\limits _{k=1}^{N}{p}_{k}{\left\Vert {\mathbf{w}}_{t}^{k}-{\overline{\mathbf{w}}}_{t}\right\Vert }^{2}+2{\eta }_{t}\left({\eta }_{t}\sum\limits _{k=1}^{K}{p}_{k}{\left\Vert {\mathbf{n}}_{t}^{k}\right\Vert }^{2}-\frac{{p}_{k}G}{\mu }\left\Vert {\mathbf{n}}_{t}^{k}\right\Vert \right)\hfill \\ \hfill & +2{\eta }_{t}^{2}L{\Gamma }\hfill \end{align*}$

Notice that since $E\left\Vert {\mathbf{n}}_{t}^{k}\right\Vert =0$ , $\frac{{p}_{k}G}{\mu }\left\Vert {\mathbf{n}}_{t}^{k}\right\Vert$ in Lemma 1 is erased.

Proof Of Lemma 2

B19 $\mathbb{E}{\left\Vert {\mathrm{g}}_{t}-{\overline{\mathrm{g}}}_{t}\right\Vert }^{2}=\mathbb{E}{\left\Vert \sum\limits _{k=1}^{K}{p}_{k}\left(\nabla {F}_{k}\left({\mathbf{w}}_{t}^{k},{\xi }_{t}^{k}\right)+{\mathbf{n}}_{t}^{k}-\nabla {F}_{k}\left({\mathbf{w}}_{t}^{k}\right)-{\mathbf{n}}_{t}^{k}\right)\right\Vert }^{2}$

The reason that the noise is ${\mathbf{n}}_{t}^{k}$ for both the sample and the full sample in the above expression is that for the same user, the noise values are from the same Gaussian distribution, and the tensor shape composed of the gradient is fixed, so ${\mathbf{n}}_{t}^{k}$ can be used to represent it.

Thus, Equation (B16) reduces to B20 $\mathbb{E}{\left\Vert {\mathrm{g}}_{t}-{\overline{\mathrm{g}}}_{t}\right\Vert }^{2}=\sum\limits _{k=1}^{K}{p}_{k}^{2}\mathbb{E}{\left\Vert \nabla {F}_{k}\left({\mathbf{w}}_{t}^{k},{\xi }_{t}^{k}\right)-\nabla {F}_{k}\left({\mathbf{w}}_{t}^{k}\right)\right\Vert }^{2},$ using Assumption 3, the above equation can be written as B21 $\mathbb{E}{\left\Vert {\mathrm{g}}_{t}-{\overline{\mathrm{g}}}_{t}\right\Vert }^{2}\le \sum\limits _{k=1}^{K}{p}_{k}^{2}{\sigma }_{k}^{2}$

Proof of Lemma 3

Since we use a model that performs FedAVG aggregation every E steps. Therefore, for any t ≥ 0 there exists a t ₀ ≤ t, such that t − t ₀ ≤ E − 1 and ${\mathbf{w}}_{{t}_{0}}^{k}={\overline{\mathbf{w}}}_{{t}_{0}}$ . Also, we use that η _t is non-increasing and ${\eta }_{{t}_{0}}\le 2{\eta }_{t}$ . Thus, B22 $\begin{align*}\hfill \mathbb{E}\sum\limits _{k=1}^{N}{p}_{k}{\left\Vert {\overline{\mathbf{w}}}_{t}-{\mathbf{w}}_{t}^{k}\right\Vert }^{2}& =\mathbb{E}\sum\limits _{k=1}^{N}{p}_{k}{\left\Vert \left({\mathbf{w}}_{t}^{k}-{\overline{\mathbf{w}}}_{{t}_{0}}\right)-\left({\overline{\mathbf{w}}}_{t}-{\overline{\mathbf{w}}}_{{t}_{0}}\right)\right\Vert }^{2}\hfill \\ \hfill & \le \mathbb{E}\sum\limits _{k=1}^{N}{p}_{k}{\left\Vert {\mathbf{w}}_{t}^{k}-{\overline{\mathbf{w}}}_{{t}_{0}}\right\Vert }^{2}\hfill \\ \hfill & \le \sum\limits _{k=1}^{N}{p}_{k}\mathbb{E}\sum\limits _{t={t}_{0}}^{t-1}(E-1){\eta }_{t}^{2}{\left\Vert \nabla {F}_{k}\left({\mathbf{w}}_{t}^{k},{\xi }_{t}^{k}\right)+{\mathbf{n}}_{t}^{k}\right\Vert }^{2}\hfill \\ \hfill & \le \sum\limits _{k=1}^{N}{p}_{k}\mathbb{E}\sum\limits _{t={t}_{0}}^{t-1}(E-1){\eta }_{t}^{2}\left(2{\left\Vert \nabla {F}_{k}\left({\mathbf{w}}_{t}^{k},{\xi }_{t}^{k}\right)\right\Vert }^{2}+2{\left\Vert {\mathbf{n}}_{t}^{k}\right\Vert }^{2}\right)\hfill \\ \hfill & \le 8{\eta }_{t}^{2}{(E-1)}^{2}\left({G}^{2}+{\left\Vert {\mathbf{n}}_{t}^{k}\right\Vert }^{2}\right)\hfill \end{align*}$

Proof Of Theorem 2

Since the noise addition process is performed during training, either $t+1\in {\mathcal{I}}_{E}$ or $t+1\notin {\mathcal{I}}_{E}$ , there is always ${\overline{\mathbf{w}}}_{t+1}={\overline{\mathbf{v}}}_{t+1}$ . We let ${{\Delta }}_{t}=\mathbb{E}{\left\Vert {\overline{\mathbf{w}}}_{t+1}-{\mathbf{w}}^{\star }\right\Vert }^{2}$ and, combining Lemma 1, 2 and 3, get B23 ${{\Delta }}_{t+1}\le \left(1-{\eta }_{t}\mu \right){{\Delta }}_{t}+{\eta }_{t}^{2}B$ where B24 $B=8{(E-1)}^{2}\left({G}^{2}+{\left\Vert {\mathbf{n}}_{t}^{k}\right\Vert }^{2}\right)+\sum\limits _{k=1}^{K}{p}_{k}^{2}{\sigma }_{k}^{2}+2L{\Gamma }+2\left(\sum\limits _{k=1}^{K}{p}_{k}{\left\Vert {\mathbf{n}}_{t}^{k}\right\Vert }^{2}\right)$

For a diminishing stepsize, we let ${\eta }_{t}=\frac{\beta }{t+\gamma }$ . Then, we will prove ${{\Delta }}_{t}\le \frac{v}{\gamma +t}$ .

Using mathematical induction, assume the conclusion holds for some t, it follows that B25 $\begin{align*}\hfill {{\Delta }}_{t+1}& \le \left(1-{\eta }_{t}\mu \right){{\Delta }}_{t}+{\eta }_{t}^{2}B\hfill \\ \hfill & =\left(1-\frac{\beta \mu }{t+\gamma }\right)\frac{v}{t+\gamma }+\frac{{\beta }^{2}B}{{(t+\gamma )}^{2}}\hfill \\ \hfill & \le \frac{v}{t+\gamma +1}\hfill \end{align*}$

Due to the strong convexity of F (.), B26 $\mathbb{E}\left[F\left({\overline{\mathbf{w}}}_{t}\right)\right]-{F}^{\ast }\le \frac{L}{2}{{\Delta }}_{t}\le \frac{L}{2}\frac{v}{\gamma +t}$

Specifically, if we choose $\beta =\frac{2}{\mu },\gamma =\max \left\{8\frac{L}{\mu }-1,E\right\}$ and denote $\kappa =\frac{L}{\mu }$ , then B27 $\mathbb{E}\left[F\left({\overline{\mathbf{w}}}_{t}\right)\right]-{F}^{\ast }\le \frac{2\kappa }{\gamma +t}\left(\frac{B}{\mu }+2L{{\Delta }}_{1}\right).$

End of proof.

Appendix

APPENDIX - C

Proof Of Theorem 3

If user k of the candidate set is selected, then its utility is ${U}_{s}^{k}={P}_{k}-{C}_{k}$ . If user k is unchecked, its utility is 0. Now we focus only on the selected user, assumed to be the k − th. From Equation (28), the payment received by user k is ${P}_{k}={\rho }_{{K}^{\prime }+1}^{\prime }{S}_{k}$ .

According to C1 $\begin{align*}\hfill & \frac{{C}_{k}}{{S}_{k}}\le \frac{{C}_{k+1}}{{S}_{k+1}}\hfill \\ \hfill \Rightarrow & {C}_{k}\le {\mathrm{S}}_{k}\frac{{C}_{k+1}}{{S}_{k+1}}={P}_{k}\hfill \end{align*}$ therefore, ${U}_{s}^{k}\ge 0$ . Proof of completion.

Proof Of Theorem 4

From Equation (27), ${\rho }_{\left\vert {\mathcal{K}}^{\prime }\right\vert +1}^{\prime }{\sum }_{k=1}^{{\mathcal{K}}^{\prime }}{S}_{k}\le B$ is computed for each user selected by our auction model. The selection stops when the total payment exceeds B, so our auction payment is C2 $\sum\limits _{k=1}^{\vert \mathcal{K}\vert }{P}_{k}=\sum\limits _{k=1}^{\vert {\mathcal{K}}^{\prime }\vert }{P}_{k}=\sum\limits _{k=1}^{\vert {\mathcal{K}}^{\prime }\vert }{S}_{k}{\rho }_{\vert {\mathcal{K}}^{\prime }\vert +1}\le B,$ it has budget feasibility, proof of completion.

Proof Of Theorem 5

According lemma 1, we know what conditions need to be met if the auction is to be truthful. We know what conditions need to be met if the auction is to be truthful.

First, we prove that the selection rule is monotonic. Suppose user k is selected, then his ${\rho }_{k}=\frac{{C}_{k}}{{S}_{k}}$ . If user k decreases his bid, he will likely be ahead of the previous one, at least unchanged, if the other users' bids remain the same.

Second, we come to prove that participants receive a critical value of the bonus. Suppose that the first k users are selected to participate in FL through an auction, which contains a user j whose received payment is P _j. In the following, we will describe two possible changes:

If user j's bid ${C}_{j}^{\prime }< {P}_{j}$ , then its ranking position will remain at least the same, while the revenue will not change.
If user j's bid ${C}_{j}^{\prime } > {P}_{j}$ , then its position will move backwards, resulting in not being selected as a user to participate in the training, resulting in 0 gain for itself.

In summary, the payment P is the critical value. By the lemma 1, it is clear that FLRNDP-A has truthfulness.

Appendix

APPENDIX - D

Proof Of Theorem 6

Based on Definition 4, the purchaser's profit will be $\left(1-{\delta }_{2}\right)/\left(1-{\delta }_{1}{\delta }_{2}\right)$ of the social welfare and the supplier's profit will be $\left({\delta }_{2}-{\delta }_{1}{\delta }_{2}\right)/\left(1-{\delta }_{1}{\delta }_{2}\right)$ of the social welfare.

From Equation (34), the gain of the purchaser is D1 $\left(1-{\delta }_{2}\right)/\left(1-{\delta }_{1}{\delta }_{2}\right)SW=V\left(S\left({\varepsilon }_{k}+{\varepsilon }_{k}^{\prime },\left\vert {D}_{k}\right\vert \right)\right)-{P}_{k}^{\prime }.$

For the sake of notational unity, we will write $S\left({\varepsilon }_{k}+{\varepsilon }_{k}^{\prime },\left\vert {D}_{k}\right\vert \right)$ as ${S}_{k}^{\mathit{Fin}}$ , therefore, the payment ${P}_{k}^{\prime }$ is D2 ${P}_{k}^{\prime }=V\left({S}_{k}^{\mathit{Fin}}\right)-\left(1-{\delta }_{2}\right)/\left(1-{\delta }_{1}{\delta }_{2}\right)SW$ and the purchasers receive the following revenue D3 ${U}_{p}^{k}=\left(1-{\delta }_{2}\right)/\left(1-{\delta }_{1}{\delta }_{2}\right)SW$ the suppliers receive the following revenue D4 ${U}_{s}^{k}=\left({\delta }_{2}-{\delta }_{1}{\delta }_{2}\right)/\left(1-{\delta }_{1}{\delta }_{2}\right)SW$

End of proof.

Proof Of Theorem 7

Here, we only analyse one purchaser and one supplier. During the auction phase, the social welfare guided by both parties is: D5 $S{W}_{A}=V\left({S}_{k}\right)-{C}_{k}$

After negotiation, the social welfare are: D6 $S{W}_{P}=V\left({S}_{k}^{\mathit{Fin}}\right)-{C}_{k}^{\prime }+\mathrm{Pr}\left({\varepsilon }^{\prime }\vert bonus\right)\left[{\Delta }V\left({S}_{k}^{\mathit{Fin}}\right)-{\Delta }{C}_{k}\right]$

According to S ≥ P ≥ C, D7 $S{W}_{P}-S{W}_{A}\ge 0.$

End of proof.

References

Kairouz, P. , et al.: Advances and open problems in federated learning. Found. Trends® Mach. Learn. 14(1–2), 1–210 (2021). [DOI: https://dx.doi.org/10.1561/2200000083]

Zeng, D. , et al.: FedLab: A Flexible Federated Learning Framework (2021). arXiv preprint arXiv:210711621

Word count: 15015

Show less

© 2023. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Translate

The incentive mechanism of federated learning has been a hot topic, but little research has been done on the compensation of privacy loss. To this end, this study uses the Local SGD federal learning framework and gives a theoretical analysis under the use of differential privacy protection. Based on the analysis, a multi‐attribute reverse auction model is proposed to be used for user selection as well as payment calculation for participation in federal learning. The model uses a mixture of economic and non‐economic attributes in making choices for users and is transformed into an optimisation equation to solve the user choice problem. In addition, a post‐auction negotiation model that uses the Rubinstein bargaining model as well as optimisation equations to describe the negotiation process and theoretically demonstrate the improvement of social welfare is proposed. In the experimental part, the authors find that their algorithm improves both the model accuracy and the F1‐score values relative to the comparison algorithms to varying degrees.

Details

Title

Federated learning privacy incentives: Reverse auctions and negotiations

Author

Lyu, Hongqin¹

; Zhang, Yongxiong¹; Wang, Chao¹; Long, Shigong¹; Guo, Shengnan¹

¹ College of Computer Science and Technology, Guizhou University, Guiyang, China

Pages

1538-1557

Section

REGULAR ARTICLES

Publication year

2023

Publication date

Dec 1, 2023

Publisher

John Wiley & Sons, Inc.

e-ISSN

24682322

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1049/cit2.12190

ProQuest document ID

3091950199

Federated learning privacy incentives: Reverse auctions and negotiations

Jump to:

Full text

Abstract

Details

Suggested sources