This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
Federated learning (FL) [1, 2] has emerged as a subdomain of machine learning (ML) that enables Internet of Things (IoT) devices to contribute their real-time data and processing power to train ML models. FL is a distributed architecture of a central server and heterogeneous clients that aims to reduce the empirical loss of model prediction over data that are not independent and identically distributed (non-IID). In contrast to traditional ML algorithms, which require large amounts of homogeneous data in a central location, FL utilizes on-device intelligence over distributed data [3, 4]. FL thereby addresses the limited feasibility of ML in industrial and IoT applications. Applications of FL include Google keyboard [5], image-based geolocation [6], healthcare informatics [7], and wireless communications [8].
Each round of FL consists of client-server communication, local training, and model aggregation [9]. The communication overhead is mainly due to the model broadcast from the server to all clients and the return of the client updates. Every communication round carries feasibility risks in terms of limited network bandwidth, packet transmission loss, and privacy breach. In the growing applications of the Industrial Internet of Things (IIoT), where the communication is machine to machine (M2M), these parameters may be static, making efficiency in data transfer important. Modified communication algorithms [10] use compression and encryption to reduce the model size and protect privacy. Communication load is also determined by the number of edge devices. Sparsification of communication [11], applied at the clients, is designed to increase the convergence rate and reduce network traffic at the server. Many models also utilize hierarchical clustering [12] to generalize similar client models and reduce the aggregation complexity.
Apart from communication, training ML models in a heterogeneous setup presents a major challenge [13]. Once the server model is broadcast, the clients train on it subject to hyperparameters such as the client ratio (i.e., the fraction of clients chosen, for example from a pool of 100), learning rate, batch size, and epochs per round. Computational power and the properties of the data (ambiguity, size, and complexity) vary drastically from one edge device to the next, and diversely trained client models are hard to aggregate. In a realistic scenario of thousands of edge devices, the updated global model may not converge at all. Existing aggregation algorithms such as FedAvg and FedMA [14] focus on integrating the weights of the local models. Convergence rate and learning saturation are common concerns when it comes to training and aggregation. Several approaches work around model aggregation either by using feature fusion of global and local models [15] or by grouping similar client models [16] to increase generalization. Some studies also use multiple global models to improve convergence [17].
Research on making FL models adaptive to non-IID data has focused primarily on model aggregation. Local training of the model itself is an underexplored step, despite its role in the final accuracy. In this paper, we make three contributions to lessen the empirical risk in FL, as shown in Figure 1:
(i) Clustering of clients solely based on model hyperparameters to increase the learning efficiency per unit training of the model
(ii) Implementation of density-based clustering, i.e., DBSCAN, on the hyperparameters for proper analysis of device properties
(iii) Introduction of genetic evolution of hyperparameters per cluster for finer tuning of individual device models and better aggregation
[figure omitted; refer to PDF]
In particular, we introduce a new algorithm, namely, Genetic CFL, that clusters clients by the hyperparameters of their models to increase the adaptability of FL in realistic environments. Hyperparameters such as batch size and learning rate are core features of any ML model. In practice, every model is tuned manually depending on its behavior on the data. Therefore, in a realistic heterogeneous setup, the proper selection of these parameters could yield significantly better results. The DBSCAN algorithm is used since it does not require a predetermined, static number of clusters and uses the neighbourhood of the model hyperparameters for clustering. We also introduce genetic optimization of those parameters for each cluster. The genetic algorithm is used since it is flexible across applications and scalable to higher dimensions. As defined, each cluster of clients has its own unique set of properties (i.e., hyperparameters) that are suitable for the training of the respective models. In each round, we determine the best parameters for each cluster and evolve them to better suit the cluster.
The rest of this paper is organized as follows. Section 2 discusses the recent work done in the fields of FL, clustering, and evolutionary optimization algorithms. The proposed algorithm is defined in Section 3, followed by the results in Section 4. Finally, the paper is concluded in Section 5.
2. Related Work
In this section, we survey the current literature on the topics of FL, density-based clustering, and evolutionary algorithms, respectively, and try to understand their limitations.
2.1. Federated Learning
Recently, FL has been studied extensively as a distributed and edge ML architecture [1, 18]. Its decentralized nature contrasts with traditional ML algorithms, which are difficult to train in a heterogeneous environment consisting of non-IID data. Novel approaches have tried to overcome this difficulty through various model aggregation algorithms, namely, FedMA [14], feature fusion of global and local models [15], and agnostic FL and grouping of similar client models [16] for better personalization and accuracy. Clustering takes advantage of data similarity across clients and models [19], makes communication more efficient, and improves global generalization [20]. In general, much work is yet to be done on efficient model training on non-IID data.
2.2. Density-Based Clustering
Clustering in FL is primarily used for efficient communication and better generalization. In a realistic scenario with thousands of nodes, aggregating everything into a single model may greatly slow convergence. Several partitioning, hierarchical, and density-based clustering algorithms have been applied to some of the problems existing in FL. Partitioning clustering algorithms such as k-means clustering [21] demand a predetermined number of clusters, which is not feasible in practice. Examples of methods that do not fix the number of clusters in advance include agglomerative hierarchical clustering [22] and generative adversarial network-based clustering. In this paper, we propose to use DBSCAN (density-based spatial clustering of applications with noise) [23], a density-based clustering algorithm that only groups points if they satisfy a density condition.
2.3. Evolutionary Algorithms
The hyperparameters of a model determine its ability to learn from a given set of data. Optimization of ML models and their hyperparameters using evolutionary algorithms [24], such as whale optimization [25] and genetic algorithms [26], has been explored by many researchers. In addition, these algorithms have been used extensively with deep learning (DL) frameworks, which have become a trend for optimization tasks [27]. They have not yet been adopted extensively for FL. Also, algorithms such as reinforcement learning (RL) with a focus on Q-learning are not suitable for highly complex scenarios [28]. The need for hyperparameter tuning is even greater in FL due to the ambiguity in data, and the abovementioned optimization algorithms assist in tuning those parameters beyond manual capacity. Since optimizing the hyperparameters of every client model is not feasible, we propose to do so for each cluster.
Through the survey, we observe that FL is greatly limited by the efficiency of individual client training, which depends on an apt choice of hyperparameters, on the adaptability of the models, and on the optimization of this process.
3. Genetic CFL Architecture
In this section, we give a detailed mathematical model of our algorithm, genetic CFL. The complete pipeline is divided into two parts, the initial broadcast round represented by Algorithm 1 to determine the clusters and the federated training using genetic optimization represented by Algorithm 2. The variational behavior of the algorithm with different hyperparameters, including client ratio (
Table 1
Symbol representations.
Symbol | Meaning |
n | Number of clients |
Learning rate | |
Batch size | |
Model weights | |
Model weight of |
Algorithm 1: Initial broadcast round and clustering.
n = Number of Clients
(1) procedure Server
(2)
(3) Initialize
(4) Broadcast (
(5) procedure Client
(6)
(7) while
(8)
(9) while
(10) Train
(11)
(12) j = j + 1
(13) i = i + 1
(14) return
(15) procedure Server
(16)
(17) Initialize DBSCAN Clustering Algorithm
(18)
(19)
(20)
(21) clusters = model.fit_predict (
Algorithm 2: Genetic optimization based FL on clustered data.
rounds: Number of loops for training the federated model
(1) function Mutate (
(2) factor
(3)
(4) return
(5) function Crossover (
(6) Initialize temporary array
(7)
(8) for
(9)
(10)
(11) return
(12) function Evolve (
(13)
(14)
(15) return Crossover (
(16) procedure Train
(17)
(18)
(19) Initialize
(20)
(21) for
(22) for
(23) ind = [clusters.index (cluster[i])]
(24)
(25) Empty arrays losses,
(26) for
(27)
(28)
The purpose of Algorithm 1 is to discreetly determine the data characteristics of an edge device without intruding on their privacy. A server model (
At server, the models
After summation, the output of the equation is divided by the number of clients to obtain model aggregation as
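The averaging step described above (sum the client weights, then divide by the number of clients) can be sketched as follows. This is a minimal illustration that assumes each client model has been flattened into a plain list of floats; the function name `average_weights` and the toy values are hypothetical and are not taken from the paper's code.

```python
from typing import List


def average_weights(client_weights: List[List[float]]) -> List[float]:
    """Element-wise mean of the clients' weight vectors."""
    if not client_weights:
        raise ValueError("need at least one client model")
    n = len(client_weights)
    size = len(client_weights[0])
    if any(len(w) != size for w in client_weights):
        raise ValueError("all client models must have the same shape")
    # Sum each parameter over all clients, then divide by the client count.
    return [sum(w[k] for w in client_weights) / n for k in range(size)]


# Three hypothetical clients, each holding a two-parameter model.
clients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
server = average_weights(clients)
```

Every client contributes equally in this sketch; weighting clients by their number of training samples would be a different design choice.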
After server model aggregation, the DBSCAN clustering algorithm is applied. In a realistic scenario, the number of edge devices and their variance cannot always be determined. In deterministic partitioning clustering methods such as K-means clustering, the number of clusters has to be predetermined and is not dynamic. DBSCAN, on the contrary, uses density-based reasoning for the grouping of similar objects. It takes two mandatory inputs,
Here,
In the object space of only learning rate,
After each edge device is allotted a cluster-ID, we implement phase-2, shown by Algorithm 2. This section of the algorithm works under the main control loop which runs for
(1) Hyperparameters are optimized per cluster using genetic algorithm involving evolution followed by crossover and finally mutation
(2) The server model with optimized hyperparameters is broadcast to each client clusterwise
(3) Each client is trained based on said parameters
(4) Client models are aggregated to form the latest server model
Every cluster has a different set of characteristic hyperparameters suitable to the edge devices belonging to them. These clustered parameters are evolved genetically followed by training for every
Once sorted, we obtain new individuals through crossover and mutation, respectively. The best individuals (hyperparameters in a cluster) retain their genes and are promoted to the next generation (round), while the others are formed by mating of individuals from the last generation as
The new learning rates
After genetic evolution, the server model is again broadcast to all devices with their respective cluster hyperparameters. Each edge device trains for 1 epoch, and the complete process of genetic optimization, training, and model aggregation is repeated for
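The evolution step described above (rank the individuals of a cluster, promote the best ones unchanged, and breed the rest through crossover and mutation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: ranking by loss, the averaging crossover, the multiplicative mutation of up to 10%, the use of the elite as the only parents, and the names `mutate`, `crossover`, `evolve`, and `n_elite` are all hypothetical choices.

```python
import random
from typing import List, Tuple

Individual = Tuple[float, int]  # (learning rate, batch size)


def mutate(value: float, rng: random.Random, scale: float = 0.1) -> float:
    """Perturb a value by a random factor in [1 - scale, 1 + scale]."""
    return value * (1.0 + rng.uniform(-scale, scale))


def crossover(parents: List[Individual], n_children: int,
              rng: random.Random) -> List[Individual]:
    """Breed children by averaging two random parents, then mutating."""
    children = []
    for _ in range(n_children):
        a, b = rng.choice(parents), rng.choice(parents)
        lr = mutate((a[0] + b[0]) / 2.0, rng)
        bs = max(1, round(mutate((a[1] + b[1]) / 2.0, rng)))
        children.append((lr, bs))
    return children


def evolve(population: List[Individual], losses: List[float],
           rng: random.Random, n_elite: int = 2) -> List[Individual]:
    """Return the next generation of hyperparameters for one cluster."""
    if len(population) != len(losses):
        raise ValueError("one loss per individual is required")
    # Sort individuals from lowest to highest loss (assumed fitness measure).
    ranked = [ind for _, ind in sorted(zip(losses, population),
                                       key=lambda pair: pair[0])]
    elite = ranked[:n_elite]  # best individuals keep their genes
    return elite + crossover(elite, len(population) - len(elite), rng)


rng = random.Random(0)
population = [(0.01, 32), (0.001, 64), (0.05, 16), (0.005, 128)]
losses = [0.30, 0.12, 0.45, 0.20]
next_gen = evolve(population, losses, rng)
```

In this sketch the population size stays constant from one round to the next, so each client in the cluster can still be handed one hyperparameter pair.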
4. Experiments and Results
This section presents the experiments conducted to validate and test the proposed genetic CFL architecture. Section 4.1 deals with the clustering of the client edge devices and the clustering behavior under various parameters. Sections 4.2 and 4.3 are concerned with the performance of the genetic CFL architecture, after DBSCAN clustering, on the MNIST and CIFAR-10 datasets, respectively, and with its comparison with the generic FL architecture. The overall performance analysis for the genetic CFL architecture is discussed in Section 4.4.
4.1. DBSCAN Clustering of the Client Models
The DBSCAN algorithm, as discussed in the previous section, focuses on the Euclidean distance between the observations to calculate the density and cluster the observations based on this density. The models in each edge device are assigned a particular learning rate and batch size for training. These two hyperparameters serve as the primary two dimensions for each observation for the process of clustering. The DBSCAN algorithm takes two main parameters for clustering a set of observations:
Table 2 summarizes the conditions tested for the quality and effectiveness of clustering with the said parameters. For each value of
Table 2
DBSCAN clustering parameters and outcomes.
Epsilon (ε) | Min samples | Number of clusters |
0.200 | 1 | 7 |
0.200 | 2 | 7 |
0.175 | 1 | 7 |
0.175 | 2 | 7 |
0.150 | 1 | 8 |
0.150 | 2 | 7 |
0.100 | 1 | 15 |
0.100 | 2 | 18 |
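To make the role of the two DBSCAN parameters concrete, the following is a minimal pure-Python sketch of density-based clustering over (learning rate, batch size) pairs. It is an illustration under stated assumptions, not the paper's code: the min-max scaling of both dimensions, the six client values, and the function names `min_max_scale` and `dbscan` are hypothetical.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1


def min_max_scale(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale every dimension to [0, 1] so that no feature dominates."""
    cols = list(zip(*points))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(p, lo, hi)] for p in points]


def dbscan(points: Sequence[Sequence[float]], eps: float,
           min_samples: int) -> List[int]:
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    n = len(points)
    # A neighbourhood contains every point within eps, the point itself included.
    neighbours = [[j for j in range(n)
                   if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels: List[Optional[int]] = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # core point: keep expanding
        cluster += 1
    return [NOISE if lab is None else lab for lab in labels]


# Hypothetical (learning rate, batch size) pairs for six clients.
clients = [(0.001, 16), (0.0012, 16), (0.0011, 32),
           (0.01, 128), (0.0098, 128), (0.005, 64)]
scaled = min_max_scale(clients)
labels = dbscan(scaled, eps=0.2, min_samples=2)
```

With these toy values the isolated client is labelled as noise when `min_samples=2` and becomes its own cluster when `min_samples=1`. Library implementations such as scikit-learn's `DBSCAN(eps=..., min_samples=...).fit_predict(X)` expose the same two parameters.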
4.2. Performance of the Genetic CFL Architecture on MNIST Dataset
In this section, we discuss the performance and analyse the training curves of the models. The server model is initially trained on a subset of the MNIST handwritten digits dataset [29]. This model is then distributed among the clients based on the client ratio. The total number of clients chosen for this experiment is 100, and the client ratios tested are 0.1, 0.15, and 0.3. In essence, we evaluate the performance of the models on 10, 15, and 30 clients, respectively. Each client device is provided with a random subset of the dataset with a random number of observations. This ensures that the data are non-IID and that the characteristics of a real-time scenario are emulated. For the initial round, the hyperparameters (learning rate and batch size) of the client devices are randomized within intervals (4) and (5), respectively. The client devices are trained for two epochs, and the hyperparameters are subjected to genetic evolution as discussed in Section 3. These rounds are tabulated in Table 3, and the best performance is plotted against each round in Figure 4.
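The client selection and data split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `select_clients` and `random_subsets` and the subset size bounds are hypothetical, and the value 60,000 is the size of the standard MNIST training set, assumed here to be the pool from which subsets are drawn.

```python
import random
from typing import Dict, List


def select_clients(n_clients: int, client_ratio: float,
                   rng: random.Random) -> List[int]:
    """Pick round(n_clients * client_ratio) distinct client ids."""
    k = max(1, round(n_clients * client_ratio))
    return rng.sample(range(n_clients), k)


def random_subsets(n_samples: int, client_ids: List[int],
                   rng: random.Random, min_size: int,
                   max_size: int) -> Dict[int, List[int]]:
    """Give each client a random number of randomly chosen sample indices."""
    subsets = {}
    for cid in client_ids:
        size = rng.randint(min_size, max_size)  # size differs per client
        subsets[cid] = rng.sample(range(n_samples), size)
    return subsets


rng = random.Random(42)
selected = select_clients(100, 0.1, rng)  # 10 of 100 clients
# Subset size bounds (200 to 2000 samples) are illustrative assumptions.
data = random_subsets(60_000, selected, rng, min_size=200, max_size=2_000)
```

Because both the subset size and its contents are drawn at random, clients end up with different amounts of data and different class mixes.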
Table 3
Performance of FL against genetic CFL on MNIST dataset for various hyperparameters.
Client ratio | Rounds | FL accuracy | FL loss | Genetic CFL accuracy | Genetic CFL loss |
0.1 | 3 | 0.9133 | 0.3136 | 0.9679 | 0.1203 |
0.1 | 6 | 0.9265 | 0.2493 | 0.9730 | 0.1343 |
0.1 | 10 | 0.9367 | 0.2115 | 0.9777 | 0.1923 |
0.15 | 3 | 0.9176 | 0.2878 | 0.9665 | 0.1049 |
0.15 | 6 | 0.9740 | 0.0876 | 0.9740 | 0.0876 |
0.15 | 10 | 0.9443 | 0.1828 | 0.9763 | 0.0910 |
0.3 | 3 | 0.9178 | 0.2989 | 0.9698 | 0.0964 |
0.3 | 6 | 0.9326 | 0.2359 | 0.9780 | 0.0804 |
0.3 | 10 | 0.9450 | 0.1946 | 0.9799 | 0.0849 |
Since the model training hyperparameters are no longer predetermined, the performance of the models and their respective training are optimized locally in the cluster, thus providing a more personalized training for each cluster. The server model obtains a smooth learning curve and converges faster than under normal FL training. Table 3 reports the performance of the models for both architectures. In every reported configuration, the accuracy of the genetic CFL architecture is at least as high, and its loss at least as low, as that of the generic FL architecture; the two coincide only for client ratio 0.15 at round 6. The increase in accuracy and the decrease in loss signify that the models are indeed training and that useful information is aggregated at the server.
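The accuracy gap between the two architectures can be checked with a few lines of arithmetic. The snippet below copies the accuracy values reported in Table 3 and computes, for each configuration, the genetic CFL accuracy minus the FL accuracy; the variable names are illustrative.

```python
# (client ratio, rounds) -> (FL accuracy, genetic CFL accuracy), from Table 3.
table3 = {
    (0.1, 3): (0.9133, 0.9679),
    (0.1, 6): (0.9265, 0.9730),
    (0.1, 10): (0.9367, 0.9777),
    (0.15, 3): (0.9176, 0.9665),
    (0.15, 6): (0.9740, 0.9740),
    (0.15, 10): (0.9443, 0.9763),
    (0.3, 3): (0.9178, 0.9698),
    (0.3, 6): (0.9326, 0.9780),
    (0.3, 10): (0.9450, 0.9799),
}

# Accuracy gain of genetic CFL over FL, rounded to the table's precision.
gains = {key: round(gcfl - fl, 4) for key, (fl, gcfl) in table3.items()}

for (ratio, rounds), gain in gains.items():
    print(f"ratio={ratio:<4} rounds={rounds:<2} gain={gain:+.4f}")
```

The gain is largest for client ratio 0.1 at round 3 (0.0546) and zero for client ratio 0.15 at round 6, where the table reports identical values for both architectures.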
4.3. Performance of the Genetic CFL Architecture on CIFAR-10 Dataset
This section deals with the performance and the training of the models on CIFAR-10 dataset [30] using genetic CFL architecture and its comparison with the performance of the generic FL architecture. The training process of this dataset is similar to the training of the MNIST handwritten digits’ dataset. The server initializes the model and distributes the weights of the server model to every client device; the models are trained on the random subset of the dataset assigned for two epochs; the current hyperparameters are subjected to genetic evolution; the trained weights are sent back to the server to get aggregated. This process is repeated for several rounds. The performance of the server model after each round, at the end of the aggregation phase, is plotted in Figure 5 and tabulated in Table 4.
[figure omitted; refer to PDF]
Table 4
Performance of FL against genetic CFL on CIFAR-10 dataset for various hyperparameters.
Client ratio | Rounds | FL accuracy | FL loss | Genetic CFL accuracy | Genetic CFL loss |
0.10 | 3 | 0.6818 | 0.9540 | 0.6514 | 1.0097 |
0.10 | 6 | 0.6891 | 1.3746 | 0.6639 | 1.3447 |
0.10 | 10 | 0.6862 | 1.6617 | 0.6599 | 1.6449 |
0.15 | 3 | 0.6973 | 0.9612 | 0.7098 | 0.8814 |
0.15 | 6 | 0.6988 | 1.3675 | 0.7199 | 1.693 |
0.15 | 10 | 0.6952 | 1.6225 | 0.7129 | 1.3806 |
0.30 | 3 | 0.7578 | 0.8708 | 0.7688 | 0.7818 |
0.30 | 6 | 0.7634 | 1.0891 | 0.7646 | 0.981 |
0.30 | 10 | 0.7613 | 1.2964 | 0.7623 | 1.2961 |
In Table 4, the models trained with hyperparameters optimized by the genetic algorithm for their respective clusters reach higher accuracy than generic FL at client ratios 0.15 and 0.30, whereas at client ratio 0.10 generic FL is more accurate in all three reported rounds. The performance also improves as the client ratio increases. The lowest loss is encountered at the second round for client ratio 0.3. The accuracy, however, peaks at the fourth epoch with a moderate prediction loss. Further training of the models does not improve performance and leads to overfitting, so the training of the models is stopped at round two. The aggregated model therefore provides a significant performance boost within very few rounds, which favours speed and high throughput when deployed in a real-time system.
4.4. Performance Analysis of Genetic CFL
The genetic CFL algorithm performs better with a higher sample size. A higher number of observations per cluster should therefore improve the optimization of the hyperparameters. However, given the diversity of datasets in both data characteristics and number of data points, proper clustering of similar scenarios should provide higher throughput for the individual models. This calls for a balance between the number of clusters and the size of each cluster. A proper balance can ensure that the models in the federated architecture provide the best output in the given scenario. In a real-time application, the number of edge devices expected is higher than in a synthetic environment. Following the progression of the performance, a higher total number of clients increases the performance significantly. The optimization of the hyperparameters using genetic CFL provides higher throughput in comparatively fewer rounds.
Our architecture, genetic CFL, reaches higher accuracy than the Byzantine-robust CFL of [31] on both datasets while using far fewer rounds (10 versus 200), and it comes within 0.04 percentage points of FedZip [32] on MNIST in half the rounds (10 versus 20). Compared with iterative clustering [16], our architecture performs better on the MNIST dataset but not on the CIFAR-10 data. This behavior is attributed to the rotation and augmentation of data, which give an upper hand in feature extraction and representation. Genetic optimization provides an elastic and adaptive framework for optimizing the hyperparameters. This flexibility gives the architecture an edge over other methods by adapting to the dataset and the required environment. Most other architectures need hyperparameter tuning beforehand and thus require manual intervention: the system has to be reset with a different set of parameters for each type of data and application. This rigidity can cost both time and resources. Moreover, importance is given to every single client, which affects not only the server model performance but also the performance of every client device. A better delivery of service for every client device is ensured while the performance of the server model as a whole increases. Table 5 compares the performance of our architecture, genetic CFL, with other architectures that incorporate clustering in federated learning. The table lists the best accuracy of the models on the MNIST handwritten digits dataset and the CIFAR-10 dataset for a given number of rounds. Genetic CFL needs significantly fewer rounds while keeping the accuracy competitive.
Table 5
Performance comparison.
Algorithm | Rounds | MNIST | CIFAR-10 |
Genetic CFL | 10 | 97.99 | 76.88 |
Byzantine robustness of CFL [31] | 200 | 97.4 | 75.3 |
FedZip [32] | 20 | 98.03 | — |
Iterative federated clustering [16] | — | 95.25 | 81.51 |
5. Conclusion
In this work, we have applied the genetic evolutionary algorithm to optimize the hyperparameters (learning rate and batch size) during the training of the individual end device models in a cluster for the FL architecture. We have identified gaps in the existing techniques and contributed the genetic CFL algorithm and architecture to address them. This architecture has been tested using the MNIST handwritten digits dataset and the CIFAR-10 dataset, on which accuracies of 97.99% and 76.88%, respectively, have been achieved. We discussed and analysed the observations and the performance of the genetic CFL architecture. We have also covered the favourable conditions and the limitations for the algorithm to provide the best performance in deployment. The overall performance of the models displays a significant rise in efficiency while reducing communication and computation cost.
As part of future work, the number of clients and the client ratio can be scaled to larger samples that closely mimic a real-time situation, owing to the high scalability of the model. As the population sample increases, the optimization of the hyperparameters becomes more efficient, thus delivering higher throughput in a real-time scenario. The type of data processed is not limited, and this architecture can be used for various scenarios such as natural language processing tasks, image classification tasks, and recommendation systems. Genetic CFL can also be integrated with time-sensitive systems to deliver better performance in very few rounds.
Disclosure
This manuscript is available as a preprint on arXiv at https://arxiv.org/abs/2107.07233. The code for this work is available in the repository at https://github.com/sagnik106/Clustered-FL-GA.
[1] C. Zhang, Y. Xie, H. Bai, B. Yu, W. Li, Y. Gao, "A survey on federated learning," Knowledge-Based Systems, vol. 216, DOI: 10.1016/j.knosys.2021.106775, 2021.
[2] M. Parimala, R. M. Swarna Priya, Q.-V. Pham, K. Dev, P. Kumar Reddy Maddikunta, T. R. Gadekallu, T. Huynh-The, "Fusion of federated learning and industrial internet of things: a survey," 2021. https://arxiv.org/abs/2101.00798
[3] M. Alazab, R. M. Swarna Priya, M. Parimala, T. R. Gadekallu, Q.-V. Pham, P. Reddy, "Federated learning for cybersecurity: concepts, challenges and future directions," IEEE Transactions on Industrial Informatics, DOI: 10.1109/tii.2021.3119038, 2021.
[4] W. Wang, M. H. Fida, Z. Lian, Z. Yin, Q.-V. Pham, T. R. Gadekallu, K. Dev, C. Su, "Secure-enhanced federated learning for ai-empowered electric vehicle energy prediction," IEEE Consumer Electronics Magazine, DOI: 10.1109/mce.2021.3116917, 2021.
[5] T. Yang, G. Andrew, H. Eichner, H. Sun, W. Li, N. Kong, D. Ramage, F. Beaufays, "Applied federated learning: Improving google keyboard query suggestions," 2018. https://arxiv.org/abs/1812.02903
[6] M. R. Sprague, A. Jalalirad, M. Scavuzzo, C. Capota, M. Neun, L. Do, M. Kopp, "Asynchronous federated learning for geospatial applications," Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 21-28.
[7] J. Xu, B. S. Glicksberg, S. Chang, P. Walker, J. Bian, F. Wang, "Federated learning for healthcare informatics," Journal of Healthcare Informatics Research, vol. 5 no. 1, DOI: 10.1007/s41666-020-00082-4, 2020.
[8] Q.-V. Pham, M. Zeng, R. Ruby, T. Huynh-The, W.-J. Hwang, "UAV communications for sustainable federated learning," IEEE Transactions on Vehicular Technology, vol. 70 no. 4, pp. 3944-3948, DOI: 10.1109/tvt.2021.3065084, 2021.
[9] A. Nilsson, S. Smith, G. Ulm, E. Gustavsson, M. Jirstrand, "A performance evaluation of federated learning algorithms," Proceedings of the Second Workshop on Distributed Infrastructures for Deep Learning, DOI: 10.1145/3286490.3286559.
[10] C. Fang, Y. Guo, Y. Hu, B. Ma, L. Feng, A. Yin, "Privacy-preserving and communication-efficient federated learning in internet of things," Computers & Security, vol. 103, DOI: 10.1016/j.cose.2021.102199, 2021.
[11] E. Ozfatura, K. Ozfatura, D. Gunduz, "Time-correlated sparsification for communication-efficient federated learning," 2021. https://arxiv.org/abs/2101.08837
[12] C. Briggs, "Federated learning with hierarchical clustering of local updates to improve training on non-iid data," 2020. https://arxiv.org/abs/2004.11791
[13] Y. Zhao, L. Meng, L. Lai, N. Suda, D. Civin, V. Chandra, "Federated learning with non-iid data," 2018. https://arxiv.org/abs/1806.00582
[14] H. Wang, M. Yurochkin, Y. Sun, D. Papailiopoulos, Y. Khazaeni, "Federated learning with matched averaging," 2020. https://arxiv.org/abs/2002.06440
[15] X. Yao, T. Huang, C. Wu, R. Zhang, L. Sun, "Towards faster and better federated learning: a feature fusion approach," pp. 175-179, DOI: 10.1109/icip.2019.8803001.
[16] A. Ghosh, J. Chung, Y. Dong, R. Kannan, "An efficient framework for clustered federated learning," 2020. https://arxiv.org/abs/2006.04088
[17] K. Kopparapu, E. Lin, J. Zhao, "Fedcd: Improving performance in non-iid federated learning," 2020. https://arxiv.org/abs/2006.09637
[18] K. Bonawitz, H. Eichner, W. Grieskamp, D. Huba, A. Ingerman, V. Ivanov, C. Kiddon, J. Konečný, S. Mazzocchi, H. Brendan McMahan, T. Van Overveldt, D. Petrou, D. Ramage, J. Roselander, "Towards federated learning at scale: system design," 2019. https://arxiv.org/abs/1902.01046
[19] M. Xie, G. Long, T. Shen, T. Zhou, X. Wang, J. Jiang, C. Zhang, "Multi-center federated learning," 2020. https://arxiv.org/abs/2005.01026
[20] C. Zheng, A. Ali, Z. Syed, S. Truex, A. Anwar, N. Baracaldo, Y. Zhou, H. Ludwig, F. Yan, Y. Cheng, "Tifl: a tier-based federated learning system," Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing, pp. 125-136.
[21] A. Likas, N. Vlassis, J. Verbeek, "The global k-means clustering algorithm," Pattern Recognition, vol. 36 no. 2, pp. 451-461, DOI: 10.1016/s0031-3203(02)00060-2, 2003.
[22] C. Briggs, Z. Fan, A. Peter, "Federated learning with hierarchical clustering of local updates to improve training on non-iid data," DOI: 10.1109/ijcnn48605.2020.9207469.
[23] D. Birant, A. Kut, "ST-DBSCAN: an algorithm for clustering spatial-temporal data," Data & Knowledge Engineering, vol. 60 no. 1, pp. 208-221, DOI: 10.1016/j.datak.2006.01.013, 2007.
[24] J.-Y. Kim, S.-B. Cho, "Evolutionary optimization of hyperparameters in deep learning models," pp. 831-837, DOI: 10.1109/cec.2019.8790354.
[25] I. Aljarah, H. Faris, S. Mirjalili, "Optimizing connection weights in neural networks using the whale optimization algorithm," Soft Computing, vol. 22 no. 1, DOI: 10.1007/s00500-016-2442-1, 2018.
[26] X. Xiao, M. Yan, S. Basodi, C. Ji, Y. Pan, "Efficient hyperparameter optimization in deep learning using a variable length genetic algorithm," 2020. https://arxiv.org/abs/2006.12703
[27] G. Beruvides, R. Quiza, M. Rivas, F. Castaño, R. E. Haber, "Online detection of run out in microdrilling of tungsten and titanium alloys," International Journal of Advanced Manufacturing Technology, vol. 74 no. 9–12, pp. 1567-1575, 2014.
[28] G. Beruvides, C. Juanes, F. Castaño, R. E. Haber, "A self-learning strategy for artificial cognitive control systems," pp. 1180-1185, DOI: 10.1109/indin.2015.7281903.
[29] Li Deng, "The MNIST database of handwritten digit images for machine learning research [best of the web]," IEEE Signal Processing Magazine, vol. 29 no. 6, pp. 141-142, DOI: 10.1109/msp.2012.2211477, 2012.
[30] A. Krizhevsky, G. Hinton, "Learning multiple layers of features from tiny images," 2009.
[31] F. Sattler, Klaus-Robert Müller, T. Wiegand, W. Samek, "On the byzantine robustness of clustered federated learning," pp. 8861-8865, DOI: 10.1109/icassp40776.2020.9054676.
[32] A. Malekijoo, M. J. Fadaeieslam, H. Malekijou, M. Homayounfar, F. Alizadeh-Shabdiz, R. Rawassizadeh, "Fedzip: a compression framework for communication-efficient federated learning," 2021. https://arxiv.org/abs/2102.01593
Copyright © 2021 Shaashwat Agrawal et al. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0/
Abstract
Federated learning (FL) is a distributed model for deep learning that integrates client-server architecture, edge computing, and real-time intelligence. FL has the capability of revolutionizing machine learning (ML) but lacks practicality of implementation due to technological limitations, communication overhead, non-IID (non-independent and identically distributed) data, and privacy concerns. Training an ML model over heterogeneous non-IID data severely degrades the convergence rate and performance. The existing traditional and clustered FL algorithms exhibit two main limitations: inefficient client training and static hyperparameter utilization. To overcome these limitations, we propose a novel hybrid algorithm, namely, genetic clustered FL (Genetic CFL), that clusters edge devices based on the training hyperparameters and genetically modifies the parameters clusterwise. We then introduce an algorithm that increases the individual cluster accuracy by integrating density-based clustering and genetic hyperparameter optimization. The results are benchmarked using the MNIST handwritten digit dataset and the CIFAR-10 dataset. The proposed genetic CFL shows significant improvements and works well with realistic cases of non-IID and ambiguous data. An accuracy of 97.99% is observed on the MNIST dataset and 76.88% on the CIFAR-10 dataset with only 10 training rounds.
1 School of Computer Science and Engineering, Vellore Institute of Technology, Vellore 632014, India
2 College of Engineering, IT and Environment, Charles Darwin University, Casuarina 0909, NT, Australia
3 School of Information Technology and Engineering, Vellore Institute of Technology, Vellore 632014, India
4 Korean Southeast Center for the 4th Industrial Revolution Leader Education, Pusan National University, Busan 46241, Republic of Korea