Abstract
To utilize the enormous amount of data generated on numerous edge devices, such as mobile phones, to train a high-performance machine learning model while protecting users' data privacy, federated learning has been proposed and has become one of the most essential paradigms in distributed machine learning. Under the coordination of a central server, users collaboratively train a shared global model without sharing their data: each user conducts local training on its own data and sends only its model update to the central server, which aggregates the updates into an improved global model.
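As a rough illustration of this update-and-aggregate loop, the sketch below implements a generic FedAvg-style round with a data-size-weighted average of user updates; the function names, the toy linear model, and the hyperparameters are illustrative assumptions rather than details taken from this thesis.

```python
import numpy as np

def local_update(global_weights, data, lr=0.01, epochs=1):
    """Illustrative local training: a user refines the global weights on
    its own data and returns only the resulting weight delta."""
    w = global_weights.copy()
    X, y = data
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of a squared loss
        w -= lr * grad
    return w - global_weights                # only the update leaves the device

def aggregate(global_weights, updates, sizes):
    """Server-side aggregation: a data-size-weighted average of the updates."""
    total = sum(sizes)
    avg = sum(n / total * u for u, n in zip(updates, sizes))
    return global_weights + avg

# Toy example: three users, a 2-dimensional linear model, five rounds.
rng = np.random.default_rng(0)
w_global = np.zeros(2)
users = [(rng.normal(size=(20, 2)), rng.normal(size=20)) for _ in range(3)]
for _ in range(5):
    updates = [local_update(w_global, d) for d in users]
    w_global = aggregate(w_global, updates, [len(d[1]) for d in users])
```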
To improve training performance in federated learning, a myriad of new mechanisms has been proposed. However, our extensive evaluation indicated that most state-of-the-art mechanisms fail to perform as well as they claim. We therefore explored directions that consistently reduce the elapsed training time needed for the global model to converge to a target accuracy, and found that replacing the conventional synchronous aggregation paradigm, in which the server does not aggregate until it has received updates from all selected users, with an asynchronous one, in which the server aggregates without waiting for slow users, significantly improves training performance.
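The sketch below contrasts the two paradigms by showing a generic asynchronous server that folds in each update as soon as it arrives, discounting stale updates computed on an outdated global model; the 1/(1 + staleness) discount and all names are illustrative assumptions, not the exact rule studied in this thesis.

```python
import numpy as np

def async_aggregate(w_global, update, staleness, eta=0.5):
    """Asynchronous aggregation: mix in each arriving update immediately,
    down-weighting stale ones, instead of waiting for all selected users
    as a synchronous server would."""
    alpha = eta / (1.0 + staleness)          # one common staleness discount
    return (1 - alpha) * w_global + alpha * (w_global + update)

# Simulated arrival order: (update, number of rounds the sender lagged behind).
rng = np.random.default_rng(1)
w = np.zeros(4)
arrivals = [(rng.normal(scale=0.1, size=4), s) for s in (0, 2, 5, 1)]
for upd, staleness in arrivals:
    w = async_aggregate(w, upd, staleness)
```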
However, asynchronous federated learning has not been studied as widely as synchronous federated learning, and its existing mechanisms have not reached their full potential. We therefore propose Blade, a new staleness-aware framework that seeks to improve the performance of asynchronous federated learning by introducing new mechanisms in all important design aspects of the training process. In an extensive array of performance evaluations, Blade consistently demonstrated substantial performance advantages over its state-of-the-art competitors.
In some federated learning scenarios, users are institutions such as hospitals and banks, which implicitly need to store their clients' data centrally in order to conduct local training. To protect clients' data privacy, these scenarios motivate us to study three-layer federated learning, in which institutions serve as edge servers on the middle layer, between the central server and the clients. Through empirical and theoretical studies, we observe that pruning and quantization can largely reduce communication overhead with a negligible loss, and sometimes even a slight gain, in training performance, and that the number of local training epochs on each client also affects training performance. We therefore propose two new mechanisms to improve training performance in three-layer federated learning: FedSaw, which prunes and quantizes updates, and Tempo, which adaptively tunes the number of each client's local training epochs.
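As a hedged illustration of how an update might be compressed before being uploaded, the sketch below combines magnitude-based pruning with uniform quantization; it is a generic placeholder for this class of techniques, not the actual FedSaw scheme, and all names and parameters are assumptions.

```python
import numpy as np

def compress_update(update, keep_ratio=0.1, num_bits=8):
    """Generic compression sketch: keep only the largest-magnitude entries
    (pruning), then map the survivors to a small set of integer levels
    (uniform quantization). Both steps shrink the upload."""
    k = max(1, int(keep_ratio * update.size))
    idx = np.argsort(np.abs(update))[-k:]            # indices of kept entries
    pruned = np.zeros_like(update)
    pruned[idx] = update[idx]
    max_val = np.abs(pruned).max()
    scale = max_val / (2 ** (num_bits - 1) - 1) if max_val > 0 else 1.0
    quantized = np.round(pruned / scale).astype(np.int8)
    return quantized, scale                           # transmit these instead

def decompress_update(quantized, scale):
    """Server-side reconstruction of the (lossy) update."""
    return quantized.astype(np.float32) * scale

update = np.random.default_rng(2).normal(size=1000).astype(np.float32)
q, s = compress_update(update)
recovered = decompress_update(q, s)
```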
Inspired by the advantages of asynchronous aggregation in two-layer federated learning, we investigate asynchronous three-layer federated learning, which also demonstrates superiority over its synchronous counterpart in our empirical study. We therefore adapt Blade to the three-layer setting. Experimental results show that Blade also significantly improves training performance in three-layer federated learning.