
Abstract

Natural Language Processing models have gained significant attention following the development of large-scale models such as OpenAI’s GPT. These models rely on extensive and diverse datasets, which raises data-sharing challenges around privacy and ownership.

Federated Learning addresses these challenges by allowing multiple actors to collaboratively train a shared model without exchanging raw data. This enables the use of data that could not otherwise be shared due to privacy, ethical or legal concerns. The decentralised approach reduces central storage requirements while also lessening data privacy risks.

Statistical learning algorithms are commonly used in Federated Learning. However, they must be adapted into distributed statistical learning algorithms in order to handle decentralised data. These distributed algorithms are still under development, so empirical results are needed to assess their theoretical foundations. Because the algorithms are distributed, empirical evaluation is a complex task: the environments in which they operate, and consequently the adversities they encounter, are difficult to replicate physically and consistently.

This dissertation aims to support the development, improvement and analysis of distributed statistical learning algorithms by introducing an evaluation framework implemented as a discrete-event simulator. Existing discrete-event simulators are compared and analysed with the evaluation of the target algorithms in mind. Then, a purpose-built simulator is designed to be extensible, configurable and observable.
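
For readers unfamiliar with the technique, the following is a minimal sketch of a discrete-event simulation loop. It is not the simulator developed in the dissertation; the names `Event`, `Simulator`, `schedule`, `run` and `demo` are hypothetical and chosen for illustration only. The sketch assumes a single priority queue ordered by simulated time, with a sequence counter to break ties deterministically, and a simple log standing in for observability.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(order=True)
class Event:
    """A scheduled action; ordered by time, then by insertion order."""
    time: float
    seq: int
    action: Callable[["Simulator"], None] = field(compare=False)


class Simulator:
    """Hypothetical minimal discrete-event simulator (illustrative only)."""

    def __init__(self) -> None:
        self.now: float = 0.0                    # current simulated time
        self.log: List[Tuple[float, str]] = []   # observable trace of events
        self._queue: List[Event] = []            # min-heap of pending events
        self._seq: int = 0                       # tie-breaker for equal times

    def schedule(self, delay: float, action: Callable[["Simulator"], None]) -> None:
        """Schedule `action` to run `delay` time units after the current time."""
        if delay < 0:
            raise ValueError("delay must be non-negative")
        heapq.heappush(self._queue, Event(self.now + delay, self._seq, action))
        self._seq += 1

    def run(self, until: float = float("inf")) -> None:
        """Process events in time order, jumping the clock to each event."""
        while self._queue and self._queue[0].time <= until:
            event = heapq.heappop(self._queue)
            self.now = event.time
            event.action(self)


def demo() -> List[Tuple[float, str]]:
    """Simulate one message sent at t=1.0 with an assumed 2.5-unit delay."""
    sim = Simulator()

    def receive(s: Simulator) -> None:
        s.log.append((s.now, "receive"))

    def send(s: Simulator) -> None:
        s.log.append((s.now, "send"))
        s.schedule(2.5, receive)  # assumed network latency

    sim.schedule(1.0, send)
    sim.run()
    return sim.log
```

Because simulated time advances only when an event is processed, a run with the same inputs yields the same trace, which is one reason this style of simulation suits environments that are hard to reproduce physically.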

The developed simulator is validated by comparing its behaviour to that of an established simulator, and its metrics visualisation capabilities are demonstrated. Furthermore, the simulator is used to evaluate a distributed statistical learning algorithm. Based on the evaluation results, a solution is proposed to address the algorithm’s identified functional shortcomings. The proposed solution is also evaluated using the designed simulator, and its results are compared to those of the original implementation.

Details

1010268
Title
Evaluation of Distributed Statistical Learning
Number of pages
210
Publication year
2025
Degree date
2025
School code
5896
Source
MAI 87/6(E), Masters Abstracts International
ISBN
9798270220884
Committee member
Silva, João Marco
University/institution
Universidade do Porto (Portugal)
University location
Portugal
Degree
Master's
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32426828
ProQuest document ID
3287857473
Document URL
https://www.proquest.com/dissertations-theses/evaluation-distributed-statistical-learning/docview/3287857473/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic