
Abstract

Collaborative Learning (CL) is a decentralized machine learning framework that enables multiple clients to solve tasks collaboratively without sharing their raw data. In its early stages, the most prominent form of CL was Federated Learning (FL), originally introduced to leverage distributed information across clients for training a global model, with data heterogeneity and high communication costs as the primary concerns. Recently, with the rapid development of Pre-Trained Models (PTMs), there is a growing need for federated fine-tuning to efficiently adapt PTMs to downstream tasks using distributed, task-oriented datasets. However, since PTMs often encapsulate substantial proprietary knowledge, model privacy has emerged as a critical concern alongside data privacy. Meanwhile, advances in computation and storage have made it increasingly feasible to deploy PTMs on edge devices. In scenarios involving complex tasks that demand the integration of diverse capabilities, a pressing research challenge is how to effectively coordinate heterogeneous clients equipped with specialized PTMs for collaborative problem solving.

Our first work, FedDAD, addresses unsupervised deep anomaly detection (DAD) in an FL setting with noisy and heterogeneous data. It leverages a small public dataset on the server as a shared normal anchor in the latent space to mitigate data heterogeneity, improving anomaly identification across clients.

For federated fine-tuning of PTMs, our second work, GenFFT, introduces a hybrid sharing mechanism that combines parameter sharing and knowledge sharing to protect model privacy. Rather than sharing the entire PTM during training, GenFFT employs a lightweight substitute model together with generation modules that are alternately updated by the server and the clients to promote information exchange.

When clients possess private models with distinct capabilities, complex tasks can be solved through model collaboration without further parameter updates, which requires the server to generate a plan that effectively coordinates their cooperation. Since the server often fails to generate an optimal plan on the first attempt, we propose COP, a novel client-oriented planning framework that refines the initial plan before execution according to three specifically designed principles: solvability, completeness, and non-redundancy, thus enabling the collaborative resolution of complex tasks while preserving both data and model privacy (a minimal illustrative sketch of such a refinement pass follows this paragraph).
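To make COP's three principles concrete, here is a minimal sketch of how a server-side refinement pass might enforce them before execution. All data structures and names (Step, refine_plan, the capability sets) are illustrative assumptions for exposition, not the dissertation's actual algorithm.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    subtask: str   # the capability this step requires
    client: str    # the client assigned to execute it

def refine_plan(plan, client_capabilities, required_subtasks):
    """Refine an initial server-generated plan before execution.

    plan: list[Step] proposed by the server.
    client_capabilities: dict mapping client -> set of capabilities.
    required_subtasks: set of subtasks the complex task decomposes into.
    """
    refined, covered = [], set()
    for step in plan:
        # Solvability: keep a step only if its assigned client can perform it.
        if step.subtask not in client_capabilities.get(step.client, set()):
            continue
        # Non-redundancy: drop steps whose subtask is already covered.
        if step.subtask in covered:
            continue
        refined.append(step)
        covered.add(step.subtask)
    # Completeness: every required subtask must be covered; anything in
    # `missing` would signal the server to re-plan before execution.
    missing = required_subtasks - covered
    return refined, missing

# Hypothetical usage:
plan = [Step("ocr", "client_a"), Step("ocr", "client_b"), Step("summarize", "client_c")]
caps = {"client_a": {"ocr"}, "client_b": {"ocr"}, "client_c": {"translate"}}
refined, missing = refine_plan(plan, caps, {"ocr", "summarize"})
# refined keeps one solvable "ocr" step; missing == {"summarize"} triggers re-planning.
```

The point of the sketch is only the ordering of the checks: infeasible and duplicate steps are pruned before execution, and coverage gaps are surfaced so the plan can be repaired rather than failing mid-run.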

Extensive experiments across a variety of datasets demonstrate that our proposed methods are broadly effective: whether in federated training of small models from scratch, federated fine-tuning of large pre-trained models, or collaborative inference without parameter updates, each approach achieves strong performance while preserving data privacy across diverse tasks.

Details

Title
Tackling Data and Resource Heterogeneity for Performance Enhancement of Collaborative Learning Systems
Author
Number of pages
105
Publication year
2025
Degree date
2025
School code
1223
Source
DAI-A 87/5(E), Dissertation Abstracts International
ISBN
9798263313180
University/institution
Hong Kong University of Science and Technology (Hong Kong)
University location
Hong Kong
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32407350
ProQuest document ID
3273626409
Document URL
https://www.proquest.com/dissertations-theses/tackling-data-resource-heterogeneity-performance/docview/3273626409/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic