
Abstract

Collaborative Learning (CL) is a decentralized machine learning framework that enables multiple clients to solve tasks collaboratively without sharing their raw data. In its early stages, the most prominent form of CL was Federated Learning (FL), originally introduced to leverage distributed information across clients for training a global model, with data heterogeneity and high communication costs as the primary concerns. Recently, with the rapid development of Pre-Trained Models (PTMs), there is a growing need for federated fine-tuning to efficiently adapt PTMs to downstream tasks using distributed, task-oriented datasets. However, since PTMs often encapsulate substantial proprietary knowledge, model privacy has emerged as a critical concern alongside data privacy. Meanwhile, advances in computation and storage have made it increasingly feasible to deploy PTMs on edge devices. In scenarios involving complex tasks that demand the integration of diverse capabilities, a pressing research challenge is how to effectively coordinate heterogeneous clients equipped with specialized PTMs for collaborative problem solving.

Our first work, FedDAD, addresses unsupervised deep anomaly detection (DAD) in an FL setting with noisy and heterogeneous data. It leverages a small public dataset on the server as a shared normal anchor in the latent space to mitigate data heterogeneity, improving anomaly identification across clients.

For federated fine-tuning of PTMs, our second work, GenFFT, introduces a hybrid sharing mechanism that combines parameter sharing and knowledge sharing to protect model privacy. Rather than sharing the entire PTM during training, GenFFT employs a lightweight substitute model together with generation modules that are alternately updated by the server and the clients to promote information exchange.

When clients possess private models with distinct capabilities, complex tasks can be solved through model collaboration without further parameter updates, which requires the server to generate a plan that effectively coordinates their cooperation. Since the server often fails to generate an optimal plan on the first attempt, we propose COP, a novel client-oriented planning framework that refines the initial plan before execution according to three specifically designed principles: solvability, completeness, and non-redundancy, thus enabling the collaborative resolution of complex tasks while preserving both data and model privacy (a minimal illustrative sketch of such a refinement pass follows this paragraph).
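To make COP's three principles concrete, here is a minimal sketch of how a server-side refinement pass might enforce them before execution. All data structures and names (Step, refine_plan, the capability sets) are illustrative assumptions for exposition, not the dissertation's actual algorithm.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    subtask: str   # the capability this step requires
    client: str    # the client assigned to execute it

def refine_plan(plan, client_capabilities, required_subtasks):
    """Refine an initial server-generated plan before execution.

    plan: list[Step] proposed by the server.
    client_capabilities: dict mapping client -> set of capabilities.
    required_subtasks: set of subtasks the complex task decomposes into.
    """
    refined, covered = [], set()
    for step in plan:
        # Solvability: keep a step only if its assigned client can perform it.
        if step.subtask not in client_capabilities.get(step.client, set()):
            continue
        # Non-redundancy: drop steps whose subtask is already covered.
        if step.subtask in covered:
            continue
        refined.append(step)
        covered.add(step.subtask)
    # Completeness: every required subtask must be covered; anything in
    # `missing` would signal the server to re-plan before execution.
    missing = required_subtasks - covered
    return refined, missing

# Hypothetical usage:
plan = [Step("ocr", "client_a"), Step("ocr", "client_b"), Step("summarize", "client_c")]
caps = {"client_a": {"ocr"}, "client_b": {"ocr"}, "client_c": {"translate"}}
refined, missing = refine_plan(plan, caps, {"ocr", "summarize"})
# refined keeps one solvable "ocr" step; missing == {"summarize"} triggers re-planning.
```

The point of the sketch is only the ordering of the checks: infeasible and duplicate steps are pruned before execution, and coverage gaps are surfaced so the plan can be repaired rather than failing mid-run.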

Extensive experiments across a variety of datasets demonstrate that our proposed methods are broadly effective: whether in federated training of small models from scratch, federated fine-tuning of large pre-trained models, or collaborative inference without parameter updates, each approach achieves strong performance while preserving data privacy across diverse tasks.

Details

Title
Tackling Data and Resource Heterogeneity for Performance Enhancement of Collaborative Learning Systems
Author
Number of pages
105
Publication year
2025
Degree date
2025
School code
1223
Source
DAI-A 87/5(E), Dissertation Abstracts International
ISBN
9798263313180
University/institution
Hong Kong University of Science and Technology (Hong Kong)
University location
Hong Kong
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32407350
ProQuest document ID
3273626409
Document URL
https://www.proquest.com/dissertations-theses/tackling-data-resource-heterogeneity-performance/docview/3273626409/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic