Abstract

The field of neuroimaging can greatly benefit from building machine learning models to detect and predict diseases and to discover novel biomarkers, but much of the data collected at various organizations and research centers cannot be shared due to privacy or regulatory concerns (especially for clinical data or rare disorders). In addition, aggregating data across multiple large studies incurs a large amount of duplicated technical debt, and the required resources can be difficult or impossible for an individual site to build. Training on data distributed across organizations can yield models that generalize much better than models trained on data from any one organization alone. While there are approaches for decentralized sharing, these often do not provide the highest possible guarantees of sample privacy that only cryptography can provide. In addition, such approaches are often focused on probabilistic solutions. In this paper, we propose an approach that leverages the potential of datasets spread among a number of data-collecting organizations by performing joint analyses in a secure and deterministic manner, in which only encrypted data is shared and manipulated. The approach is based on secure multiparty computation, which refers to cryptographic protocols that enable distributed computation of a function over distributed inputs without revealing additional information about the inputs. It enables multiple organizations to train machine learning models on their joint data and to apply the trained models to encrypted data without revealing their sensitive data to the other parties. In our proposed approach, organizations (or sites) securely collaborate to build a machine learning model as if it had been trained on the aggregated data of all the organizations combined. Importantly, the approach does not require a trusted party (i.e., an aggregator); each contributing site plays an equal role in the process, and no site can learn the individual data of any other site. We demonstrate the effectiveness of the proposed approach in a range of empirical evaluations using different machine learning algorithms, including logistic regression and convolutional neural network models, on human structural and functional magnetic resonance imaging datasets.
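
To make the idea of computing over distributed inputs without a trusted aggregator concrete, the following is a minimal sketch of additive secret sharing, one common building block of secure multiparty computation. It is not the protocol used in the paper, which the abstract does not specify. The modulus `PRIME` and the function names `share`, `reconstruct`, and `secure_sum` are assumptions chosen for illustration, and the sketch simulates all sites in one process rather than over a network.

```python
import secrets

# Illustrative modulus (an assumption); all share arithmetic is done mod PRIME.
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split a non-negative integer into n additive shares mod PRIME."""
    # The first n-1 shares are uniformly random; the last one makes the sum match.
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine additive shares into the value they encode."""
    return sum(shares) % PRIME


def secure_sum(private_values):
    """Simulate sites computing the sum of their private values.

    Site i splits its value and sends the j-th share to site j. Each site
    adds up the shares it received; only these partial sums are published.
    """
    n = len(private_values)
    distributed = [share(v, n) for v in private_values]
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    return reconstruct(partial_sums)


# Hypothetical per-site counts, e.g. the number of subjects at each site.
site_values = [12, 30, 7]
total = secure_sum(site_values)
```

In this sketch every site plays the same role, and each individual share is a uniformly random number that reveals nothing about the value it came from; only the final sum is disclosed. Training a model this way additionally requires secure multiplication and fixed-point encoding of real numbers, which this example does not cover.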

Details

Title
NeuroCrypt: Machine Learning Over Encrypted Distributed Neuroimaging Data
Author
Senanayake, Nipuna 1; Podschwadt, Robert 1; Takabi, Daniel 1; Calhoun, Vince D. 1; Plis, Sergey M. 1

1 Georgia State University, Atlanta, USA (GRID:grid.256304.6) (ISNI:0000 0004 1936 7400)
Pages
91-108
Publication year
2022
Publication date
Jan 2022
Publisher
Springer Nature B.V.
ISSN
15392791
e-ISSN
15590089
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2721999482
Copyright
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2021.