Privacy-Preserving Machine Learning and Data Sharing in Healthcare Applications

Aminifar, Amin

dc.contributor.author	Aminifar, Amin
dc.date.accessioned	2022-04-04T09:21:35Z
dc.date.available	2022-04-04T09:21:35Z
dc.date.created	2022-03-26T17:15:42Z
dc.date.issued	2022
dc.identifier.citation	Aminifar, A. (2022). Privacy-preserving machine learning and data sharing in healthcare applications [Doctoral dissertation]. Western Norway University of Applied Sciences	en_US
dc.identifier.isbn	978-82-93677-96-3
dc.identifier.uri	https://hdl.handle.net/11250/2989477
dc.description	In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of Western Norway University of Applied Sciences's products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink.	en_US
dc.description.abstract	Artificial intelligence (AI) and automated decision-making have the potential to improve accuracy and efficiency in healthcare applications. In particular, AI has been shown to outperform human experts in certain domains. However, applying AI and machine learning to automated decision-making in healthcare raises challenges such as security and privacy preservation. These issues are among the primary concerns that must be addressed, as they may negatively affect individuals. For instance, a patient's privacy is violated if sharing his/her medical data with a third-party data recipient reveals that he/she had a medical condition. Furthermore, regulations such as the General Data Protection Regulation (GDPR) legally protect patients' privacy and must be observed when employing AI and machine learning in this domain. To address such privacy concerns, this thesis considers two principal directions for data analysis and concentrates its research on them. In the first direction, the analysis is performed on the published/shared data. The data holder therefore needs to take particular measures to protect the privacy of data subjects, for instance, by perturbing the data before publishing. Along this direction, we propose an anonymization framework for datasets with both categorical and numerical attributes. The framework clusters the data samples while accounting for diversity in anonymization, in order to reduce the risks of identity and attribute linkage attacks. Our method achieves anonymity by formulating and solving this problem as a constrained optimization problem that jointly considers the k-anonymity, l-diversity, and t-closeness privacy models. We evaluate our framework on popular, publicly available structured healthcare data.
The second principal direction is to perform analysis without publishing the data. In such settings, we consider multiple parties, each of which holds a different part of the data. The objective is to analyze the data held by these parties without direct access to the data record values. Along this direction, we present a scalable privacy-preserving distributed learning framework based on the Extremely Randomized Trees (ERT) algorithm and Secure Multiparty Computation (SMC) techniques. We build a machine learning model on the entire dataset by analyzing the data locally at each party and combining the results of these local analyses. We evaluate the distributed implementation of our technique on healthcare datasets collected in the INTROMAT project and demonstrate its prediction performance. In summary, the research in this thesis contributes to making it possible to exploit health data for analysis and automatic decision-making in the healthcare setting without violating privacy. This has long-term potential for better decision-making, diagnosis, and treatment in healthcare at an affordable cost.	en_US
dc.language.iso	eng	en_US
dc.publisher	Høgskulen på Vestlandet	en_US
dc.relation.haspart	Aminifar, A., Lamo, Y., Pun, K. I., & Rabbi, F. (2019). A practical methodology for anonymization of structured health data. In C. Granja & T. Solvoll (Eds.), SHI 2019: Proceedings of the 17th Scandinavian Conference on Health Informatics November 12-13, 2019 Oslo, Norway. Linköping University Electronic Press.	en_US
dc.relation.haspart	Aminifar, A., Rabbi, F., Pun, V. K. I., & Lamo, Y. (2021). Diversity-aware anonymization for structured health data. In 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, EMBC 2021, Mexico, November 1-5, 2021. IEEE. https://doi.org/10.1109/EMBC46164.2021.9629918	en_US
dc.relation.haspart	Aminifar, A., Rabbi, F., Pun, K. I., & Lamo, Y. (2021). Privacy preserving distributed extremely randomized trees. In Proceedings of the 36th Annual ACM Symposium on Applied Computing (pp. 1102-1105). https://doi.org/10.1145/3412841.3442110	en_US
dc.relation.haspart	Aminifar, A., Rabbi, F., & Lamo, Y. (2021). Scalable privacy-preserving distributed extremely randomized trees for structured data with multiple colluding parties. In D. Androutsos, K. Plataniotis, & X.-P. Zhang (Eds.), ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing). IEEE. https://doi.org/10.1109/ICASSP39728.2021.9413632	en_US
dc.relation.haspart	Aminifar, A., Rabbi, F., Pun, V. K. I., & Lamo, Y. (2021). Monitoring motor activity data for detecting patients' depression using data augmentation and privacy-preserving distributed learning. In 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, EMBC 2021, Mexico, November 1-5, 2021 (Proceedings of the Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)). IEEE. https://doi.org/10.1109/EMBC46164.2021.9630592	en_US
dc.relation.haspart	Aminifar, A., Shokri, M., Rabbi, F., Pun, V. K. I., & Lamo, Y. (2022). Extremely randomized trees with privacy preservation for distributed structured health data. IEEE Access, 10, 6010-6027. https://doi.org/10.1109/access.2022.3141709	en_US
dc.title	Privacy-Preserving Machine Learning and Data Sharing in Healthcare Applications	en_US
dc.type	Doctoral thesis	en_US
dc.description.version	publishedVersion	en_US
dc.rights.holder	©Amin Aminifar, 2022	en_US
dc.source.pagenumber	180	en_US
dc.identifier.cristin	2012745
cristin.ispublished	false
cristin.fulltext	original

Associated file(s)

Filename: Aminifar.pdf
Size: 15.95 MB
Format: PDF


This item appears in the following collection(s)

Import from CRIStin [3604]
Institutt for datateknologi, elektroteknologi og realfag [1163]
