Privacy-Preserving Machine Learning and Data Sharing in Healthcare Applications
Original version
Aminifar, A. (2022). Privacy-preserving machine learning and data sharing in healthcare applications [Doctoral dissertation]. Western Norway University of Applied Sciences.
Abstract
Artificial intelligence (AI) and automated decision-making have the potential to improve accuracy and efficiency in healthcare applications. In particular, AI has been shown to outperform human experts in certain domains. However, applying AI and machine learning to automated decision-making in healthcare raises challenges such as security and privacy preservation. These issues are among the primary concerns that must be addressed, since they may negatively affect individuals. For instance, a patient’s privacy is violated if sharing his/her medical data with a third-party data recipient reveals that he/she had a medical condition. Furthermore, regulations such as the General Data Protection Regulation (GDPR) legally protect patient privacy and must be observed when employing AI and machine learning in this domain.
To address these privacy concerns, this thesis concentrates on two principal directions for data analysis. In the first direction, the analysis is performed on published or shared data. The data holder therefore needs to take specific measures to protect the privacy of data subjects, for instance, by perturbing the data before publishing it. Along this direction, we propose an anonymization framework for datasets with both categorical and numerical attributes. The framework clusters the data samples while accounting for diversity in anonymization, which reduces the risks of identity and attribute linkage attacks. Our method achieves anonymity by formulating and solving the problem as a constrained optimization problem that jointly considers the k-anonymity, l-diversity, and t-closeness privacy models. We evaluate our framework on popular, publicly available structured healthcare data.
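The three privacy models named above can be made concrete with a small check. The Python sketch below is not the thesis's optimization framework: it assumes a cluster assignment has already been produced (the `clusters` list is a hypothetical input), generalizes each cluster's quasi-identifiers to ranges (numerical) or value sets (categorical), and tests every cluster against k-anonymity, distinct l-diversity, and t-closeness. For t-closeness it assumes a categorical sensitive attribute with equal ground distance between values, in which case the Earth Mover's Distance reduces to the total variation distance. The records, attribute names, function names, and thresholds are illustrative.

```python
from collections import Counter, defaultdict


def distribution(values):
    """Relative frequency of each sensitive value."""
    counts = Counter(values)
    return {v: c / len(values) for v, c in counts.items()}


def total_variation(p, q):
    """EMD between categorical distributions under equal ground distance."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def group_indices(clusters):
    """Map each (hypothetical) cluster id to the indices of its records."""
    groups = defaultdict(list)
    for i, cid in enumerate(clusters):
        groups[cid].append(i)
    return groups


def generalize(records, clusters, numeric_qis, categorical_qis):
    """Replace quasi-identifiers by cluster-level ranges or value sets."""
    out = [dict(r) for r in records]
    for idx in group_indices(clusters).values():
        for attr in numeric_qis:
            vals = [records[i][attr] for i in idx]
            for i in idx:
                out[i][attr] = (min(vals), max(vals))
        for attr in categorical_qis:
            vals = frozenset(records[i][attr] for i in idx)
            for i in idx:
                out[i][attr] = vals
    return out


def check_privacy(records, clusters, sensitive, k, l, t):
    """Per-cluster k-anonymity, distinct l-diversity, and t-closeness checks."""
    overall = distribution([r[sensitive] for r in records])
    report = {}
    for cid, idx in group_indices(clusters).items():
        values = [records[i][sensitive] for i in idx]
        report[cid] = {
            "k_anonymous": len(values) >= k,
            "l_diverse": len(set(values)) >= l,
            "t_close": total_variation(distribution(values), overall) <= t,
        }
    return report


# Hypothetical toy data; the cluster assignment is given, not optimized
records = [
    {"age": 34, "sex": "F", "diagnosis": "flu"},
    {"age": 36, "sex": "M", "diagnosis": "asthma"},
    {"age": 52, "sex": "F", "diagnosis": "flu"},
    {"age": 58, "sex": "M", "diagnosis": "diabetes"},
]
clusters = [0, 0, 1, 1]
anonymized = generalize(records, clusters, ["age"], ["sex"])
report = check_privacy(records, clusters, "diagnosis", k=2, l=2, t=0.3)
print(anonymized[0])
print(report)
```

In an optimization-based approach, conditions like these would typically constrain the choice of clusters rather than be checked only after clustering, since a poor assignment can satisfy k-anonymity while failing l-diversity or t-closeness.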
The second direction is to perform the analysis without publishing the data. In this setting, we consider multiple parties, each of which holds a different part of the data. The objective is to analyze the data held by these parties without direct access to the individual record values. Along this direction, we present a scalable privacy-preserving distributed learning framework based on the Extremely Randomized Trees (ERT) algorithm and Secure Multiparty Computation (SMC) techniques. We build a machine learning model over the entire dataset by analyzing the data locally at each party and combining the results of these local analyses. We evaluate the distributed implementation of our technique on healthcare datasets collected in the INTROMAT project and demonstrate its prediction performance.
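To illustrate how local analyses can be combined without revealing record values, the Python sketch below simulates, in a single process, an ERT-style split selection over horizontally partitioned data (an assumption; the text only says that each party holds a different part of the data). Following the ERT idea, each candidate split uses a random feature and a random threshold; here thresholds are drawn from feature ranges assumed to be public. Each party counts classes on either side of the split, and the counts are aggregated with additive secret sharing modulo a prime, a generic SMC building block chosen for illustration rather than the thesis's specific protocol. The function names, the prime, the class labels, and the Gini score are assumptions.

```python
import random

PRIME = 2**61 - 1  # assumed modulus; must exceed any aggregated count


def share(value, n_parties, rng):
    """Split a non-negative integer into n additive shares modulo PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(local_vectors, rng):
    """Simulate additive secret sharing: only the element-wise total is revealed."""
    n, dim = len(local_vectors), len(local_vectors[0])
    # received[j][d]: running sum of the shares party j receives for entry d
    received = [[0] * dim for _ in range(n)]
    for vec in local_vectors:
        for d, value in enumerate(vec):
            for j, s in enumerate(share(value, n, rng)):
                received[j][d] = (received[j][d] + s) % PRIME
    # Each party publishes only its partial sums; adding them yields the total
    return [sum(received[j][d] for j in range(n)) % PRIME for d in range(dim)]


def local_counts(X, y, feature, threshold, classes):
    """Class counts left (< threshold) and right (>= threshold) of a split."""
    left, right = [0] * len(classes), [0] * len(classes)
    for row, label in zip(X, y):
        side = left if row[feature] < threshold else right
        side[classes.index(label)] += 1
    return left + right


def gini(counts):
    """Gini impurity of a vector of class counts."""
    n = sum(counts)
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in counts)


def weighted_gini(total_counts, n_classes):
    """Size-weighted Gini impurity of the two children (lower is better)."""
    left, right = total_counts[:n_classes], total_counts[n_classes:]
    n = sum(total_counts)
    return (sum(left) * gini(left) + sum(right) * gini(right)) / n


def choose_random_split(parties, feature_ranges, classes, n_candidates, rng):
    """Draw n_candidates random (feature, threshold) pairs and keep the best."""
    best = None
    for _ in range(n_candidates):
        feature = rng.randrange(len(feature_ranges))
        low, high = feature_ranges[feature]
        threshold = rng.uniform(low, high)
        local = [local_counts(X, y, feature, threshold, classes) for X, y in parties]
        total = secure_sum(local, rng)
        score = weighted_gini(total, len(classes))
        if best is None or score < best[0]:
            best = (score, feature, threshold, total)
    return best


# Hypothetical example: two parties, two features, binary labels
parties = [
    ([[1.0, 5.0], [2.0, 6.0]], ["control", "case"]),
    ([[8.0, 1.0], [9.0, 2.0]], ["case", "case"]),
]
score, feature, threshold, counts = choose_random_split(
    parties, feature_ranges=[(0.0, 10.0), (0.0, 10.0)],
    classes=["control", "case"], n_candidates=5, rng=random.Random(0))
print(feature, round(threshold, 2), counts, round(score, 3))
```

Only the aggregated class counts for each candidate split are revealed. In a real deployment each party would draw its own randomness and exchange shares over secure channels, which this single-process simulation does not model.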
In summary, the research in this thesis helps make it possible to exploit health data for analysis and automatic decision-making without violating privacy. In the long term, this has the potential to improve decision-making, diagnosis, and treatment in healthcare at an affordable cost.
Description
In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of Western Norway University of Applied Sciences's products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink.
Has parts
Aminifar, A., Lamo, Y., Pun, K. I., & Rabbi, F. (2019). A practical methodology for anonymization of structured health data. In C. Granja & T. Solvoll (Eds.), SHI 2019: Proceedings of the 17th Scandinavian Conference on Health Informatics, November 12-13, 2019, Oslo, Norway. Linköping University Electronic Press.
Aminifar, A., Rabbi, F., Pun, V. K. I., & Lamo, Y. (2021). Diversity-aware anonymization for structured health data. In 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, EMBC 2021, Mexico, November 1-5, 2021. IEEE. https://doi.org/10.1109/EMBC46164.2021.9629918
Aminifar, A., Rabbi, F., Pun, K. I., & Lamo, Y. (2021). Privacy preserving distributed extremely randomized trees. In Proceedings of the 36th Annual ACM Symposium on Applied Computing (pp. 1102-1105). https://doi.org/10.1145/3412841.3442110
Aminifar, A., Rabbi, F., & Lamo, Y. (2021). Scalable privacy-preserving distributed extremely randomized trees for structured data with multiple colluding parties. In D. Androutsos, K. Plataniotis, & X.-P. Zhang (Eds.), ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing). IEEE. https://doi.org/10.1109/ICASSP39728.2021.9413632
Aminifar, A., Rabbi, F., Pun, V. K. I., & Lamo, Y. (2021). Monitoring motor activity data for detecting patients' depression using data augmentation and privacy-preserving distributed learning. In 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, EMBC 2021, Mexico, November 1-5, 2021 (Proceedings of the Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)). IEEE. https://doi.org/10.1109/EMBC46164.2021.9630592
Aminifar, A., Shokri, M., Rabbi, F., Pun, V. K. I., & Lamo, Y. (2022). Extremely randomized trees with privacy preservation for distributed structured health data. IEEE Access, 10, 6010-6027. https://doi.org/10.1109/access.2022.3141709