Macquarie University

Privacy-Preserving Natural Language Processing Techniques in Healthcare Chatbots

thesis
posted on 2025-09-10, 04:12 authored by Graeme Roger Thomas
This study investigates privacy-preserving techniques and secure communication protocols for safeguarding user data in healthcare chatbots. With the increasing deployment of AI-driven chatbots for medical support, concerns about the privacy and security of sensitive patient data have escalated. The study has five objectives: i) to determine how encryption and anonymisation techniques affect NLP performance; ii) to identify vulnerabilities in the communication channels used by NLP chatbots; iii) to determine which combination of privacy and communication techniques offers the strongest protection for chatbots; iv) to determine the effect of privacy-preserving techniques on user experience; and v) to examine the regulations and ethics surrounding the use and implementation of privacy-preservation techniques. The study employs a mixed-methods approach, combining quantitative and qualitative analyses to assess privacy-preserving NLP techniques and their implications for healthcare chatbot security. This approach enables a more nuanced exploration of the research problem, offering both empirical data and contextual insights. The findings reveal that AES, a symmetric encryption method, consistently outperformed RSA, an asymmetric method, in speed, efficiency, and impact on chatbot response times. AES showed significantly lower encryption times than RSA, with minimal computational overhead, enabling near-instantaneous responses even with stronger encryption keys. The study also evaluates the effectiveness of anonymisation techniques. Before anonymisation, the dataset exhibited a privacy leakage rate (PLR) of 92%, with 100% Identifiable Data Residuals (IDR) and an 85% re-identification risk (RR), underscoring the significant exposure of sensitive information. After tokenisation, redaction, and data masking were applied, privacy leakage fell substantially: tokenisation achieved an 8% PLR, redaction a 4% PLR, and data masking a 14% PLR. Redaction was the most effective of these methods, eliminating identifiable data with a low re-identification risk of 7%. Several policy implications are also discussed, providing guidelines for regulatory authorities to improve the privacy protection of healthcare chatbot applications. Given these considerations and limitations, the present study provides a valuable first step toward developing chatbots that protect user privacy by augmenting existing work, datasets, and resources. The approach can also be extended to any application that demands privacy preservation. Proposed avenues for future research include analysing quantum/invertible encryption, synthesising data, and considering privacy models for other high-sensitivity areas, including finance and education.
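To make the three anonymisation techniques named in the abstract concrete, the following is a minimal sketch of redaction, masking, and tokenisation applied to a single chatbot message. It is not the thesis's implementation: the email-only pattern, the function names, the placeholder strings, and the demo key are all assumptions chosen for illustration, and a real system would need to cover many more identifier types.

```python
import hashlib
import hmac
import re

# Hypothetical pattern for illustration: matches simple email addresses only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text: str) -> str:
    """Redaction: remove the identifier and leave a fixed placeholder."""
    return EMAIL.sub("[REDACTED]", text)


def mask(text: str) -> str:
    """Masking: keep the first character and the domain, hide the rest."""
    def _mask(match: re.Match) -> str:
        local, domain = match.group(0).split("@", 1)
        return local[0] + "*" * (len(local) - 1) + "@" + domain
    return EMAIL.sub(_mask, text)


def tokenise(text: str, key: bytes) -> str:
    """Tokenisation: replace the identifier with a keyed, repeatable surrogate."""
    def _token(match: re.Match) -> str:
        digest = hmac.new(key, match.group(0).encode("utf-8"), hashlib.sha256)
        return "TOK_" + digest.hexdigest()[:12]
    return EMAIL.sub(_token, text)


# Made-up example message; the key is a demo value, not a recommendation.
message = "Please send my results to jane.doe@example.com today."
print(redact(message))
print(mask(message))
print(tokenise(message, key=b"demo-key"))
```

In this sketch, redaction discards the value entirely, masking leaves part of it visible, and tokenisation maps the same input to the same surrogate so records can still be linked. How the thesis measures PLR, IDR, and RR for each technique is not stated in the abstract, so no metric is computed here.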

Table of Contents

1. Introduction -- 2. Literature Review -- 3. Theoretical Framework -- 4. Methodology -- 5. Analysis And Findings -- 6. Discussion -- 7. Conclusion -- 8. References

Awarding Institution

Macquarie University

Degree Type

Thesis masters research

Degree

Master of Philosophy

Department, Centre or School

School of Computing

Year of Award

2025

Principal Supervisor

Amin Beheshti

Additional Supervisor 1

Alireza Jolfaei

Rights

Copyright: The Author
Copyright disclaimer: https://www.mq.edu.au/copyright-disclaimer

Language

English

Extent

283 pages

Former Identifiers

AMIS ID: 480190