In healthcare software development, protecting the privacy and security of sensitive patient data is crucial, especially when databases are shared for testing, auditing, debugging, or development. The Health Insurance Portability and Accountability Act (HIPAA) imposes strict regulations on the handling and sharing of Protected Health Information (PHI), making secure database sharing a challenging and labor-intensive task. Traditional data masking methods rely on manual processes, which are not only time-consuming but also error-prone, particularly when new tables or columns are added to an existing database. This paper presents a novel approach that uses Large Language Models (LLMs) to automatically identify sensitive data fields that need masking or encryption before sharing. A comparative analysis of four LLMs and three classical machine learning (ML) models was conducted on a custom dataset; among the tested models, a fine-tuned BERT model achieved the highest accuracy. The proposed method is also implemented as a web-based application that gives database administrators schema-first, AI-driven recommendations for masking sensitive data and sharing databases securely in compliance with HIPAA. This approach minimizes human error, improves scalability, and strengthens data security, ultimately facilitating secure and compliant collaborative software development in the healthcare sector.
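To make the schema-first idea concrete, the sketch below flags columns for masking by inspecting only table and column names, never the row data. It is an illustrative keyword baseline, not the fine-tuned BERT classifier the abstract reports; the hint list, the function names (`split_tokens`, `recommend_masking`), and the example schema are all hypothetical. In a system like the one described, a trained model would presumably replace the keyword test.

```python
import re

# Hypothetical keyword hints for PHI-like columns. This list is an assumption
# made for illustration; it is NOT the paper's dataset, labels, or model.
PHI_HINTS = ("name", "ssn", "dob", "birth", "address", "phone", "email", "mrn", "insurance")


def split_tokens(column_name: str) -> list[str]:
    """Split a snake_case or camelCase identifier into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", column_name)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def recommend_masking(schema: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return, per table, the columns whose name tokens match a PHI hint.

    Only schema metadata is inspected, so no patient rows are read.
    """
    flagged: dict[str, list[str]] = {}
    for table, columns in schema.items():
        hits = [c for c in columns if any(tok in PHI_HINTS for tok in split_tokens(c))]
        if hits:
            flagged[table] = hits
    return flagged


# Made-up example schema, used only to exercise the sketch.
schema = {
    "patients": ["patient_id", "firstName", "date_of_birth", "ssn", "blood_type"],
    "appointments": ["appointment_id", "patient_id", "scheduled_at"],
}
print(recommend_masking(schema))
# → {'patients': ['firstName', 'date_of_birth', 'ssn']}
```

A keyword test like this misses unconventional column names and cannot weigh context, which is the kind of gap a learned classifier is meant to close.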
Details
Title
Secure Database Sharing in Healthcare
Publication Details
Proceedings 2025 IEEE International Conference on Big Data (BigData), pp.4217-4226
Resource Type
Conference proceeding
Conference
2025 IEEE International Conference on Big Data (IEEE BigData 2025) (Macau, China, 12/07/2025–12/10/2025)