ClinX: Multimodal De-Identification for Robust and Bias-Resilient Medical VLMMs

ACL ARR 2026 January Submission 4008 Authors

04 Jan 2026 (modified: 20 Mar 2026) · CC BY 4.0
Keywords: Multimodal NLP, Vision and Language, Natural Language Processing for Healthcare, Privacy and Security in NLP, Ethics and Bias in AI
Abstract: Visual large multimodal models (VLMMs) are increasingly adopted for medical applications such as visual question answering (VQA) in the clinical domain, but running such models on sensitive medical data raises significant privacy and generalization concerns. To address these challenges, we present ClinX, a privacy-preserving medical multimodal inference system that de-identifies data in compliance with the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor standard without compromising diagnostic performance. The image de-identification component uses a custom SPADE-based generative adversarial network (PP-GAN) to redact images by inpainting regions that an OCR engine marks as protected health information (PHI), e.g., names and timestamps, followed by lightweight mask-aware postprocessing. The text de-identification component employs a three-step approach: rule-based redaction, named entity recognition, and neural rewriting. We report experiments on two medical multimodal VQA datasets, VQA-RAD and PathVQA, using a medical VLMM, LLaVA-Med, with a large-scale general-purpose model, Llama, as a semantic judge. Results show strong preservation of both exact-match and semantic accuracy after de-identification. Furthermore, proactively removing textual overlays mitigates dataset-specific bias, in some cases even enhancing robustness by eliminating spurious textual shortcuts. These results validate ClinX as a practical and secure solution for privacy-conscious medical AI deployment.
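The three-step text de-identification pipeline described in the abstract (rule-based redaction, then named entity recognition, then neural rewriting) could be sketched roughly as below. This is an illustrative assumption, not the paper's implementation: the actual system presumably uses a trained clinical NER model and a neural language model for rewriting, both of which are replaced here by a small gazetteer lookup and a whitespace normalizer so the sketch stays self-contained.

```python
import re

# Step 1: rule-based redaction of structured PHI (dates, phone numbers,
# medical record numbers). Patterns and tags are illustrative only.
RULES = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "[PHONE]"),
    (re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE), "[MRN]"),
]

def rule_based_redact(text: str) -> str:
    for pattern, tag in RULES:
        text = pattern.sub(tag, text)
    return text

# Step 2: NER-based redaction. A gazetteer stands in for a real clinical
# NER model that would tag PERSON/LOCATION/ORGANIZATION entities.
GAZETTEER = {"john smith": "[NAME]", "st. mary hospital": "[HOSPITAL]"}

def ner_redact(text: str) -> str:
    lowered = text.lower()
    for entity, tag in GAZETTEER.items():
        idx = lowered.find(entity)
        while idx != -1:
            text = text[:idx] + tag + text[idx + len(entity):]
            lowered = text.lower()
            idx = lowered.find(entity)
    return text

# Step 3: neural rewriting, stubbed as whitespace normalization; the
# paper's system would paraphrase the redacted text with a neural model.
def neural_rewrite(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip()

def deidentify(text: str) -> str:
    """Run the three de-identification stages in sequence."""
    return neural_rewrite(ner_redact(rule_based_redact(text)))
```

For example, `deidentify("John Smith seen on 03/14/2024 at St. Mary Hospital, MRN: 12345.")` yields `"[NAME] seen on [DATE] at [HOSPITAL], [MRN]."`, with all structured and named PHI replaced by placeholder tags before the (stubbed) rewriting stage runs.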
Paper Type: Long
Research Area: Clinical and Biomedical Applications
Research Area Keywords: Clinical NLP, Biomedical NLP, Visual Question Answering, Vision and Language, Privacy, Bias, Ethics
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 4008