[Tiny] Synthetic-based retrieval of patient medical data

Published: 04 Mar 2025, Last Modified: 17 Apr 2025, ICLR 2025 Workshop SynthData, CC BY 4.0
Keywords: Information Retrieval, synthetic enrichment, textual medical data
TL;DR: We used an LLM to generate synthetic queries for doctors' text reports. Using this data, we fine-tuned an encoder and obtained improvements on retrieval tasks.
Abstract: Medical retrieval systems play a crucial role in accurate and efficient diagnosis by allowing physicians to access relevant radiological reports and patient descriptions. However, the development of such systems is often hindered by the limited availability of high-quality labeled data, owing to privacy concerns and data scarcity. In this work, we address this challenge by generating synthetic data with Large Language Models (LLMs). Our experiments show that synthetic data helps improve retrieval performance across various tasks, both when the encoder is trained entirely on synthetic data and when synthetic data is mixed with real data.
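The training setup described above (LLM-generated queries paired with reports, used alone or mixed with real pairs to fine-tune an encoder) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the training objective, the mixing rule, or the evaluation metric, so the sketch assumes a standard in-batch-negatives contrastive (InfoNCE) loss and recall@k. The functions `mix_training_pairs`, `in_batch_contrastive_loss`, and `recall_at_k` and the parameters `synthetic_fraction` and `temperature` are hypothetical, and plain float lists stand in for real encoder embeddings.

```python
import math
import random
from typing import List, Sequence, Tuple

# Sketch under stated assumptions: the mixing rule, loss, and metric below are
# illustrative choices, not taken from the paper. A pair is (query, report text);
# embeddings are plain float lists standing in for encoder outputs.
Pair = Tuple[str, str]
Vector = Sequence[float]


def mix_training_pairs(synthetic: Sequence[Pair], real: Sequence[Pair],
                       synthetic_fraction: float, seed: int = 0) -> List[Pair]:
    """Build a training set whose share of synthetic pairs is `synthetic_fraction`.

    1.0 gives synthetic-only training; smaller values keep every real pair and
    add (at most) enough synthetic pairs to reach the requested fraction.
    """
    if not 0.0 <= synthetic_fraction <= 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1]")
    rng = random.Random(seed)
    if synthetic_fraction == 1.0:
        mixed = list(synthetic)
    else:
        wanted = round(len(real) * synthetic_fraction / (1.0 - synthetic_fraction))
        mixed = list(real) + rng.sample(list(synthetic), min(wanted, len(synthetic)))
    rng.shuffle(mixed)
    return mixed


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def in_batch_contrastive_loss(queries: Sequence[Vector], reports: Sequence[Vector],
                              temperature: float = 0.05) -> float:
    """Mean InfoNCE loss: query i is paired with report i, and every other
    report in the batch acts as a negative."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, r) / temperature for r in reports]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / len(queries)


def recall_at_k(queries: Sequence[Vector], reports: Sequence[Vector],
                relevant: Sequence[int], k: int) -> float:
    """Fraction of queries whose relevant report index is ranked in the top k."""
    hits = 0
    for q, target in zip(queries, relevant):
        ranked = sorted(range(len(reports)),
                        key=lambda j: cosine(q, reports[j]), reverse=True)
        hits += int(target in ranked[:k])
    return hits / len(queries)


if __name__ == "__main__":
    synthetic = [(f"synthetic query {i}", f"report {i}") for i in range(10)]
    real = [("real query a", "report a"), ("real query b", "report b")]
    print(len(mix_training_pairs(synthetic, real, synthetic_fraction=0.5)))  # 4
    q = [[1.0, 0.0], [0.0, 1.0]]
    r = [[0.9, 0.1], [0.1, 0.9]]
    print(recall_at_k(q, r, relevant=[0, 1], k=1))  # 1.0
```

Setting `synthetic_fraction=1.0` corresponds to the synthetic-only mode and values below 1.0 to the mixed mode; in a real pipeline the vectors would come from the encoder being fine-tuned, and the loss would be minimized with a gradient-based optimizer rather than just computed.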
Submission Number: 70