Generating Medical Instructions with Conditional Transformer

Published: 30 Oct 2023, Last Modified: 30 Nov 2023SyntheticData4ML 2023 PosterEveryoneRevisionsBibTeX
Keywords: Synthetic data for ClinicalNLP, Clinical NER, Transformers, Medical Instructions
TL;DR: We propose a new model label-to-text-Transformer (LT3) to generate clinical synthetic data and test it on the data diversity and usefulness for downstream application NER task.
Abstract: Access to real-world medical instructions is essential for medical research and healthcare quality improvement. However, access to real medical instructions is often limited due to the sensitive nature of the information expressed. Additionally, manually labelling these instructions for training and fine-tuning Natural Language Processing (NLP) models can be tedious and expensive. We introduce a novel task-specific model architecture, Label-To-Text-Transformer (LT3), tailored to generate synthetic medical instructions based on provided labels, such as a vocabulary list of medications and their attributes. LT3 is trained on a vast corpus of medical instructions extracted from the MIMIC-III database, allowing the model to produce valuable synthetic medical instructions. We evaluate LT3's performance by contrasting it with a state-of-the-art Pre-trained Language Model (PLM), T5, analysing the quality and diversity of generated texts. We deploy the generated synthetic data to train the SpacyNER model for the Named Entity Recognition (NER) task over the n2c2-2018 dataset. The experiments show that the model trained on synthetic data can achieve a 96-98\% F1 score at Label Recognition on Drug, Frequency, Route, Strength, and Form. LT3 codes will be shared at \url{}
Submission Number: 61