Bridging Gaps with Multimodal Data: A Comprehensive Dataset for Pharmacovigilance Analysis in Ovarian Cancer

ACL ARR 2024 June Submission4050 Authors

16 Jun 2024 (modified: 02 Aug 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: Ovarian cancer is a highly fatal gynecologic cancer, with over 70\% of cases diagnosed at an advanced stage because early symptoms are mild and nonspecific. Delayed diagnosis necessitates intensive treatment, such as surgery and chemotherapy. Chemotherapy regimens widely use platinum-based compounds and taxanes, which are highly effective but can cause serious adverse reactions. Identifying adverse drug reactions (ADRs) efficiently is essential for managing these side effects and ensuring that patients receive the safest and most effective care possible. In this work, we present $\textit{OvaCer}$, a novel multi-label, multimodal dataset developed for ovarian cancer pharmacovigilance. The dataset comprises 1500 records containing details such as drug name, duration of drug use, adverse effects, severity levels, post-effect actions, and reference images used during ovarian cancer treatment. To further enhance its usefulness for pharmacovigilance, we include gold-standard summaries of patient experiences. Recognizing the potential of large language models (LLMs) for summarization, we conducted a comprehensive evaluation of several pre-trained models on medical summarization, including GPT-3.5, T5, BART, FlanT5, and clinical models such as PMC LLaMA. Our results show that LLMs vary in effectiveness on clinical summarization tasks, with GPT-3.5 significantly outperforming the other models.
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: Resources and Evaluation, Interpretability and Analysis of Models for NLP, NLP Applications
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 4050