Hybrid Modeling of Serious Vaccine Adverse Events Using Narrative Embeddings and Structured Data

Published: 19 Aug 2025, Last Modified: 12 Oct 2025, Venue: BHI 2025, License: CC BY 4.0
Confirmation: I have read and agree with the IEEE BHI 2025 conference submission's policy on behalf of myself and my co-authors.
Keywords: Vaccine safety, NLP, Sentence-BERT, BioBERT, ClinicalBERT, adverse event detection, symptom clustering, pharmacovigilance
TL;DR: We combine structured data with narrative text embeddings from VAERS to improve detection of serious vaccine adverse events and reveal clinically meaningful symptom patterns.
Abstract: We present a pipeline that combines structured metadata with sentence-level embeddings of narrative symptom text from the Vaccine Adverse Event Reporting System (VAERS) to predict serious vaccine outcomes. Using pretrained language models such as Sentence-BERT (SBERT), BioBERT, and ClinicalBERT, we achieve higher classification accuracy than models that rely solely on structured data. SBERT performs best overall, and ClinicalBERT also performs well, demonstrating the value of both general-purpose and domain-specific models. Clustering the symptom embeddings reveals clinically relevant patterns, such as shingles flare-ups and neurological issues, that may not appear in structured fields. Our findings suggest that incorporating narrative text enhances both predictive accuracy and interpretability.
Track: 5. Public Health Informatics
Registration Id: PCNYRRZW7JQ
Submission Number: 82
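
The hybrid approach described in the abstract (structured fields concatenated with a narrative embedding, then passed to a classifier) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' pipeline: `embed_text` is a hypothetical hashed bag-of-words stand-in for a pretrained sentence encoder such as SBERT, the logistic regression is a plain SGD implementation chosen for self-containment, and the four records are made-up toy examples rather than VAERS data.

```python
import math
import zlib

EMBED_DIM = 16  # toy size; pretrained sentence encoders use hundreds of dimensions


def embed_text(text, dim=EMBED_DIM):
    """Hypothetical stand-in encoder: hashed bag-of-words, L2-normalized.

    Assumption: a real pipeline would call a pretrained sentence encoder here.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


def hybrid_features(structured, narrative):
    """Concatenate structured fields with the narrative embedding."""
    return list(structured) + embed_text(narrative)


def predict_proba(weights, bias, x):
    """Logistic model: probability that a report is serious."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression with per-sample stochastic gradient descent."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            err = predict_proba(weights, bias, x) - label
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


# Made-up toy records: (age / 100, sex flag), narrative text, serious label.
records = [
    ((0.72, 1.0), "painful shingles rash with hospitalization", 1),
    ((0.80, 0.0), "seizure and numbness after injection", 1),
    ((0.30, 1.0), "mild sore arm at injection site", 0),
    ((0.25, 0.0), "brief low fever and fatigue", 0),
]
X = [hybrid_features(fields, text) for fields, text, _ in records]
y = [label for _, _, label in records]

weights, bias = train_logistic(X, y)
predictions = [int(predict_proba(weights, bias, x) >= 0.5) for x in X]
print(predictions)
```

A structured-only baseline for comparison would train the same classifier on the first two columns alone. Swapping `embed_text` for a real encoder changes only the embedding step; the concatenation and classifier stay the same.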