Mechanistic Interpretability of Semantic Abstraction in Biomedical Texts

Published: 06 Oct 2025, Last Modified: 06 Oct 2025
Venue: NeurIPS 2025 2nd Workshop FM4LS Poster
License: CC BY 4.0
Keywords: mechanistic interpretability, biomedical natural language processing, activation patching, masked language modeling, representation alignment, model ablation, semantic equivalence, plain-language medical communication
TL;DR: We probe biomedical language models on the PLABA dataset to study how they represent semantically equivalent plain and technical medical text for enhanced medical communication.
Abstract: We investigate whether biomedical language models form register-invariant semantic representations of sentences, a capability that supports consistent and reliable clinical communication across language styles. Using aligned sentence pairs (technical and plain-language versions of abstract sentences with the same meaning), we analyze how BioBERT, SciBERT, Clinical-T5, and BioGPT respond to register variation through similarity measures, trajectory visualization, and activation patching. Our results indicate that the models converge to shared semantic states in mid-to-late layers through internal processes that preserve meaning across stylistic variation.
Submission Number: 59
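
The abstract names layer-wise similarity measures without giving details, so the following is only a minimal sketch of one way such a measure could be computed, not the authors' implementation. Mean pooling over tokens, cosine similarity, the function name `layerwise_cosine`, and the `(layers, tokens, dim)` array layout are all assumptions made for illustration; random arrays stand in for real hidden states.

```python
import numpy as np


def layerwise_cosine(hidden_a, hidden_b):
    """Cosine similarity per layer between two mean-pooled sentences.

    hidden_a: array of shape (layers, tokens_a, dim), e.g. technical text
    hidden_b: array of shape (layers, tokens_b, dim), e.g. plain text
    Returns an array of shape (layers,).
    """
    # Mean-pool over the token axis so sentences of different
    # lengths become comparable (an assumed pooling choice).
    pooled_a = hidden_a.mean(axis=1)
    pooled_b = hidden_b.mean(axis=1)

    # Cosine similarity computed independently for each layer.
    dot = (pooled_a * pooled_b).sum(axis=1)
    norms = np.linalg.norm(pooled_a, axis=1) * np.linalg.norm(pooled_b, axis=1)
    return dot / norms


if __name__ == "__main__":
    # Synthetic stand-ins for hidden states: 4 layers, 8-dim vectors,
    # 5 tokens in one sentence and 7 in the other.
    rng = np.random.default_rng(0)
    technical = rng.normal(size=(4, 5, 8))
    plain = rng.normal(size=(4, 7, 8))
    for layer, sim in enumerate(layerwise_cosine(technical, plain)):
        print(f"layer {layer}: cosine similarity {sim:.3f}")
```

With real models, the two inputs would be the per-layer hidden states of an aligned technical and plain sentence pair. Under this sketch, a rise in similarity toward later layers would be the kind of pattern the abstract describes as convergence.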