Abstract: Biomedical terminology normalization aims to find the standard term in a given termbase for non-standardized mentions from social media or clinical texts; mainstream approaches adopt the ``Recall and Re-rank'' framework. Instead of the traditional pretraining-finetuning paradigm, we explore the possibility of accomplishing this task through a training-free paradigm using powerful large language models (LLMs), hoping to avoid the cost of re-training caused by discrepancies in both standard termbases and annotation protocols. Another major obstacle in this task is that both mentions and terms are short texts, which carry insufficient information and can introduce ambiguity, especially in a biomedical context. Therefore, besides using an advanced embedding model, we distill knowledge from an LLM to expand the short texts into more informative descriptions, enabling a superior unsupervised retrieval approach. Furthermore, we introduce an innovative training-free biomedical terminology normalization framework that leverages the reasoning capabilities of the LLM, in combination with supervised data and domain-specific expertise, to conduct more sophisticated ranking and re-ranking. Experimental results across multiple datasets indicate that both our unsupervised and supervised approaches achieve state-of-the-art performance.
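The ``Recall and Re-rank'' pipeline with short-text expansion described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the bag-of-words cosine similarity stands in for the advanced embedding model, the `GLOSS` lookup table stands in for LLM-distilled expansion (the paper prompts an LLM for descriptions), and the re-rank stage here is a second similarity pass rather than LLM reasoning. All names and example terms are hypothetical.

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words vector; a stand-in for a real biomedical embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def expand(text, gloss):
    # Hypothetical stand-in for LLM-distilled expansion: append a longer
    # description so the short text becomes more informative.
    return text + " " + gloss.get(text, "")

def recall(mention, termbase, gloss, k=2):
    # Recall stage: retrieve top-k candidates by similarity of expanded texts.
    q = embed(expand(mention, gloss))
    ranked = sorted(termbase,
                    key=lambda t: cosine(q, embed(expand(t, gloss))),
                    reverse=True)
    return ranked[:k]

def rerank(mention, candidates, gloss):
    # Re-rank stage: pick the best candidate; the paper uses LLM reasoning
    # plus supervised data here, a second similarity pass is used for brevity.
    q = embed(expand(mention, gloss))
    return max(candidates, key=lambda t: cosine(q, embed(expand(t, gloss))))

# Hypothetical example data.
GLOSS = {
    "heart attack": "blockage of blood flow to the heart muscle",
    "myocardial infarction": "blockage of blood flow to the heart muscle causing tissue death",
    "cardiac arrest": "sudden loss of heart function",
    "heartburn": "burning chest pain from stomach acid",
}
TERMS = ["myocardial infarction", "cardiac arrest", "heartburn"]
```

With this toy data, the raw mention "heart attack" shares no tokens with any term, so retrieval only succeeds after expansion: `rerank("heart attack", recall("heart attack", TERMS, GLOSS), GLOSS)` resolves the mention to "myocardial infarction".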
Paper Type: long
Research Area: NLP Applications
Contribution Types: NLP engineering experiment
Languages Studied: English