Beyond Tokens: Fair Evaluation of French Large Language Models for Clinical Named Entity Recognition

Jamil Zaghir, Mina Bjelogrlic, Jean-Philippe Goldman, Adel Bensahla, Yuanyuan Zheng, Christian Lovis

Published: 2024, Last Modified: 15 May 2025, MIE 2024, CC BY-SA 4.0

Abstract: Named Entity Recognition (NER) models based on Transformers have gained prominence for their impressive performance in various languages and domains. This work examines the often-overlooked entity-level metrics and exposes significant discrepancies between token-level and entity-level evaluations. The study uses a corpus of synthetic French oncological reports annotated with entities representing oncological morphologies. Four different French BERT-based models are fine-tuned for token classification, and their performance is assessed at both the token and entity levels. In addition to fine-tuning, we evaluate ChatGPT’s ability to perform NER through prompt engineering techniques. The findings reveal a notable disparity in model effectiveness when moving from token-level to entity-level metrics, highlighting the importance of comprehensive evaluation methodologies in NER tasks. Furthermore, in comparison to BERT, ChatGPT remains limited when it comes to detecting advanced entities in French.
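
The gap between token-level and entity-level scores described in the abstract can be illustrated with a minimal sketch. It assumes BIO-style tags, a hypothetical `MORPH` label for oncological morphologies, and exact-match span scoring. None of these details, nor the toy sentence, come from the paper, and the paper's own evaluation protocol may differ.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (label, start, end), end exclusive


def extract_entities(tags: List[str]) -> Set[Span]:
    """Collect entity spans from BIO tags (assumed tagging scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                spans.add((label, start, i))
            if tag.startswith(("B-", "I-")):
                start, label = i, tag[2:]  # a stray I- tag opens a span
            else:
                start, label = None, None
    return spans


def f1(tp: int, n_pred: int, n_gold: int) -> float:
    """F1 from true positives, predicted count, and gold count."""
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


def token_f1(gold: List[str], pred: List[str]) -> float:
    """Micro F1 over non-O tokens; a token counts if its tag matches exactly."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != "O")
    return f1(tp, sum(p != "O" for p in pred), sum(g != "O" for g in gold))


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Micro F1 over spans; a span counts only if label and bounds match."""
    g, p = extract_entities(gold), extract_entities(pred)
    return f1(len(g & p), len(p), len(g))


# Hypothetical sentence: "Un carcinome canalaire infiltrant et adénocarcinome ."
gold = ["O", "B-MORPH", "I-MORPH", "I-MORPH", "O", "B-MORPH", "O"]
# The prediction misses the last token of the first entity.
pred = ["O", "B-MORPH", "I-MORPH", "O", "O", "B-MORPH", "O"]

print(f"token F1:  {token_f1(gold, pred):.3f}")   # token F1:  0.857
print(f"entity F1: {entity_f1(gold, pred):.3f}")  # entity F1: 0.500
```

In this toy case a single missed token costs one of four gold tokens at token level, but an entire entity out of two at entity level, which is the kind of disparity the study reports.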