A Step Forward for Medical LLMs in Brazilian Portuguese: Establishing a Benchmark and a Strong Baseline

Published: 01 Jan 2025, Last Modified: 07 Nov 2025, Venue: CBMS 2025, License: CC BY-SA 4.0
Abstract: The application of large language models in healthcare presents unique challenges, particularly in non-English contexts where linguistic and cultural nuances significantly impact model effectiveness. In this work, we introduce a novel benchmark for evaluating medical language models in Brazilian Portuguese, addressing a critical gap in AI assessment for healthcare applications. This benchmark is built upon Brazilian medical aptitude tests spanning 2011–2024, enabling extensive evaluation of both specialist and general large language models. Our findings demonstrate that despite advancements in language model capabilities, significant gaps remain in their ability to reason effectively about medical knowledge in Brazilian Portuguese. This benchmark establishes a proper foundation for evaluating and advancing medical language models in Portuguese, creating a standardized framework to guide development toward more effective, equitable, and culturally appropriate AI systems for healthcare in Brazil.