Benchmarking Large-Language Models for Resource-Efficient Medical AI for Edge Deployment

Published: 01 Jan 2025 · Last Modified: 12 Nov 2025 · AAAI Spring Symposia 2025 · License: CC BY-SA 4.0
Abstract: Large-Language Models (LLMs) are rapidly emerging as transformative tools across diverse domains, leveraging training on vast, heterogeneous datasets to capture nuanced domain knowledge. In the medical domain, LLMs hold considerable potential to improve clinical workflows by increasing the efficiency of medical practitioners and alleviating their workload. However, a critical gap exists between the theoretical capabilities of LLMs and their practical deployment in resource-constrained environments, such as the edge devices (e.g., health monitors) commonly used in healthcare settings. This paper addresses this challenge by employing parameter-efficient fine-tuning (PEFT) techniques to adapt widely available advanced LLMs for medical applications and by comparing their resource efficiency and performance. The models are fine-tuned on structured medical question-answering datasets, and their outputs are evaluated using the BERTScore and USEScore metrics. Among the models tested, Mistral v0.3 demonstrated the best performance on both metrics while also showing promise for resource efficiency. These findings provide a foundation for selecting and optimizing LLMs for healthcare tasks, offering actionable insights for developing resource-efficient, scalable solutions suited for deployment on edge devices in real-world medical environments.
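The abstract describes two technical steps: adapting a base LLM with parameter-efficient fine-tuning and scoring generated answers with semantic-similarity metrics such as BERTScore. The sketch below illustrates what such a pipeline could look like using the Hugging Face `peft` and `bert_score` libraries; the model name, adapter hyperparameters, and example QA pairs are illustrative assumptions, not the exact configuration reported in the paper.

```python
# Illustrative sketch only: hyperparameters and data are placeholders,
# not the paper's actual setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
from bert_score import score

base = "mistralai/Mistral-7B-v0.3"  # one of the candidate models mentioned in the abstract
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)

# Parameter-efficient fine-tuning (LoRA): train small low-rank adapters
# instead of updating the full 7B-parameter weight matrices, which keeps
# memory and compute needs closer to what edge-oriented workflows require.
lora_cfg = LoraConfig(
    r=16,                                  # adapter rank (assumed value)
    lora_alpha=32,                         # scaling factor (assumed value)
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of total parameters

# Evaluation: compare generated answers to reference answers from a medical
# QA dataset using BERTScore (the example strings below are hypothetical).
candidates = ["Metformin is a first-line therapy for type 2 diabetes."]
references = ["First-line pharmacologic treatment for type 2 diabetes is metformin."]
P, R, F1 = score(candidates, references, lang="en")
print(f"BERTScore F1: {F1.mean().item():.3f}")
```

In practice, the adapted model would first be trained on the structured QA dataset (e.g., with a standard `transformers` Trainer loop) before the evaluation step shown above.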