DisasterQA: A Benchmark for Assessing the Performance of LLMs in Disaster Response

Published: 14 Oct 2024 · Last Modified: 23 Nov 2024 · HRAIM Poster · CC BY 4.0
Keywords: Humanitarian Assistance and Disaster Relief, Large Language Models, Emergency Response, Disaster Management, Benchmark
TL;DR: This paper introduces DisasterQA, the first open-source benchmark for evaluating the disaster response capabilities of Large Language Models.
Abstract: The military plays a key role in Humanitarian Assistance and Disaster Relief. Disasters can result in many deaths, making quick response times vital. Large Language Models (LLMs) have emerged as valuable tools in this field: they can quickly process vast amounts of textual information, providing situational context during a disaster. However, the question remains whether LLMs should be used for advice and decision-making in a disaster. To evaluate the disaster response knowledge of LLMs, we introduce DisasterQA, a benchmark created from six online sources that covers a wide range of disaster response topics. We evaluated five LLMs, each with four different prompting methods, on our benchmark, measuring both accuracy and confidence levels. We hope this benchmark drives further development of LLMs for disaster response, ultimately enabling these models to work alongside emergency managers in disasters.
Submission Number: 10