MariQA: A Large Scale Question Answering Dataset in the Domain of Maritime Affairs

ACL ARR 2025 February Submission7754 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: Natural language processing (NLP) is still at an early stage of exploration in maritime affairs, one of the world's oldest industries, and to date no large-scale dataset is available for this domain. To fill this gap, we construct the first large-scale maritime question answering dataset. It covers eight crew positions and contains approximately 90,000 question-answer pairs, designed to comprehensively evaluate the domain knowledge and response capabilities of large language models (LLMs). Our experiments on this dataset reveal that mainstream LLMs lack maritime knowledge. Even state-of-the-art models such as GPT-4o and Qwen-Max achieve only passing scores, which shows that current LLMs have significant room for improvement in the maritime domain. To promote the development of LLMs in the maritime field, we will open-source the proposed dataset.
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: Question Answering, Maritime Affairs, Large Language Model
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: Chinese, English
Submission Number: 7754