What Do LLMs Understand About International Trade? Introducing the TradeGov Dataset for International Trade Q&A Evaluation
Keywords: LLM, Trade, Q&A, ChatGPT
TL;DR: This paper introduces TradeGov, a novel dataset for evaluating large language models (LLMs) on international trade law questions and assesses ChatGPT's performance on it.
Abstract: Given the constant flux of geopolitics, staying up to date with and compliant on international trade issues is challenging. Whether LLMs can aid this task is a question hitherto unexplored in the LLM evaluation literature, primarily because there has been no dataset for benchmarking the capabilities of LLMs on questions about international trade. To address this gap, we introduce TradeGov, a novel, human-audited dataset containing 5k international trade-related question-answer pairs across 138 countries, created using ChatGPT from the Country Commercial Guides on the International Trade Administration website. The dataset achieves 98% relevance and faithfulness and shows no systematic biases along macroeconomic and geographical dimensions, making it equally applicable for LLM assessment across countries. In testing ChatGPT 4o on this dataset (the first systematic evaluation of LLMs on questions about international trade), we find that it achieves ~84% accuracy. However, we also show that ChatGPT 4o is biased: it performs better for countries with greater ease of doing business, higher GDP, and higher trade shares. The TradeGov dataset thus fills a critical gap in the LLM evaluation literature and paves the way for a better understanding of how LLMs can assist in navigating the complex international trade landscape.
Submission Number: 28