Taxation Perspectives from Large Language Models: A Case Study on Additional Tax Penalties

ACL ARR 2025 May Submission6125 Authors

20 May 2025 (modified: 03 Jul 2025) · CC BY 4.0
Abstract: How capable are large language models (LLMs) in the domain of taxation? Although numerous studies have explored the legal domain in general, research dedicated to taxation remains scarce. Moreover, the datasets used in these studies are either simplified, failing to reflect real-world complexities, or unavailable as open source. To address this gap, we introduce PLAT, a new benchmark designed to assess the ability of LLMs to predict the legitimacy of additional tax penalties. PLAT comprises a total of 300 examples: (1) 100 binary-choice questions, (2) 100 multiple-choice questions, and (3) 100 essay-type questions, all originally derived from 100 Korean precedents. PLAT is constructed to evaluate not only LLMs' understanding of tax law, but also their performance on legal cases that require complex reasoning beyond the straightforward application of statutes. Our systematic experiments with multiple LLMs reveal that (1) their baseline capabilities are limited, especially in cases involving conflicting issues that require a comprehensive understanding, and (2) LLMs struggle particularly with the "AC" stages of "IRAC", even advanced reasoning models like o3 that actively employ inference-time scaling.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: legal NLP
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: Korean
Submission Number: 6125