Abstract: Statutory reasoning is the task of reasoning with facts and statutes,
which are rules written in natural language by a legislature. It is
a basic legal skill. In this paper we explore the capabilities of the
most capable GPT-3 model, text-davinci-003, on an established
statutory-reasoning dataset called SARA. We consider a variety
of approaches, including dynamic few-shot prompting,
chain-of-thought prompting, and zero-shot prompting. While we achieve
results with GPT-3 that are better than the previous best published
results, we also identify several types of clear errors it makes. We
investigate why these errors happen. We discover that GPT-3 has
imperfect prior knowledge of the actual U.S. statutes on which SARA
is based. More importantly, we create simple synthetic statutes,
which GPT-3 is guaranteed not to have seen during training. We
find GPT-3 performs poorly at answering straightforward questions
about these simple synthetic statutes.