Abstract: In legal reasoning, determining whether evidence should be admissible in court requires, in part, assessing its relevance to the case, often formalized as its probative value---the degree to which its being true or false proves a fact in issue. However, determining probative value is an imprecise process that often relies on weighing arguments for and against the probative value of a fact. Can generative language models be of use in generating or assessing such arguments? In this work, we introduce relevance chain prompting, a new prompting method that enables large language models to reason about the relevance of evidence to a given fact and that incorporates measures of chain strength. We explore different methods for scoring a relevance chain, grounded in the idea of probative value. Additionally, we evaluate the outputs of large language models with ROSCOE metrics and compare the results to chain-of-thought prompting. We test the prompting methods on a dataset created from the Legal Evidence Retrieval dataset. After postprocessing with the ROSCOE metrics, our method outperforms chain-of-thought prompting.
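To make the idea concrete, below is a minimal sketch of what a relevance-chain prompt and a simple chain-strength score could look like. The prompt wording, the step format, and the product-of-step-scores aggregation are illustrative assumptions for this sketch, not the paper's exact prompt or scoring method.

```python
from math import prod

# Hypothetical prompt template (illustrative wording only): ask the model to
# spell out the inferential chain linking a piece of evidence to a fact in
# issue, one step per line, each with a plausibility score in [0, 1].
RELEVANCE_CHAIN_PROMPT = """\
Evidence: {evidence}
Fact in issue: {fact}

List the chain of inferences needed for this evidence to make the fact
more or less probable. Number each step and rate its plausibility from
0.0 to 1.0 in the form "<step text> | <score>".
"""
# In practice the filled-in prompt would be sent to a language model, e.g.:
# prompt = RELEVANCE_CHAIN_PROMPT.format(evidence="...", fact="...")

def parse_chain(model_output: str) -> list[tuple[str, float]]:
    """Parse '<step text> | <score>' lines into (step, score) pairs."""
    steps = []
    for line in model_output.splitlines():
        if "|" in line:
            text, score = line.rsplit("|", 1)
            steps.append((text.strip(), float(score)))
    return steps

def chain_strength(steps: list[tuple[str, float]]) -> float:
    """One possible chain-strength measure: treat the chain as only as strong
    as the conjunction of its links, so multiply the per-step scores."""
    return prod(score for _, score in steps) if steps else 0.0

# Example with a hand-written chain (no model call), to show the scoring:
example_output = """\
1. The witness saw the defendant near the scene | 0.9
2. Being near the scene makes presence at the scene more likely | 0.7
3. Presence at the scene bears on the fact in issue | 0.8
"""
print(chain_strength(parse_chain(example_output)))  # ~0.504
```

Multiplying step scores is just one scoring choice; other aggregations (for example, taking the weakest link as the chain's strength) would fit the same parsing step.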