Abstract: While legal AI has made strides in recent years, it still struggles with basic legal concepts: \textit{when} does a law apply? \textit{Who} does it apply to? \textit{What} does it do? We take a \textit{discourse} approach to addressing these problems and introduce a novel taxonomy for span-and-relation parsing of legal texts. We create a dataset, \textit{LegalDiscourse}, of $602$ state-level law paragraphs consisting of $3{,}715$ discourse spans and $1{,}671$ relations. Our trained annotators reach an agreement rate of $\kappa > 0.8$, yet few-shot GPT3.5 performs poorly at span identification and relation classification. Although fine-tuning improves performance, GPT3.5 still falls far short of human level. We demonstrate the usefulness of our schema by creating a web application with journalists. We collect over $100{,}000$ laws for $52$ U.S. states and territories using $20$ scrapers we built, and apply our trained models to $6{,}000$ laws that use U.S. Census population numbers. We describe two journalistic outputs stemming from this application: (1) an investigation into the increase in liquor licenses following population growth and (2) an analysis of the decrease in applicable laws under different under-count projections.
Paper Type: long
Research Area: NLP Applications
Contribution Types: Data resources
Languages Studied: English
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.