PatentEdits: A Patent Dataset Built for Predicting Revisions

ACL ARR 2024 June Submission 3287 Authors

15 Jun 2024 (modified: 02 Aug 2024) | ACL ARR 2024 June Submission | CC BY 4.0
Abstract: Patents are critical protections of a company's intellectual property and competitive advantage, as they grant inventors the exclusive rights to make, use, and sell the disclosed invention for 20 years. To be granted, a patent must be deemed novel and non-obvious by the US Patent and Trademark Office (USPTO). To meet these criteria, most patent agents and inventors revise the language and scope of the claimed invention after receiving official feedback. To better understand which revisions lead to successful patents, we present the PatentEdits dataset, which contains 483,706 granted patent examples and is the first to align them before and after revision. We define and extract the following sentence- or claim-level edit actions in our dataset: a given draft claim is either kept, merged, edited, or deleted. For each patent, we also include the references cited by the USPTO examiner, which can be used in edit action prediction. We also demonstrate the promise of a two-step model pipeline for predicting the entire granted patent: 1) predicting the edit actions on the draft claims, followed by 2) predicting the revised claims given those edit actions.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Edit Models, Document Retrieval, Long-context language models, efficient attention
Contribution Types: NLP engineering experiment, Data resources, Data analysis
Languages Studied: English
Submission Number: 3287
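
The four edit actions named in the abstract (kept, merged, edited, deleted) can be illustrated with a small sketch. This is not the authors' alignment procedure, which the abstract does not describe. It is a hypothetical heuristic that assumes word-level similarity from Python's `difflib`, an arbitrary threshold of 0.5, and the rule that two or more draft claims mapping to the same granted claim count as merged. The names `EditAction`, `similarity`, and `label_edit_actions` are illustrative only.

```python
import re
from collections import Counter
from difflib import SequenceMatcher
from enum import Enum


class EditAction(Enum):
    KEEP = "keep"
    EDIT = "edit"
    MERGE = "merge"
    DELETE = "delete"


def similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1], ignoring case and punctuation."""
    tokens_a = re.findall(r"\w+", a.lower())
    tokens_b = re.findall(r"\w+", b.lower())
    return SequenceMatcher(None, tokens_a, tokens_b).ratio()


def label_edit_actions(draft_claims, granted_claims, threshold=0.5):
    """Assign one EditAction per draft claim (hypothetical heuristic).

    Assumption: each draft claim is aligned to its most similar granted
    claim, or to nothing if no granted claim reaches `threshold`.
    """
    best = []  # index of the aligned granted claim, or None
    for draft in draft_claims:
        scores = [similarity(draft, g) for g in granted_claims]
        if scores and max(scores) >= threshold:
            best.append(scores.index(max(scores)))
        else:
            best.append(None)

    # How many draft claims point at each granted claim.
    counts = Counter(j for j in best if j is not None)

    actions = []
    for draft, j in zip(draft_claims, best):
        if j is None:
            actions.append(EditAction.DELETE)  # no counterpart survives
        elif draft.strip() == granted_claims[j].strip():
            actions.append(EditAction.KEEP)    # unchanged text
        elif counts[j] > 1:
            actions.append(EditAction.MERGE)   # several drafts, one claim
        else:
            actions.append(EditAction.EDIT)    # one-to-one, text changed
    return actions


if __name__ == "__main__":
    draft = [
        "A device comprising a sensor.",
        "The device of claim 1, wherein the sensor is optical.",
        "A method of cooking rice.",
        "A system comprising a processor and a memory.",
    ]
    granted = [
        "A device comprising a sensor, wherein the sensor is optical.",
        "A system comprising a processor and a memory.",
    ]
    for claim, action in zip(draft, label_edit_actions(draft, granted)):
        print(f"{action.value:>6}  {claim}")
```

Labels produced this way would correspond to step 1 of the pipeline only; step 2, generating the revised claim text conditioned on those labels, would need a separate text generation model and is not sketched here.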