Abstract: In this work, we introduce a comprehensive error typology designed for evaluating two distinct tasks in machine-generated patent texts: claims-to-abstract generation and the generation of the next claim given previous ones. We also develop a benchmark, PatentEval, for systematically assessing language models in this context. Our study includes a human-annotated comparative analysis of various models, ranging from those specifically adapted to the patent domain to the latest general-purpose language models. Furthermore, we design and evaluate metrics that approximate human judgments of patent text quality, analyzing the extent to which they align with expert assessments. These analyses provide valuable insights into the capabilities and limitations of current language models in the specialized field of patent text generation.
Paper Type: long
Research Area: Resources and Evaluation
Contribution Types: Model analysis & interpretability, Reproduction study, Data resources, Data analysis
Languages Studied: English