Challenges in the Evaluation of the Causal Event Extraction Task

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission · Readers: Everyone
TL;DR: We propose methods to evaluate causal event extraction that go beyond Exact Match
Abstract: Evaluating the causal event extraction task is challenging because the boundaries of the cause and effect clauses can be ambiguous. We find that traditional metrics like Exact Match and BERTScore are not representative of model performance, so we train evaluator models and also test GPT-3.5 and GPT-4 as evaluators. Contrary to previous findings, GPT-4 is not a suitable replacement for human evaluation. Our trained evaluators are better at identifying ambiguous but valid cases, but they tend to misclassify invalid extractions. We also propose a Reinforcement Learning (RL) framework to improve the model's capacity to capture semantic meaning rather than replicate the provided annotations. Our RL framework outperforms the other approaches on causal relation classification but still falls short of the supervised fine-tuned model for causal event extraction. Still, our exploration sheds light on the complex nature of the causal event extraction task.
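The boundary-ambiguity problem the abstract describes can be illustrated with a minimal sketch (a hypothetical example, not the paper's code): a predicted cause span that widens the annotated clause boundary receives zero credit under Exact Match even when it preserves the causal content, whereas a token-overlap score gives partial credit.

```python
def exact_match(pred: str, gold: str) -> bool:
    """Exact Match: the predicted span must equal the annotation verbatim."""
    return pred.strip().lower() == gold.strip().lower()

def token_f1(pred: str, gold: str) -> float:
    """Token-overlap F1: partial credit for tokens shared with the annotation."""
    p, g = pred.lower().split(), gold.lower().split()
    overlap = len(set(p) & set(g))
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

# Hypothetical annotation vs. a semantically valid but wider extraction:
gold = "heavy rainfall"
pred = "the heavy rainfall overnight"

print(exact_match(pred, gold))           # False: no credit under Exact Match
print(round(token_f1(pred, gold), 2))    # 0.67: partial credit for overlap
```

Neither score captures whether the extra tokens change the causal meaning, which is the gap the paper's trained and LLM-based evaluators aim to close.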
Paper Type: short
Research Area: Resources and Evaluation
Contribution Types: Publicly available software and/or pre-trained models, Data analysis
Languages Studied: English