Supervised Relation Extraction is More Efficient When Approached as Graph-Based Dependency Parsing

ACL ARR 2025 July Submission 558 Authors

28 Jul 2025 (modified: 22 Aug 2025), ACL ARR 2025 July Submission, CC BY 4.0
Abstract: Large language models (LLMs) have emerged as a convenient tool for the relation extraction (RE) task, both in supervised and in-context learning settings. However, their supervised performance still lags behind that of much smaller architectures, which we argue is due to two main reasons. (i) For LLMs, the input and the labels live in the same prompt space, so both must be expanded into natural language, which decreases information density. (ii) An LLM has to generate the entities, entity labels, and relation labels from scratch by classifying over the entire vocabulary, while also formatting the output so that predictions can be extracted from it automatically. To demonstrate this, we evaluate LLMs and graph-based parsers on six RE datasets with sentence graphs of varying sizes and complexities. Our results show that LLM performance degrades increasingly, relative to graph-based parsers, as the number of relations per document grows, arguably making the latter the superior choice in the presence of complex annotated data.
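For readers unfamiliar with the graph-based framing contrasted with LLM generation above, the following is a minimal PyTorch sketch of a biaffine relation scorer in the spirit of the listed keywords. It is an illustrative assumption, not the authors' implementation: the class name, dimensions, and random "encoder output" are hypothetical, and the point is only that every (head, dependent, relation) triple is scored in one pass rather than generated token by token.

```python
# Hypothetical sketch of a biaffine relation scorer (not the paper's exact architecture):
# relation extraction framed as scoring head-dependent arcs, as in graph-based parsing.
import torch
import torch.nn as nn

class BiaffineRelationScorer(nn.Module):
    def __init__(self, hidden_dim: int, num_relations: int):
        super().__init__()
        # Separate projections for the "head" and "dependent" roles of each token.
        self.head_mlp = nn.Linear(hidden_dim, hidden_dim)
        self.dep_mlp = nn.Linear(hidden_dim, hidden_dim)
        # One (hidden+1) x (hidden+1) biaffine matrix per relation label (+1 for bias terms).
        self.U = nn.Parameter(torch.empty(num_relations, hidden_dim + 1, hidden_dim + 1))
        nn.init.xavier_uniform_(self.U)

    def forward(self, encodings: torch.Tensor) -> torch.Tensor:
        # encodings: (batch, seq_len, hidden_dim), e.g. from a pretrained encoder.
        h = torch.relu(self.head_mlp(encodings))   # (B, N, H)
        d = torch.relu(self.dep_mlp(encodings))    # (B, N, H)
        ones = torch.ones_like(h[..., :1])
        h = torch.cat([h, ones], dim=-1)           # append bias feature -> (B, N, H+1)
        d = torch.cat([d, ones], dim=-1)
        # scores[b, r, i, j]: score of relation r holding from token i (head) to token j (dependent).
        return torch.einsum("bih,rhk,bjk->brij", h, self.U, d)

# Usage: all candidate arcs are classified jointly over a fixed label set,
# instead of being generated and parsed out of free-form LLM output.
encoder_out = torch.randn(2, 16, 256)              # stand-in for contextual embeddings
scorer = BiaffineRelationScorer(hidden_dim=256, num_relations=8)
print(scorer(encoder_out).shape)                   # torch.Size([2, 8, 16, 16])
```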
Paper Type: Short
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: relation extraction, biaffine attention, large language model, graph-based parser
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings / efficiency
Languages Studied: English
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: N/A
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: We cite every dataset (Section 3) and model (Section 4).
B2 Discuss The License For Artifacts: Yes
B2 Elaboration: We cite every dataset (Section 3) and model (Section 4).
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: We respect all the licenses of every dataset (Section 3) and model (Section 4).
B4 Data Contains Personally Identifying Info Or Offensive Content: N/A
B4 Elaboration: No personal data or offensive content is found in the data used (Section 3).
B5 Documentation Of Artifacts: Yes
B5 Elaboration: We discuss data domains in Section 3.
B6 Statistics For Data: Yes
B6 Elaboration: We provide summary statistics for the datasets used in Section 3 and in Appendix A.
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: We report parameters in Section 4 and compute requirements in Section 7.
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: We report hyperparameters in Section 4.
C3 Descriptive Statistics: Yes
C3 Elaboration: We report means and standard deviations of F1 scores over multiple seeds in Section 5.
C4 Parameters For Packages: N/A
C4 Elaboration: We use the PyTorch and Transformers libraries with their default settings.
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: No
E1 Information About Use Of Ai Assistants: N/A
Author Submission Checklist: Yes
Submission Number: 558