Enhancing Information Extraction with METORIE: A Metaphor and Trap-Based Dataset for Cross-Domain Fine-Tuning
Abstract: This research proposes the METORIE dataset, a novel resource designed to improve the reasoning capabilities of large language models (LLMs), such as LLaMA3 and GLM4, on information extraction (IE) tasks. The METORIE dataset is derived from brain teasers that incorporate complex logical and metaphorical elements, and it trains LLMs to navigate intricate reasoning paths and interpret layered expressions. Our findings demonstrate that the METORIE dataset markedly enhances LLMs' performance on both general and specialized IE tasks. Fine-tuning with the METORIE dataset mixed with a small amount of IE data matches, and in some cases exceeds, the results of same-sized LLMs fine-tuned on much larger IE datasets. Through controlled experiments, we establish that metaphors of medium complexity optimize IE performance, while higher complexity tends to overstretch LLMs' inference limits. METORIE-fine-tuned LLMs also demonstrate exceptional performance in the legal and medical domains, suggesting that enhanced metaphor understanding and logical deduction are key to improving LLMs' adaptability and efficiency in vertical domains.
External IDs: dblp:conf/icassp/PanPJCQMYW025