Is ChatGPT a Smart Data Generation Tool? Exploring ChatGPT for Generating Metaphorical Data

ACL ARR 2024 April Submission107 Authors

13 Apr 2024 (modified: 25 May 2024) | ACL ARR 2024 April Submission | CC BY 4.0
Abstract: Data annotation is time-consuming and labor-intensive, with an average cost of $0.11 per instance on crowdsourcing platforms. This cost constrains progress in many research areas. As large language models (LLMs) have made significant progress on many tasks, researchers have begun to use prompt learning to generate samples. However, previous studies have focused mainly on surface semantic tasks and have neglected implicit semantic tasks (e.g., metaphors), which require LLMs to understand the implicit meaning of text. This paper therefore explores the data generation capabilities of ChatGPT on metaphorical tasks. For surface semantic tasks, researchers have usually relied on direct generation of samples (DG) and example-based prompt enhancement (EPE). We propose a semantic-based prompt enhancement (SPE) method. Experiments show that SPE achieves the best F1 score on three datasets and exceeds the accuracy of crowdsourced annotation (CA) samples on two datasets. Finally, we analyze and discuss the three ChatGPT sample generation methods in depth through extensive example analysis and experiments.
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Efficient/Low-Resource Methods for NLP, Interpretability and Analysis of Models for NLP, Language Modeling
Contribution Types: Approaches to low-resource settings, Approaches to low-compute settings (efficiency)
Languages Studied: English
Submission Number: 107