Abstract: Large language models (LLMs) can perform a wide range of tasks in a zero-shot fashion. Yet, defining a task and communicating it to the model remain challenging. While prior work focuses on prompting strategies that take the task definition as given, we explore the novel use of LLMs for arriving at an optimal task definition in the first place. We propose an experimental framework consisting of a prompt manipulation module, reference data, and a measurement kit, and use it to study citation text generation -- a popular natural language processing task with no clear consensus on its definition. Our results highlight the importance of both the task definition and the task instruction when prompting LLMs, and reveal non-trivial relationships between the evaluation metrics used for citation text generation. Our human study illustrates the impact of the task definition on non-author human-generated output and reveals discrepancies between automatic and manual NLG evaluation. Our work contributes to the study of citation text generation in NLP and paves the way towards the systematic study of task definitions in the age of LLMs. Our code is publicly available.
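The framework components named in the abstract (a prompt manipulation module, reference data, and a measurement kit) can be pictured with the minimal sketch below. This is an illustration only, not the authors' released code: the function names, the stand-in generate callable, and the toy unigram-overlap metric are all assumptions introduced here.

# Hypothetical sketch of a prompt-manipulation experiment: candidate task
# definitions and instructions are combined into prompts, a (stubbed) LLM
# produces citation texts, and a simple surface-overlap metric scores them
# against reference data. Not the paper's actual code or metrics.

from dataclasses import dataclass
from itertools import product
from typing import Callable


@dataclass
class PromptConfig:
    task_definition: str   # what the task is taken to be
    task_instruction: str  # how the model should produce the output


def build_prompt(cfg: PromptConfig, citing_context: str, cited_abstract: str) -> str:
    """Assemble a prompt from a task definition, an instruction, and the inputs."""
    return (
        f"{cfg.task_definition}\n\n"
        f"Cited paper abstract:\n{cited_abstract}\n\n"
        f"Citing paper context:\n{citing_context}\n\n"
        f"{cfg.task_instruction}"
    )


def token_f1(candidate: str, reference: str) -> float:
    """Toy measurement: unigram-overlap F1 between candidate and reference."""
    cand, ref = set(candidate.lower().split()), set(reference.lower().split())
    overlap = len(cand & ref)
    if not cand or not ref or not overlap:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def run_experiment(
    definitions: list[str],
    instructions: list[str],
    examples: list[dict],            # each: citing_context, cited_abstract, reference
    generate: Callable[[str], str],  # stand-in for an LLM call
) -> dict[tuple[str, str], float]:
    """Score every (definition, instruction) pair, averaged over the reference data."""
    scores = {}
    for definition, instruction in product(definitions, instructions):
        cfg = PromptConfig(definition, instruction)
        per_example = [
            token_f1(
                generate(build_prompt(cfg, ex["citing_context"], ex["cited_abstract"])),
                ex["reference"],
            )
            for ex in examples
        ]
        scores[(definition, instruction)] = sum(per_example) / len(per_example)
    return scores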
Paper Type: long
Research Area: Generation
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.