Abstract: We present LL-Instruct, an 8B instruction-tuned model designed to generate content for English Language Proficiency Assessments (ELPA). We leverage domain expertise to write seed instructions based on publicly available practice tests; GPT-4 then uses these seeds to generate 70K new instructions and explanations. GPT-4 is also used to validate its own generations, checking that the generated instructions are in-domain. Human evaluations show that a Llama-3 8B model fine-tuned on this dataset yields outputs comparable to GPT-3.5, with an improved ability to generate explanations. A detailed error analysis highlights the strengths of our fine-tuned model and shows where it improves over standard out-of-the-box models on ELPA-related instructions. To our knowledge, LL-Instruct is the first instruction-tuned model designed specifically for ELPA generation.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: assessment, instruction tuning, EdTech, language learning
Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency, Data resources, Data analysis
Languages Studied: English
Submission Number: 466