Abstract: The rapid development of Large Language Models (LLMs) has significantly advanced the field of natural language processing, including the automated generation of Multiple-Choice Questions (MCQs) from scientific literature. This study introduces a systematic method for creating high-quality MCQs using advanced LLMs. We developed a specialized dataset by extracting data from an extensive body of literature in the domain of materials science, focusing on five critical tasks: common science knowledge Q&A, digital data extraction, detailed understanding, reasoning and interpretation, and safety judgments. Leveraging carefully designed prompts for LLMs, we automated the generation process and validated the MCQs to ensure their relevance and accuracy. The resulting dataset not only demonstrates the potential of LLMs for producing diverse MCQs but also serves as a benchmark for evaluating the problem-solving capabilities of different LLMs in materials science. Our experimental results reveal the strengths and weaknesses of these LLMs, providing valuable insights for future applications in science.