{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# 试卷名：新课标卷Ⅰ 高考真题 【英语】学科\n",
        "\n",
        "## 题目编号：1\n",
        "## 题目标答\n",
        "C\n",
        "B\n",
        "B\n",
        "## Yi 34B 模型输出\n",
        "The aim of the Habitat Restoration Team is C. To protect the local ecosystem.\n",
        "\n",
        "The lower age limit for joining the Habitat Restoration Team is B. 10.\n",
        "\n",
        "Volunteers are expected to do B. Work even in bad weather.\n",
        "\n",
        "------\n",
        "## 题目编号：2\n",
        "## 题目标答\n",
        "A\n",
        "C\n",
        "D\n",
        "A\n",
        "## Yi 34B 模型输出\n",
        "1. A. He's odd.\n",
        "2. C. He benefited from it as a patient.\n",
        "3. D. The effectiveness of holistic medicine.\n",
        "4. A. To prove Farber's point.\n",
        "\n",
        "------\n",
        "## 题目编号：3\n",
        "## 题目标答\n",
        "D\n",
        "A\n",
        "A\n",
        "C\n",
        "## Yi 34B 模型输出\n",
        "1. 划线短语 \"shine through\" 在第二段中的意思是什么？A. 似乎不太可能持续。B. 似乎很难解释。C. 变得准备使用。D. 变得容易注意到。\n",
        "答案：D. 变得容易注意到。\n",
        "\n",
        "2. Shallowing hypothesis 假设了什么？A. 读者对数字文本轻视。B. 数字文本更容易理解。C. 人们随机选择数字文本。D. 数字文本适合社交媒体。\n",
        "答案：A. 读者对数字文本轻视。\n",
        "\n",
        "3. 为什么大学老师越来越多地使用音频和视频？A. 它们可以吸引学生的注意力。B. 它们准备起来更方便。C. 它们有助于发展先进技能。D. 它们比文本更具信息性。\n",
        "答案：A. 它们可以吸引学生的注意力。\n",
        "\n",
        "4. 作者在最后一段中暗示了什么？A. 学生应该应用多种学习技巧。B. 教师应该自己制作教学材料。C. 印刷文本在教育中不能完全被取代。D. 课堂外的教育不能被忽视。\n",
        "答案：C. 印刷文本在教育中不能完全被取代。\n",
        "\n",
        "------\n",
        "## 题目编号：4\n",
        "## 题目标答\n",
        "B\n",
        "C\n",
        "C\n",
        "D\n",
        "## Yi 34B 模型输出\n",
        "B. They are mostly in electronic form.\n",
        "C. Observational data.\n",
        "C. Improper way of sampling.\n",
        "B. Hire experts to check the records.\n",
        "\n",
        "------\n",
        "## 题目编号：5\n",
        "## 题目标答\n",
        "FBEAD\n",
        "## Yi 34B 模型输出\n",
        "F. That means good grammar and no spelling mistakes.\n",
        "\n",
        "1. The statement \"No editor is likely to tolerate a writer who does not take the trouble to spell words correctly\" emphasizes the importance of correct spelling in a manuscript. Therefore, the correct answer is F, as it explains what is meant by a \"perfect, professionally presented manuscript,\" which includes good grammar and no spelling mistakes.\n",
        "\n",
        "2. The context suggests that the writer is discussing their preference for physical dictionaries over electronic spellcheckers. The statement \"I don't trust my laptop's spellchecker\" leads to the conclusion that the writer believes in the importance of using a dictionary. Answer C, \"But I still don't want to replace them,\" fits this context as it shows the writer's preference for their physical dictionaries despite the availability of electronic alternatives.\n",
        "\n",
        "3. The passage discusses the features a dictionary should have, such as providing precise definitions and differentiating words with similar meanings. Answer E, \"Of course, a dictionary is not only for spelling,\" fits here as it expands on the multifaceted role of a dictionary, beyond just spelling.\n",
        "\n",
        "4. The mention of the Shorter Oxford Dictionary and its extensive content leads to a discussion of its usefulness. Answer A, \"I don't often use this dictionary,\" is the logical continuation, as it contrasts the impressive scope of the dictionary with the writer's actual usage, possibly due to its size.\n",
        "\n",
        "5. The reference to Samuel Johnson's Dictionary and the quote about making dictionaries being dull work, suggests a reflection on the process of using and appreciating dictionaries. Answer D, \"But a dictionary can be a pleasure to read,\" fits this context as it offers a counterpoint to Johnson's comment, suggesting that while compiling dictionaries may be dull, reading them can be enjoyable.\n",
        "\n",
        "G. Dictionaries don't always give you enough information. (This option does not fit the flow of the passage as it introduces a negative aspect that is not discussed in the given text.)\n",
        "B. It takes no account of the context. (This option is not directly related to the discussion of dictionaries and their usage in the passage.)\n",
        "\n",
        "------\n",
        "## 题目编号：6\n",
        "## 题目标答\n",
        "CADCB DCBBA DADBC\n",
        "## Yi 34B 模型输出\n",
        "1. C. won\n",
        "2. A. regularly\n",
        "3. D. training\n",
        "4. B. believed\n",
        "5. B. achievement\n",
        "6. D. reason\n",
        "7. D. dealt with\n",
        "8. B. dreamed\n",
        "9. B. borrowed\n",
        "10. D. challenges\n",
        "11. D. stopped\n",
        "12. C. familiar\n",
        "13. D. appealing\n",
        "14. B. matured\n",
        "15. C. goals\n",
        "\n",
        "------\n",
        "## 题目编号：7\n",
        "## 题目标答\n",
        "engineering\n",
        "functional\n",
        "to give\n",
        "closed\n",
        "walks\n",
        "the\n",
        "favourites\n",
        "as\n",
        "that\n",
        "richness\n",
        "## Yi 34B 模型输出\n",
        "1. engineers'\n",
        "2. functional\n",
        "3. to give\n",
        "4. closed\n",
        "5. walks\n",
        "6. the\n",
        "7. favorites\n",
        "8. as\n",
        "9. that\n",
        "10. richness\n",
        "\n",
        "解析：\n",
        "1. 这里需要使用名词的 possessive 形式，表示“最新的工程师技术”。\n",
        "2. \"function\" 的形容词形式为 \"functional\"，表示“功能性的”。\n",
        "3. 这里的动词应该是“to give”，表示“在温暖的天气里打开”以提供阳光和新鲜空气。\n",
        "4. \"close\" 应该使用形容词形式 \"closed\"，表示结构在寒冷天气下关闭以保护植物。\n",
        "5. \"walk\" 应该使用第三人称单数形式 \"walks\"，表示“引导游客穿越之旅”。\n",
        "6. \"the first time\" 应使用定冠词 \"the\"。\n",
        "7. \"favourite\" 应该使用复数形式 \"favorites\"，表示“现代西方的最爱植物”。\n",
        "8. \"as\" 表示“作为”，说明 Glasshouse 是一个伟大的成就。\n",
        "9. 定语从句 \"that brought the plants from their native habitat in Asia\" 修饰前面的 \"steps along the Silk Route\"。\n",
        "10. \"richness\" 表示“丰富程度”，与“gardening in England”相搭配。\n",
        "\n",
        "------\n",
        "## 题目编号：8\n",
        "## 题目标答\n",
        "略\n",
        "## Yi 34B 模型输出\n",
        "Dear Chris,\n",
        "\n",
        "I'm writing to share with you an art class I had in a park last Friday. It was a unique experience that I'm eager to tell you about.\n",
        "\n",
        "The weather was perfect, and we were all excited to draw our surroundings. I chose to depict a beautiful willow tree swaying in the breeze. It was challenging but fun to capture the movement and the shadows it cast on the grass.\n",
        "\n",
        "I'm not an expert, but I felt a great sense of achievement when I finally added the last touch to my drawing. It was more than just completing a piece of art; it was like capturing a moment in time. Being outdoors, surrounded by nature, really inspired me.\n",
        "\n",
        "I believe this class has opened a new door for me to appreciate the beauty of the world in a different way. I'm thinking of trying watercolors next. Who knows, maybe one day I'll be able to share my art with you!\n",
        "\n",
        "Looking forward to hearing from you soon.\n",
        "\n",
        "Yours\n",
        "Li Hua\n",
        "\n",
        "------\n",
        "## 题目编号：9\n",
        "## 题目标答\n",
        "略\n",
        "## Yi 34B 模型输出\n",
        "I ran back to Gunter and told him the bad news. The expression on his face remained unchanged, still wearing that understanding smile. \"Do not worry, my friend,\" Gunter said in a soothing voice. \"There must be another way.\" He thought for a moment and then suggested that I use my phone to make a digital payment. However, my phone had run out of battery, leaving me with no means to pay.\n",
        "\n",
        "Seeing my mounting despair, Gunter's eyes lit up with a spark of an idea. \"Wait here,\" he said, and disappeared into the bustling bus station. To my astonishment, he returned with the station manager in tow. Gunter explained my situation to the manager, who, after a few moments of consideration, agreed to accept my promise to pay for the taxi fare once I could. The manager's kindness and Gunter's generosity left me speechlessly grateful.\n",
        "\n",
        "Four days later, when I was back in Vienna, I called Gunter as promised. I repaid the taxi fare and thanked him profusely for his help that evening. Gunter simply laughed and said, \"It was nothing. You'll do the same for someone else someday.\" Our conversation ended with a promise to meet for a cup of coffee when I next visited Vienna.\n",
        "\n",
        "That night, as I boarded the bus to Prague, I couldn't help but reflect on the series of events that had unfolded. Gunter's kindness had not only saved me from missing my bus but had also restored my faith in human compassion. It was a cold, wet, and unforgettable evening, but the warmth of Gunter's generosity had melted the chill of the storm.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "27dfe3283b2b427ba117d21f7cf1785b",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "Loading checkpoint shards:   0%|          | 0/15 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "import re\n",
        "import json\n",
        "\n",
        "import torch\n",
        "from transformers import AutoTokenizer, AutoModelForCausalLM\n",
        "\n",
        "device = torch.device(\"cuda\")\n",
        "\n",
        "model_path = \"01-ai/Yi-1.5-34B-Chat\"\n",
        "gen_kwargs = {\"max_length\": 2048, \"do_sample\": False}\n",
        "\n",
        "tokenizer = AutoTokenizer.from_pretrained(\n",
        "    model_path, trust_remote_code=True,\n",
        ")\n",
        "model = AutoModelForCausalLM.from_pretrained(\n",
        "    model_path,\n",
        "    trust_remote_code=True,\n",
        "    torch_dtype=torch.float16,\n",
        ").eval().to(device)\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "试卷名：新课标卷Ⅰ 高考真题 【英语】学科\n",
            "***********************************\n",
            "题目编号：1\n",
            "题目标答：C\n",
            "B\n",
            "B\n",
            "Yi 34B 模型输出：The aim of the Habitat Restoration Team is C. To protect the local ecosystem.\n",
            "\n",
            "The lower age limit for joining the Habitat Restoration Team is B. 10.\n",
            "\n",
            "Volunteers are expected to do B. Work even in bad weather.\n",
            "***************\n",
            "题目编号：2\n",
            "题目标答：A\n",
            "C\n",
            "D\n",
            "A\n",
            "Yi 34B 模型输出：1. A. He's odd.\n",
            "2. C. He benefited from it as a patient.\n",
            "3. D. The effectiveness of holistic medicine.\n",
            "4. A. To prove Farber's point.\n",
            "***************\n",
            "题目编号：3\n",
            "题目标答：D\n",
            "A\n",
            "A\n",
            "C\n",
            "Yi 34B 模型输出：1. 划线短语 \"shine through\" 在第二段中的意思是什么？A. 似乎不太可能持续。B. 似乎很难解释。C. 变得准备使用。D. 变得容易注意到。\n",
            "答案：D. 变得容易注意到。\n",
            "\n",
            "2. Shallowing hypothesis 假设了什么？A. 读者对数字文本轻视。B. 数字文本更容易理解。C. 人们随机选择数字文本。D. 数字文本适合社交媒体。\n",
            "答案：A. 读者对数字文本轻视。\n",
            "\n",
            "3. 为什么大学老师越来越多地使用音频和视频？A. 它们可以吸引学生的注意力。B. 它们准备起来更方便。C. 它们有助于发展先进技能。D. 它们比文本更具信息性。\n",
            "答案：A. 它们可以吸引学生的注意力。\n",
            "\n",
            "4. 作者在最后一段中暗示了什么？A. 学生应该应用多种学习技巧。B. 教师应该自己制作教学材料。C. 印刷文本在教育中不能完全被取代。D. 课堂外的教育不能被忽视。\n",
            "答案：C. 印刷文本在教育中不能完全被取代。\n",
            "***************\n",
            "题目编号：4\n",
            "题目标答：B\n",
            "C\n",
            "C\n",
            "D\n",
            "Yi 34B 模型输出：B. They are mostly in electronic form.\n",
            "C. Observational data.\n",
            "C. Improper way of sampling.\n",
            "B. Hire experts to check the records.\n",
            "***************\n",
            "题目编号：5\n",
            "题目标答：FBEAD\n",
            "Yi 34B 模型输出：F. That means good grammar and no spelling mistakes.\n",
            "\n",
            "1. The statement \"No editor is likely to tolerate a writer who does not take the trouble to spell words correctly\" emphasizes the importance of correct spelling in a manuscript. Therefore, the correct answer is F, as it explains what is meant by a \"perfect, professionally presented manuscript,\" which includes good grammar and no spelling mistakes.\n",
            "\n",
            "2. The context suggests that the writer is discussing their preference for physical dictionaries over electronic spellcheckers. The statement \"I don't trust my laptop's spellchecker\" leads to the conclusion that the writer believes in the importance of using a dictionary. Answer C, \"But I still don't want to replace them,\" fits this context as it shows the writer's preference for their physical dictionaries despite the availability of electronic alternatives.\n",
            "\n",
            "3. The passage discusses the features a dictionary should have, such as providing precise definitions and differentiating words with similar meanings. Answer E, \"Of course, a dictionary is not only for spelling,\" fits here as it expands on the multifaceted role of a dictionary, beyond just spelling.\n",
            "\n",
            "4. The mention of the Shorter Oxford Dictionary and its extensive content leads to a discussion of its usefulness. Answer A, \"I don't often use this dictionary,\" is the logical continuation, as it contrasts the impressive scope of the dictionary with the writer's actual usage, possibly due to its size.\n",
            "\n",
            "5. The reference to Samuel Johnson's Dictionary and the quote about making dictionaries being dull work, suggests a reflection on the process of using and appreciating dictionaries. Answer D, \"But a dictionary can be a pleasure to read,\" fits this context as it offers a counterpoint to Johnson's comment, suggesting that while compiling dictionaries may be dull, reading them can be enjoyable.\n",
            "\n",
            "G. Dictionaries don't always give you enough information. (This option does not fit the flow of the passage as it introduces a negative aspect that is not discussed in the given text.)\n",
            "B. It takes no account of the context. (This option is not directly related to the discussion of dictionaries and their usage in the passage.)\n",
            "***************\n",
            "题目编号：6\n",
            "题目标答：CADCB DCBBA DADBC\n",
            "Yi 34B 模型输出：1. C. won\n",
            "2. A. regularly\n",
            "3. D. training\n",
            "4. B. believed\n",
            "5. B. achievement\n",
            "6. D. reason\n",
            "7. D. dealt with\n",
            "8. B. dreamed\n",
            "9. B. borrowed\n",
            "10. D. challenges\n",
            "11. D. stopped\n",
            "12. C. familiar\n",
            "13. D. appealing\n",
            "14. B. matured\n",
            "15. C. goals\n",
            "***************\n",
            "题目编号：7\n",
            "题目标答：engineering\n",
            "functional\n",
            "to give\n",
            "closed\n",
            "walks\n",
            "the\n",
            "favourites\n",
            "as\n",
            "that\n",
            "richness\n",
            "Yi 34B 模型输出：1. engineers'\n",
            "2. functional\n",
            "3. to give\n",
            "4. closed\n",
            "5. walks\n",
            "6. the\n",
            "7. favorites\n",
            "8. as\n",
            "9. that\n",
            "10. richness\n",
            "\n",
            "解析：\n",
            "1. 这里需要使用名词的 possessive 形式，表示“最新的工程师技术”。\n",
            "2. \"function\" 的形容词形式为 \"functional\"，表示“功能性的”。\n",
            "3. 这里的动词应该是“to give”，表示“在温暖的天气里打开”以提供阳光和新鲜空气。\n",
            "4. \"close\" 应该使用形容词形式 \"closed\"，表示结构在寒冷天气下关闭以保护植物。\n",
            "5. \"walk\" 应该使用第三人称单数形式 \"walks\"，表示“引导游客穿越之旅”。\n",
            "6. \"the first time\" 应使用定冠词 \"the\"。\n",
            "7. \"favourite\" 应该使用复数形式 \"favorites\"，表示“现代西方的最爱植物”。\n",
            "8. \"as\" 表示“作为”，说明 Glasshouse 是一个伟大的成就。\n",
            "9. 定语从句 \"that brought the plants from their native habitat in Asia\" 修饰前面的 \"steps along the Silk Route\"。\n",
            "10. \"richness\" 表示“丰富程度”，与“gardening in England”相搭配。\n",
            "***************\n",
            "题目编号：8\n",
            "题目标答：略\n",
            "Yi 34B 模型输出：Dear Chris,\n",
            "\n",
            "I'm writing to share with you an art class I had in a park last Friday. It was a unique experience that I'm eager to tell you about.\n",
            "\n",
            "The weather was perfect, and we were all excited to draw our surroundings. I chose to depict a beautiful willow tree swaying in the breeze. It was challenging but fun to capture the movement and the shadows it cast on the grass.\n",
            "\n",
            "I'm not an expert, but I felt a great sense of achievement when I finally added the last touch to my drawing. It was more than just completing a piece of art; it was like capturing a moment in time. Being outdoors, surrounded by nature, really inspired me.\n",
            "\n",
            "I believe this class has opened a new door for me to appreciate the beauty of the world in a different way. I'm thinking of trying watercolors next. Who knows, maybe one day I'll be able to share my art with you!\n",
            "\n",
            "Looking forward to hearing from you soon.\n",
            "\n",
            "Yours\n",
            "Li Hua\n",
            "***************\n",
            "题目编号：9\n",
            "题目标答：略\n",
            "Yi 34B 模型输出：I ran back to Gunter and told him the bad news. The expression on his face remained unchanged, still wearing that understanding smile. \"Do not worry, my friend,\" Gunter said in a soothing voice. \"There must be another way.\" He thought for a moment and then suggested that I use my phone to make a digital payment. However, my phone had run out of battery, leaving me with no means to pay.\n",
            "\n",
            "Seeing my mounting despair, Gunter's eyes lit up with a spark of an idea. \"Wait here,\" he said, and disappeared into the bustling bus station. To my astonishment, he returned with the station manager in tow. Gunter explained my situation to the manager, who, after a few moments of consideration, agreed to accept my promise to pay for the taxi fare once I could. The manager's kindness and Gunter's generosity left me speechlessly grateful.\n",
            "\n",
            "Four days later, when I was back in Vienna, I called Gunter as promised. I repaid the taxi fare and thanked him profusely for his help that evening. Gunter simply laughed and said, \"It was nothing. You'll do the same for someone else someday.\" Our conversation ended with a promise to meet for a cup of coffee when I next visited Vienna.\n",
            "\n",
            "That night, as I boarded the bus to Prague, I couldn't help but reflect on the series of events that had unfolded. Gunter's kindness had not only saved me from missing my bus but had also restored my faith in human compassion. It was a cold, wet, and unforgettable evening, but the warmth of Gunter's generosity had melted the chill of the storm.\n"
          ]
        }
      ],
      "source": [
        "subject, paper_type = \"英语\", \"新课标卷Ⅰ\"\n",
        "file_name = f\"../data/{paper_type}/{subject}.jsonl\"\n",
        "\n",
        "terminators = [\n",
        "    tokenizer.eos_token_id,\n",
        "    tokenizer.convert_tokens_to_ids(\"<|eot_id|>\"),\n",
        "    tokenizer.convert_tokens_to_ids(\"<|im_end|>\")\n",
        "]\n",
        "\n",
        "questions = []\n",
        "\n",
        "print(f\"试卷名：{paper_type} 高考真题 【{subject}】学科\")\n",
        "\n",
        "with open(file_name, \"r\") as f:\n",
        "    for i, line in enumerate(f):\n",
        "        data = json.loads(line)\n",
        "        has_img, question = False, data['prompt']\n",
        "\n",
        "        if '<img' in question:\n",
        "            has_img = True\n",
        "            question = re.sub(r'<img[^>]*?/>', \"\", question)\n",
        "\n",
        "        input_ids = tokenizer.apply_chat_template(\n",
        "            conversation=[{\"role\": \"user\", \"content\": question}], \n",
        "            tokenize=True, \n",
        "            add_generation_prompt=True, \n",
        "            return_tensors='pt'\n",
        "        )\n",
        "        output_ids = model.generate(input_ids.to(device), eos_token_id=terminators, **gen_kwargs)\n",
        "        response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)\n",
        "\n",
        "        if i == 0:\n",
        "            print(\"*\" * 35)\n",
        "        else:\n",
        "            print(\"*\" * 15)\n",
        "\n",
        "        print(\"题目编号：\" + str(i+1) + (\"（含图片）\" if has_img else \"\"))\n",
        "        print(\"题目标答：\" + data[\"answer\"])\n",
        "        print(\"Yi 34B 模型输出：\" + response)\n",
        "            \n",
        "        questions.append({\n",
        "            \"id\": str(i+1),\n",
        "            \"question\": question,\n",
        "            \"answer\": data[\"answer\"],\n",
        "            \"output\": response,\n",
        "            \"has_img\": has_img\n",
        "        })"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "tf_4_7",
      "language": "python",
      "name": "tf_4_7"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.14"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}