[
    {
        "id": 1,
        "question": "In specific domains, such as healthcare, using large language models (LLMs) in combination with Retrieval-Augmented Generation (RAG) can effectively reduce hallucinations, while attribution can provide valid citation evidence for the generated answers, making it easier for subsequent evaluation and validation. A method was attempted where GPT-4 was used to generate data, followed by fine-tuning the LLM using supervised fine-tuning (SFT) to directly produce answers and attributions. It was observed that for simple questions (single citation), the model performs well, but for more complex questions, the model's performance declines. After investigating, it was found that the generated dataset primarily contained simple questions, and the citation accuracy of GPT-4 itself is low (around 75%). How can high-quality data be generated to improve performance on complex questions?",
        "response": "# Introduction\n\nBuilding large, high-quality question-answering (QA) datasets with complex questions and accurate citations is crucial for training retrieval-augmented generation (RAG) models in specialized domains like healthcare. Many existing biomedical QA datasets focus on simple, single-hop questions (e.g. PubMedQA’s yes/no questions) ([Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions](https://arxiv.org/html/2408.00727v1#:~:text=Question%20answering%20tasks%20such%20as,on%20MedQA%20with%20prompt%20engineering))."
    },
    {
        "id": 2,
        "question": "Given the existence of DeepSeek R1, what directions and opportunities remain for improving the reasoning capabilities of large models, and how much potential is left for further research? Will reinforcement learning (RL)-based methods become the dominant approach? Can post-training for chain-of-thought (CoT) reasoning that combines reward models with tree search be entirely abandoned?",
        "response": "# Introduction\n\nLarge Language Models (LLMs) have recently made significant strides in complex reasoning tasks, aided by techniques like Chain-of-Thought (CoT) prompting and specialized fine-tuning. However, challenges remain: even advanced models can make logical missteps or hallucinate facts during multi-step reasoning ([[2305.20050\\] Let’s Verify Step by Step](https://ar5iv.org/abs/2305.20050#:~:text=In%20recent%20years%2C%20large%20language,of%20problems))."
    }
]