
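def eval_instruction_prompts(module_name):
    """Return the prompt segments for an evaluation module.

    Each prompt is a list of string segments. The caller is expected to
    insert run-specific fields (research question, background survey,
    hypothesis, ...) between consecutive segments.

    Raises NotImplementedError if `module_name` is not recognized.
    """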
    if module_name == "hypothesis_composition_eval_decompose":
        prompts = [
            """You are assisting scientists by carefully breaking down a neuroscience-related hypothesis into its core technical components and assigning importance weights to each.

Here, neuroscience encompasses medicine, cognitive science, psychology, computational neuroscience, and related fields.

A **'component'** refers strictly to a technical aspect that is explicitly stated in the hypothesis, specifically:

### Technical Components to INCLUDE:

**Neurobiological Components:**
- **Explicitly named brain regions, neural circuits, or anatomical structures** (e.g., "dorsolateral prefrontal cortex", "hippocampal CA1 pyramidal neurons", "cortico-striatal-thalamic loops")
- **Specific neurotransmitters, receptors, or molecular targets** (e.g., "NMDA receptors", "dopamine D2 receptors", "BDNF-TrkB signaling", "microRNA-132")
- **Clearly defined neural mechanisms or processes** (e.g., "long-term potentiation", "synaptic pruning", "theta-gamma coupling", "neuroinflammation")
- **Specific genes, proteins, or molecular pathways** (e.g., "COMT Val158Met polymorphism", "mTOR signaling", "amyloid-β aggregation")

**Cognitive/Psychological Components:**
- **Specific cognitive processes or functions** (e.g., "working memory updating", "selective attention", "executive control", "episodic memory encoding")
- **Defined psychological constructs with operational definitions** (e.g., "cognitive load measured by dual-task paradigm", "trait anxiety via STAI scores")
- **Specific behavioral paradigms or tasks** (e.g., "n-back task", "Stroop interference", "delay discounting", "fear conditioning")

**Clinical/Medical Components:**
- **Specific disease mechanisms or pathophysiology** (e.g., "tau hyperphosphorylation", "dopaminergic degeneration", "glutamate excitotoxicity")
- **Defined clinical symptoms or signs** (e.g., "bradykinesia", "positive symptoms of schizophrenia", "REM sleep behavior disorder")
- **Specific therapeutic interventions or drugs** (e.g., "selective serotonin reuptake inhibitors", "deep brain stimulation at 130Hz", "cognitive behavioral therapy protocol")

**Methodological/Technical Components:**
- **Specific measurement techniques or biomarkers** (e.g., "fMRI BOLD signal", "EEG alpha power (8-12Hz)", "CSF tau/Aβ42 ratio", "PET amyloid imaging")
- **Explicitly stated computational models or algorithms** (e.g., "drift-diffusion model", "predictive coding framework", "graph theoretical analysis", "deep learning CNN")
- **Clearly defined temporal or spatial parameters** (e.g., "200ms post-stimulus", "theta frequency (4-8Hz)", "1mm³ voxel resolution")

### Components to EXCLUDE:
- **General outcomes without mechanistic detail** (e.g., "improved cognition", "better quality of life")
- **Vague theoretical concepts** (e.g., "brain connectivity" without specifying networks)
- **Future implications or applications**
- **Statistical results or p-values**
- **General benefits or clinical potential without specific mechanisms**

### Weight Assignment Guidelines:
After identifying each component, assign it a weight (0.0 to 1.0) according to its role in the hypothesis:
- **Core mechanism (0.7-1.0)**: Central to the hypothesis's main claim
- **Supporting mechanism (0.4-0.6)**: Important but not central
- **Contextual element (0.1-0.3)**: Provides context but not critical

The ranges above describe relative importance. Rescale the final weights so that they sum to approximately 1.0.

### Output Format:
Your response MUST follow this EXACT format:

**Components starts:**
1. [Component description]: [weight as decimal, e.g., 0.4]
2. [Component description]: [weight as decimal, e.g., 0.3]
3. [Component description]: [weight as decimal, e.g., 0.3]
...
**Components ends**

Example of CORRECT format:
**Components starts:**
1. Dopamine receptor D2 antagonism mechanism: 0.4
2. Serotonin 5-HT2A receptor modulation: 0.3
3. Prefrontal cortex activation patterns: 0.3
**Components ends**

CRITICAL: 
- Use the EXACT markers "**Components starts:**" and "**Components ends**"
- Each line must be numbered sequentially (1, 2, 3, ...)
- Format each line as: number. component: weight
- Weights MUST be decimal numbers between 0.0 and 1.0
- CORRECT weight formats: 0.8, 0.25, 1.0, 0.33
- WRONG weight formats: 80%, 1/3, 0.8-0.9, high, ~0.5, approximately 0.3
- DO NOT use percentages, fractions, ranges, or words - ONLY decimal numbers!

### Context:
Research Question: """,
            "\n\nBackground Survey: ",
            "\n\nHypothesis to decompose: ",
            "\n\nNow, carefully identify the technical components and assign weights based on their importance to the core claim of the hypothesis.\n\nREMEMBER: Your response MUST use the exact format with **Components starts:** and **Components ends** markers."
        ]
    elif module_name == "hypothesis_composition_eval_decompose_refine":
        prompts = [
            """You are reviewing a decomposition of a neuroscience-related hypothesis into technical components. A student has attempted to break down the hypothesis, but may have made errors.

Common mistakes include:
1. **Including non-technical elements**: General outcomes, benefits, clinical applications, or vague descriptions
2. **Missing key technical components**: Overlooking specific mechanisms, molecules, cognitive processes, or parameters explicitly stated
3. **Incorrect weight assignment**: Not properly reflecting the importance of each component to the core hypothesis
4. **Merging distinct components**: Combining multiple technical elements that should be listed separately

### Technical Components that SHOULD be included:
**Neurobiological:** Brain regions, neural circuits, neurotransmitters, receptors, genes, proteins, molecular pathways
**Cognitive/Psychological:** Specific cognitive processes, defined psychological constructs, behavioral paradigms
**Clinical/Medical:** Disease mechanisms, defined symptoms, specific interventions
**Methodological:** Measurement techniques, computational models, temporal/spatial parameters

### Components that should be EXCLUDED:
- **General outcomes without mechanistic detail**
- **Vague concepts without specificity**
- **Future implications or potential applications**
- **Statistical results or significance levels**
- **General benefits without technical content**

### Weight Guidelines:
- **Core mechanism (0.7-1.0)**: Central to the hypothesis's main claim
- **Supporting mechanism (0.4-0.6)**: Important but not central
- **Contextual element (0.1-0.3)**: Provides context but not critical
- Final weights should sum to approximately 1.0, so treat the ranges above as relative importance and rescale if needed

Please review and refine the student's decomposition by:
1. Removing any non-technical components
2. Adding any missing technical components from the hypothesis
3. Adjusting weights to properly reflect importance
4. Ensuring all components are explicitly stated in the hypothesis (not inferred)

### Output Format:
Your response MUST follow this EXACT format:

**Components starts:**
1. [Component description]: [weight as decimal, e.g., 0.4]
2. [Component description]: [weight as decimal, e.g., 0.3]
3. [Component description]: [weight as decimal, e.g., 0.3]
...
**Components ends**

Example of CORRECT format:
**Components starts:**
1. Dopamine receptor D2 antagonism mechanism: 0.4
2. Serotonin 5-HT2A receptor modulation: 0.3
3. Prefrontal cortex activation patterns: 0.3
**Components ends**

CRITICAL: 
- Use the EXACT markers "**Components starts:**" and "**Components ends**"
- Each line must be numbered sequentially (1, 2, 3, ...)
- Format each line as: number. component: weight
- Weights MUST be decimal numbers between 0.0 and 1.0
- CORRECT weight formats: 0.8, 0.25, 1.0, 0.33
- WRONG weight formats: 80%, 1/3, 0.8-0.9, high, ~0.5, approximately 0.3
- DO NOT use percentages, fractions, ranges, or words - ONLY decimal numbers!

### Context:
Research Question: """,
            "\n\nBackground Survey: ",
            "\n\nOriginal Hypothesis: ",
            "\n\nStudent's decomposition attempt:\n",
            "\n\nNow, please review and provide a corrected decomposition following the guidelines above.\n\nREMEMBER: Your response MUST use the exact format with **Components starts:** and **Components ends** markers."
        ]
    elif module_name == "hypothesis_composition_eval_compare_all":
        # Single prompt to evaluate all components at once
        prompts = [
            """Please compare components from the groundtruth hypothesis with the generated hypothesis to determine how well each groundtruth component is covered.

Evaluate each groundtruth component and assign a coverage score based on these strict criteria:

### Coverage Levels for Neuroscience Components:

**Level 3 (Score: 1.0) - Exact Match:**
- **Neurobiological:** Nearly identical structures/molecules with same specificity (e.g., "NMDA NR2B" matches "NR2B subunit of NMDA receptor")
- **Cognitive:** Same cognitive process with identical parameters (e.g., "2-back working memory" matches "n-back task with n=2")
- **Clinical:** Identical intervention/symptom with same details (e.g., "L-DOPA 100mg" matches "levodopa 100mg")
- **Methodological:** Same technique with identical parameters (e.g., "3T fMRI with 2mm voxels" matches exactly)
- **Numerical parameters:** Difference less than 10% in values

**Level 2 (Score: 0.7) - Highly Related:**
- **Neurobiological:** Same system but different subtype/region (e.g., "D1 receptor" vs "D2 receptor", both dopaminergic)
- **Cognitive:** Same cognitive domain with variation (e.g., "spatial working memory" vs "verbal working memory")
- **Clinical:** Same class of intervention/symptom (e.g., "SSRI fluoxetine" vs "SSRI sertraline")
- **Methodological:** Same technique, different parameters (e.g., "fMRI at 3T" vs "fMRI at 7T")
- **Numerical parameters:** Difference of at least 10% but less than 20% in values

**Level 1 (Score: 0.4) - Moderately Related:**
- **Neurobiological:** Related but distinct systems (e.g., "dopamine" vs "serotonin", both monoamines)
- **Cognitive:** Related but different processes (e.g., "attention" vs "executive control")
- **Clinical:** Related interventions/symptoms (e.g., "pharmacotherapy" vs "psychotherapy" for same disorder)
- **Methodological:** Conceptually similar techniques (e.g., "EEG" vs "MEG", both measure neural activity)
- **Numerical parameters:** Difference of at least 20% but less than 30% in values

**Level 0 (Score: 0.0) - Not Related:**
- Components from different categories or unrelated systems
- Lacks required specificity (vague when specific detail was given)
- Parameter differences of 30% or more
- Wrong level of analysis (e.g., cellular when systems-level was specified)

### Important Guidelines:
- Evaluate each groundtruth component exactly as provided - do NOT combine or split components
- A component can only achieve high scores if it maintains the same level of technical specificity
- Ignore non-technical content

### Context:
Research Question: """,
            "\n\nBackground Survey: ",
            "\n\nGroundtruth components to evaluate:\n",
            "\n\nGenerated hypothesis:\n",
            """\n\n### Task:
For each numbered groundtruth component above, determine how well it is covered by the generated hypothesis.

CRITICAL: You MUST provide EXACTLY one score for each component listed above. 
- If there are 5 components, provide exactly 5 scores
- If there are 10 components, provide exactly 10 scores
- Do not skip any components
- Do not add extra scores

### Output Format:
Your response MUST follow this EXACT format:

**Scores starts:**
1: [decimal score: 0.0, 0.4, 0.7, or 1.0]
2: [decimal score: 0.0, 0.4, 0.7, or 1.0]
3: [decimal score: 0.0, 0.4, 0.7, or 1.0]
...
**Scores ends**

Example of CORRECT format (if there are 3 components):
**Scores starts:**
1: 0.7
2: 0.4
3: 1.0
**Scores ends**

WRONG format (missing score for component 3):
**Scores starts:**
1: 0.7
2: 0.4
**Scores ends**

CRITICAL INSTRUCTIONS:
- Use the EXACT markers "**Scores starts:**" and "**Scores ends**"
- Each score MUST be exactly one of these decimal values: 0.0, 0.4, 0.7, or 1.0
- DO NOT use: fractions (3/4), percentages (70%), ranges (0.6-0.8), or words (high/medium)
- Each line format: number: score
- Include ALL components in order (1, 2, 3, ...)
- The NUMBER OF SCORES must EXACTLY MATCH the number of components
- Do NOT skip any component numbers
- Do NOT add extra scores beyond the number of components
- Do NOT include any explanations or text outside the markers

FINAL REMINDER: Use ONLY the format shown above with the exact markers."""
        ]
    elif module_name == "hypothesis_composition_eval_compare_all_refine":
        # Refinement step for comparison
        prompts = [
            """A student has attempted to evaluate how well a generated hypothesis covers the groundtruth components. Please review and correct their evaluation.

### Coverage Levels (Reminder):
- **Level 3 (1.0)**: Exact match - nearly identical with same specificity
- **Level 2 (0.7)**: Highly related - same system/domain with minor variations
- **Level 1 (0.4)**: Moderately related - conceptually similar but technically different
- **Level 0 (0.0)**: Not related, or a parameter difference of 30% or more

Common evaluation errors to check:
1. **Over-scoring vague matches**: Giving high scores when specificity is lost (e.g., "brain activity" for "hippocampal theta oscillations")
2. **Under-scoring valid matches**: Missing legitimate coverage due to terminology differences (e.g., "L-DOPA" vs "levodopa")
3. **Ignoring parameter differences**: Not accounting for numerical differences (frequencies, doses, timings)
4. **Category confusion**: Matching components from different levels of analysis (molecular vs systems, cognitive vs neural)
5. **Missing cross-domain relationships**: Not recognizing when cognitive and neural components describe the same phenomenon

### Context:
Research Question: """,
            "\n\nBackground Survey: ",
            "\n\nGroundtruth components:\n",
            "\n\nGenerated hypothesis:\n",
            "\n\nStudent's evaluation:\n",
            """\n\nPlease review and provide corrected scores.

### Output Format:
Your response MUST follow this EXACT format:

**Scores starts:**
1: [decimal score: 0.0, 0.4, 0.7, or 1.0]
2: [decimal score: 0.0, 0.4, 0.7, or 1.0]
3: [decimal score: 0.0, 0.4, 0.7, or 1.0]
...
**Scores ends**

Example of CORRECT format (if there are 3 components):
**Scores starts:**
1: 0.7
2: 0.4
3: 1.0
**Scores ends**

WRONG format (missing score for component 3):
**Scores starts:**
1: 0.7
2: 0.4
**Scores ends**

CRITICAL INSTRUCTIONS:
- Use the EXACT markers "**Scores starts:**" and "**Scores ends**"
- Provide ONLY corrected numerical scores (0.0, 0.4, 0.7, or 1.0)
- Each line format: number: score
- Include ALL components in order (1, 2, 3, ...)
- Do NOT include any explanations or text outside the markers

FINAL REMINDER: Use ONLY the format shown above with the exact markers."""
        ]
    elif module_name == "hypothesis_extraction":
        # Prompt to copy the full hypothesis while removing background/inspiration sections
        prompts = [
            """You are an assistant that copies the hypothesis section from a model-generated response.

The model was given:
- A research question
- Background survey information
- A previous hypothesis
- An inspiration paper (title and abstract)

The model's response may contain:
1. **Repeated input sections** (research question restatement, background summary, inspiration paper analysis)
2. **Hypothesis sections** (the actual hypothesis proposed by the model)

Your task: Copy the FULL hypothesis section(s) EXACTLY as written, while skipping/removing any sections that repeat or summarize the input.

### Sections to SKIP (do NOT copy):
- "Research Context" / "Research Question" / "Problem Statement" sections
- "Background" / "Previous Findings" / "Literature Review" sections  
- "Inspiration" / "Inspiration Paper" / "Related Literature" sections
- "Previous Hypothesis" sections
- Any section that restates the research question, background, or inspiration abstract

### Sections to COPY IN FULL (hypothesis content):
- "Hypothesis" / "Proposed Hypothesis" / "Novel Hypothesis" sections
- "Integration Strategy" / "Integrated Model" sections
- "Mechanism" / "Proposed Mechanism" sections
- "Methodology" / "Experimental Approach" sections
- Any novel scientific claims not present in the input

IMPORTANT: Copy the hypothesis sections EXACTLY and COMPLETELY. Do NOT summarize, paraphrase, or shorten. Just skip the background/inspiration sections and copy everything else verbatim.

### Original Research Question (for reference, to help identify what to skip):
""",
            """

### Model's Generated Response:
""",
            """

### Task:
Copy the full hypothesis content from the response above, skipping any sections that repeat or summarize the input information (research question, background, inspiration).

### Output Format:
Your response MUST follow this EXACT format:

**Extracted Hypothesis starts:**
[Copy the full hypothesis sections here EXACTLY as written - do NOT summarize]
**Extracted Hypothesis ends**

CRITICAL INSTRUCTIONS:
- Use the EXACT markers "**Extracted Hypothesis starts:**" and "**Extracted Hypothesis ends**"
- SKIP sections that restate the research question, background, or inspiration
- COPY the hypothesis and methodology sections EXACTLY and COMPLETELY
- Do NOT summarize or paraphrase - copy verbatim
- If the entire response is hypothesis content (no repeated sections), copy it all
- Maintain the original formatting, structure, and all details"""
        ]
    else:
        raise NotImplementedError(f"Module name not found: {module_name}")
    
    return prompts
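

# --- Illustrative helpers (a sketch, not part of the original module) -------
# Most segments above end in a label such as "Research Question: ", which
# suggests that the caller interleaves run-specific fields between consecutive
# segments and then parses the marker-delimited reply. The functions below are
# a minimal sketch under that assumption. The names `assemble_prompt`,
# `parse_components`, and `parse_scores` are hypothetical, and the regular
# expressions simply mirror the output formats that the prompts request.
import re


def assemble_prompt(segments, fields):
    """Interleave segments with fields: s0 + f0 + s1 + f1 + ... + sN."""
    if len(fields) != len(segments) - 1:
        raise ValueError(
            f"expected {len(segments) - 1} fields, got {len(fields)}"
        )
    parts = []
    for segment, field in zip(segments, fields):
        parts.append(segment)
        parts.append(str(field))
    parts.append(segments[-1])  # closing instructions follow the last field
    return "".join(parts)


def _between_markers(text, start, end):
    """Return the text between two literal markers, or None if either is missing."""
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1) if match else None


def parse_components(response):
    """Parse '1. description: 0.4' lines into (description, weight) pairs."""
    body = _between_markers(
        response, "**Components starts:**", "**Components ends**"
    )
    if body is None:
        return []
    components = []
    for line in body.splitlines():
        # Greedy `.+` lets a description contain colons; the weight comes last.
        match = re.fullmatch(r"\s*\d+\.\s*(.+):\s*([01](?:\.\d+)?)\s*", line)
        if match and 0.0 <= float(match.group(2)) <= 1.0:
            components.append((match.group(1).strip(), float(match.group(2))))
    return components


def parse_scores(response, expected_count=None):
    """Parse 'n: score' lines; return the ordered scores, or None if malformed."""
    body = _between_markers(response, "**Scores starts:**", "**Scores ends**")
    if body is None:
        return None
    allowed = {0.0, 0.4, 0.7, 1.0}
    scores = []
    for line in body.splitlines():
        match = re.fullmatch(r"\s*(\d+):\s*(\d(?:\.\d+)?)\s*", line)
        if not match:
            continue  # skip blank lines and stray text
        index, score = int(match.group(1)), float(match.group(2))
        # Reject out-of-order numbering and scores outside the four levels.
        if index != len(scores) + 1 or score not in allowed:
            return None
        scores.append(score)
    if expected_count is not None and len(scores) != expected_count:
        return None
    return scores


# Example usage (variable names are placeholders):
#   segments = eval_instruction_prompts("hypothesis_composition_eval_decompose")
#   prompt = assemble_prompt(segments, [question, survey, hypothesis])
#   components = parse_components(model_reply)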