# -*- coding: utf-8 -*-
FEW_SHOT_GEN = [
    # Example 1
    {'role': 'user', 'content': (
        "Scenario: Alice is training for a marathon.\n"
        "Statement: Running on a treadmill improves her endurance.\n"
        "Opposite Statement: Running on a treadmill does not improve her endurance.\n"
        "Generate 2 sentences.")},
    {'role': 'assistant', 'content': (
        "1. Treadmill training allows Alice to maintain a consistent pace and monitor heart rate, boosting her aerobic capacity.\n"
        "2. The treadmill's adjustable incline simulates hill workouts, increasing leg strength and stamina.")},
    # Example 2
    {'role': 'user', 'content': (
        "Scenario: Bob studies every evening.\n"
        "Statement: Studying in short, focused bursts enhances retention.\n"
        "Opposite Statement: Studying in short, focused bursts does not enhance retention.\n"
        "Generate 2 sentences.")},
    {'role': 'assistant', 'content': (
        "1. Brief study sessions with breaks prevent mental fatigue and improve long-term recall.\n"
        "2. Frequent reviews in small intervals reinforce memory pathways, aiding retention.")}
]


# Few-shot examples for factor extraction
FEW_SHOT_EXT = [
    {'role': 'user', 'content': (
        "Extract distinct factors from these sentences as a JSON array.\n"
        "1. Treadmill training allows maintaining a consistent pace and monitoring heart rate, boosting aerobic capacity.\n"
        "2. The treadmill's adjustable incline simulates hill workouts, increasing leg strength and stamina.")},
    {'role': 'assistant', 'content': (
        "Let me analyze these sentences to identify the key factors:\n\n"
        "From sentence 1: The treadmill allows consistent pace (pace consistency) and heart rate monitoring, which boosts aerobic capacity.\n"
        "From sentence 2: The adjustable incline feature simulates hills, which increases leg strength.\n\n"
        "Key factors identified:\n"
        "- Pace consistency (ability to maintain steady speed)\n"
        "- Heart rate monitoring (tracking cardiovascular response)\n"
        "- Adjustable incline (variable difficulty/terrain simulation)\n"
        "- Leg strength (muscle development)\n\n"
        "Final answer:\n"
        '["Pace consistency","Heart rate monitoring","Adjustable incline","Leg strength"]')},
    
    {'role': 'user', 'content': (
        "Extract distinct factors from these sentences as a JSON array.\n"
        "1. Group work requires coordination between team members.\n"
        "2. Individual tasks allow for personal control and efficiency.")},
    {'role': 'assistant', 'content': (
        "Let me identify the key factors from these sentences:\n\n"
        "From sentence 1: Group work needs coordination between team members.\n"
        "From sentence 2: Individual tasks provide personal control and efficiency.\n\n"
        "Key factors:\n"
        "- Coordination requirements (need for team synchronization)\n"
        "- Personal control (individual autonomy)\n"
        "- Efficiency (productivity/effectiveness)\n"
        "- Team collaboration (working together)\n\n"
        "Final answer:\n"
        '["Coordination requirements","Personal control","Efficiency","Team collaboration"]')}
]

# Few-shot for voting - Modified to include Both option with clearer format
DETAILED_FEW_SHOT_VOTE = [
    {'role': 'user', 'content': (
        "Scenario: Alice is training for a marathon.\n"
        "Statement1: Running on a treadmill improves her endurance.\n"
        "Statement2: Running on a treadmill does not improve her endurance.\n"
        "Factor: Pace consistency\n"
        "Decide which statement this factor supports: Statement1, Statement2, or Both. "
        "Think step by step, then provide your final answer as a JSON object.")},
    {'role': 'assistant', 'content': (
        "Let me think about how 'Pace consistency' relates to treadmill training and endurance:\n\n"
        "Pace consistency means being able to maintain a steady running speed. On a treadmill:\n"
        "- The belt moves at a constant speed, forcing the runner to maintain that pace\n"
        "- This helps develop the ability to sustain a steady effort level\n"
        "- Consistent pacing is crucial for endurance training and marathon performance\n"
        "- It helps build aerobic capacity and teaches proper energy management\n\n"
        "This factor clearly supports the idea that treadmill training improves endurance, as pace consistency is a key component of endurance development.\n\n"
        "Final answer:\n"
        '{"Pace consistency": "Statement1"}')},
    
    {'role': 'user', 'content': (
        "Scenario: Alice is training for a marathon.\n"
        "Statement1: Running on a treadmill improves her endurance.\n"
        "Statement2: Running on a treadmill does not improve her endurance.\n"
        "Factor: Weather conditions\n"
        "Decide which statement this factor supports: Statement1, Statement2, or Both. "
        "Think step by step, then provide your final answer as a JSON object.")},
    {'role': 'assistant', 'content': (
        "Let me analyze how 'Weather conditions' relates to treadmill vs outdoor running:\n\n"
        "Weather conditions can affect training in different ways:\n"
        "- Treadmill training: Provides consistent, controlled environment regardless of weather\n"
        "- Outdoor training: Subject to weather variations (heat, cold, rain, wind)\n\n"
        "This factor could support both statements:\n"
        "- For Statement1: Consistent weather conditions on treadmill allow for uninterrupted, regular training\n"
        "- For Statement2: Lack of weather variation might not prepare runners for real race conditions\n\n"
        "Since weather conditions can be used to argue for both the benefits and limitations of treadmill training, this factor relates to both statements.\n\n"
        "Final answer:\n"
        '{"Weather conditions": "Both"}')}
]

FEW_SHOT_VOTE = [
    {
        'role': 'user',
        'content': (
            "Scenario: Alice trains for a marathon.\n"
            "Statement1: Treadmill running improves endurance.\n"
            "Statement2: Treadmill running does not improve endurance.\n"
            "Factor: Pace consistency\n"
            "Decide which statement this factor supports: Statement1, Statement2, or Both. "
            "Think step by step, then answer in JSON."
        )
    },
    {
        'role': 'assistant',
        'content': (
            "Pace consistency forces a steady speed, building aerobic capacity and sustained effort.\n"
            "Final answer:\n"
            '{"Pace consistency": "Statement1"}'
        )
    },
    {
        'role': 'user',
        'content': (
            "Scenario: Alice trains for a marathon.\n"
            "Statement1: Treadmill running improves endurance.\n"
            "Statement2: Treadmill running does not improve endurance.\n"
            "Factor: Weather conditions\n"
            "Decide which statement this factor supports: Statement1, Statement2, or Both. "
            "Think step by step, then answer in JSON."
        )
    },
    {
        'role': 'assistant',
        'content': (
            "Treadmill gives consistent conditions, yet outdoor weather readies race adaptability.\n"
            "Final answer:\n"
            '{"Weather conditions": "Both"}'
        )
    }
]



# Few-shot for clustering
FEW_SHOT_CLUSTER = [
    {'role': 'user', 'content': (
        "Cluster these factors into thematic groups. Think about how they relate to each other, "
        "then provide a JSON object mapping cluster names to factor lists. Use English cluster names.")},
    {'role': 'assistant', 'content': (
        "Let me analyze these factors and group them by theme:\n\n"
        "Looking at the factors:\n"
        "- Pace consistency: Related to training control and mechanics\n"
        "- Heart rate monitoring: Also related to training control and measurement\n"
        "- Adjustable incline: Related to physical challenge and strength\n"
        "- Leg strength: Related to physical development and strength\n\n"
        "I can see two main themes:\n"
        "1. Training Mechanics: Factors related to training control, measurement, and consistency\n"
        "2. Strength Factors: Factors related to physical development and muscle building\n\n"
        "Final answer:\n"
        '{"Training Mechanics":["Pace consistency","Heart rate monitoring"],"Strength Factors":["Adjustable incline","Leg strength"]}')}
]

# Few-shot for pruning redundant factors
FEW_SHOT_PRUNE = [
    {'role': 'user', 'content': (
        "From these factors in cluster 'Training Methods', remove redundant ones. "
        "Think about which factors are essentially the same concept, then return a JSON array of the remaining unique factors.\n"
        '["Consistent pace", "Steady rhythm", "Heart rate monitoring", "Pulse tracking"]')},
    {'role': 'assistant', 'content': (
        "Let me analyze these factors for redundancy:\n\n"
        "Looking at the factors:\n"
        "- 'Consistent pace' and 'Steady rhythm': These are essentially the same concept - maintaining a regular speed/tempo\n"
        "- 'Heart rate monitoring' and 'Pulse tracking': These are also the same concept - measuring heart rate\n\n"
        "Redundant pairs identified:\n"
        "- 'Consistent pace' vs 'Steady rhythm' → Keep 'Consistent pace' (more specific to running)\n"
        "- 'Heart rate monitoring' vs 'Pulse tracking' → Keep 'Heart rate monitoring' (more comprehensive term)\n\n"
        "Final answer:\n"
        '["Consistent pace","Heart rate monitoring"]')}
]

# JSON correction prompt for retry attempts
JSON_CORRECTION_PROMPT = {
    'role': 'system', 
    'content': (
        "Your previous response was not valid JSON. You can think about your reasoning, "
        "but please end with 'Final answer:' followed by a valid JSON object "
        "with the exact factor name as the key and your choice (Statement1, Statement2, or Both) as the value. "
        "Example format:\n"
        "Final answer:\n"
        "{\"factor name\": \"Statement1\"}"
    )
}


CLUSTER_NAME_PROMPT = [
    {
        'role': 'user',
        'content': (
            "Generate a theme name for these related factors:\n"
            '["energy expenditure", "energy efficiency", "energy transfer efficiency"]'
        )
    },
    {
        'role': 'assistant',
        'content': "Energy Efficiency"
    },
    {
        'role': 'user',
        'content': (
            "Generate a theme name for these related factors:\n"
            '["precision control", "better control (accuracy)", "control and precision"]'
        )
    },
    {
        'role': 'assistant',
        'content': "Control Precision"
    },
]


'''''''MAPPING FEW SHOT PROMPTS'''''''


MAPPING_VOTE_SHOT = [
    {
        "role": "user",
        "content": (
            "Scenario: A student is preparing for final exams.\n"
            "Condition: The student spends more time in the library.\n"
            "Candidate factors: [\"Better time management\", \"More stress\", \"Increased social activities\"]\n\n"
            "Task: Select the factor list that is most reasonably related to the given condition. "
            "Think step by step about each factor's relevance, then provide your selection. "
            "Please respond with your reasoning followed by 'Final answer:' and a JSON object in this exact format: {\"answer\": [\"factor1\", \"factor2\", ...]}"
        )
    },
    {
        "role": "assistant",
        "content": (
            "Let me analyze each factor:\n\n"
            "- \"Better time management\": Spending more time in the library suggests the student is organizing their study schedule better and dedicating focused time to studying. This is directly related.\n"
            "- \"More stress\": While exam preparation can be stressful, spending more time in the library is typically a positive study behavior, not necessarily indicating increased stress.\n"
            "- \"Increased social activities\": Libraries are generally quiet study spaces, not social venues. More library time would likely mean less social activities.\n\n"
            "Final answer:\n"
            "{\"answer\": [\"Better time management\"]}"
        )
    },
    {
        "role": "user",
        "content": (
            "Scenario: A company introduces remote working policies.\n"
            "Condition: Employees can work from home twice a week.\n"
            "Candidate factors: [\"Improved work-life balance\", \"Reduced office costs\", \"More commuting\"]\n\n"
            "Task: Select the factor list that is most reasonably related to the given condition. "
            "Think step by step about each factor's relevance, then provide your selection. "
            "Please respond with your reasoning followed by 'Final answer:' and a JSON object in this exact format: {\"answer\": [\"factor1\", \"factor2\", ...]}"
        )
    },
    {
        "role": "assistant",
        "content": (
            "Let me evaluate each factor:\n\n"
            "- \"Improved work-life balance\": Working from home twice a week allows employees to save commute time, have more flexibility, and better integrate work with personal life. This is directly related.\n"
            "- \"Reduced office costs\": With employees working from home part-time, the company needs less office space, utilities, and resources. This is a logical consequence.\n"
            "- \"More commuting\": This is contradictory - working from home twice a week would actually reduce commuting, not increase it.\n\n"
            "Final answer:\n"
            "{\"answer\": [\"Improved work-life balance\", \"Reduced office costs\"]}"
        )
    }
]


MAPPING_REFLECTION_SHOT = [
    {
        "role": "user",
        "content": (
            "Condition: City implements a bike-sharing program.\n"
            "Initially selected factors: [\"Increased bike usage\", \"Higher car sales\", \"More traffic jams\"]\n\n"
            "Task: Review and keep factors reasonably related to the condition. "
            "Think step by step about each factor's relevance, then 'Final answer:' with a JSON array."
        )
    },
    {
        "role": "assistant",
        "content": (
            "“Increased bike usage” is the direct result of more shared bikes—keep it.\n"
            "“Higher car sales” has no clear link to bike sharing—remove it.\n"
            "“More traffic jams” could occur if road space shifts to bikes and cars interact—keep it.\n\n"
            "Final answer:\n"
            "[\"Increased bike usage\", \"More traffic jams\"]"
        )
    },
    {
        "role": "user",
        "content": (
            "Condition: Students study in a quiet library.\n"
            "Initially selected factors: [\"Better concentration\", \"Distractions from phones\", "
            "\"Improved retention\", \"Reduced social interaction\"]\n\n"
            "Task: Review and keep factors reasonably related to the condition. "
            "Think step by step about each factor's relevance, then 'Final answer:' with a JSON array."
        )
    },
    {
        "role": "assistant",
        "content": (
            "“Better concentration” follows from a quiet space aiding focus—keep it.\n"
            "“Distractions from phones” contradicts the library’s purpose—remove it.\n"
            "“Improved retention” naturally arises from focused study—keep it.\n"
            "“Reduced social interaction” is a likely side-effect of silence—keep it.\n\n"
            "Final answer:\n"
            "[\"Better concentration\", \"Improved retention\", \"Reduced social interaction\"]"
        )
    }
]


generate_para_prob_shots = [
{
  "role": "system",
  "content": "You are an AI assistant whose task is: given a user-provided scenario comparing two statements and a list of factor values, estimate for each factor the probability (a float between 0 and 1) that it supports Statement 1 rather than Statement 2. Return your answer strictly as a JSON object mapping each factor value to its probability. Use 0.5 to indicate a neutral case (factor equally supports both statements), values greater than 0.5 to indicate stronger support for Statement 1, and values less than 0.5 to indicate stronger support for Statement 2."
},
        {
            "role": "user",
            "content": "Given the scenario:\n\"\"\"Comparing the ease of carrying a heavy box upstairs by one person vs. two people.\"\"\"\nFor each of the following factor values, please estimate the probability (a float between 0 and 1)\nthat this factor value supports Statement 1 (i.e. carrying by one person is easier)\nrather than Statement 2 (i.e. carrying by two people is easier).\nReturn a JSON mapping from factor value to probability.\n\nFactor values:\n[\"One person\", \"Two people\", \"Box weight is under 10 kg\", \"Box weight exceeds 30 kg\", \"Staircase is narrow\", \"Staircase is wide\", \"Good coordination required\", \"No coordination needed\"]"
        },
        {
            "role": "assistant",
            "content": "{\n  \"One person\": 0.30,\n  \"Two people\": 0.70,\n  \"Box weight is under 10 kg\": 0.85,\n  \"Box weight exceeds 30 kg\": 0.15,\n  \"Staircase is narrow\": 0.20,\n  \"Staircase is wide\": 0.60,\n  \"Good coordination required\": 0.40,\n  \"No coordination needed\": 0.75\n}"
        },
        {
            "role": "user",
            "content": "Given the scenario:\n\"\"\"Comparing the speed of downloading a 1GB file over Wi-Fi vs. 4G cellular network.\"\"\"\nFor each of the following factor values, please estimate the probability (a float between 0 and 1)\nthat this factor value supports Statement 1 (i.e. downloading over Wi-Fi is faster)\nrather than Statement 2 (i.e. downloading over 4G is faster).\nReturn a JSON mapping from factor value to probability.\n\nFactor values:\n[\"Wi-Fi connection\", \"4G connection\", \"Average Wi-Fi bandwidth ≥ 100 Mbps\", \"Average 4G bandwidth ≤ 20 Mbps\", \"Signal is stable\", \"Signal fluctuates frequently\", \"No data cap\", \"Strict monthly data cap\"]"
        },
        {
            "role": "assistant",
            "content": "{\n  \"Wi-Fi connection\": 0.88,\n  \"4G connection\": 0.12,\n  \"Average Wi-Fi bandwidth ≥ 100 Mbps\": 0.92,\n  \"Average 4G bandwidth ≤ 20 Mbps\": 0.25,\n  \"Signal is stable\": 0.80,\n  \"Signal fluctuates frequently\": 0.30,\n  \"No data cap\": 0.70,\n  \"Strict monthly data cap\": 0.15\n}"
        },
    {
        "role": "user",
        "content": "Given the scenario:\n\"\"\"Comparing reading on a digital screen vs printed paper.\"\"\"\nFor each of the following factor values, please estimate the probability (a float between 0 and 1)\nthat this factor value supports Statement 1 (i.e. reading on screen is better)\nrather than Statement 2 (i.e. reading on paper is better).\nReturn a JSON mapping from factor value to probability.\n\nFactor values:\n[\"High contrast display\", \"High contrast print\", \"Screen glare present\", \"Glare absent\", \"Font size identical\", \"Lighting conditions identical\"]"
    },
    {
        "role": "assistant",
        "content": "{\n"
                   "  \"High contrast display\": 0.60,\n"
                   "  \"High contrast print\": 0.40,\n"
                   "  \"Screen glare present\": 0.30,\n"
                   "  \"Glare absent\": 0.70,\n"
                   "  \"Font size identical\": 0.50,\n"
                   "  \"Lighting conditions identical\": 0.50\n"
                   "}"
    },
    {
        "role": "user",
        "content": "Given the scenario:\n\"\"\"Comparing fuel efficiency of diesel cars vs gasoline cars.\"\"\"\nFor each of the following factor values, please estimate the probability (a float between 0 and 1)\nthat this factor value supports Statement 1 (i.e. diesel cars are more fuel-efficient)\nrather than Statement 2 (i.e. gasoline cars are more fuel-efficient).\nReturn a JSON mapping from factor value to probability.\n\nFactor values:\n[\"Diesel vehicles typically generate more torque\", \"Gasoline vehicles typically generate more torque\", \"Vehicle weight and aerodynamics identical\", \"Fuel price difference negligible\"]"
    },
    {
        "role": "assistant",
        "content": "{\n"
                   "  \"Diesel vehicles typically generate more torque\": 0.65,\n"
                   "  \"Gasoline vehicles typically generate more torque\": 0.35,\n"
                   "  \"Vehicle weight and aerodynamics identical\": 0.50,\n"
                   "  \"Fuel price difference negligible\": 0.50\n"
                   "}"
    }
    ]

generate_para_prob_shots_cot_v0 = [
    {
        "role": "system",
        "content": "Please perform a brief chain-of-thought (step-by-step) before outputting the final JSON:\n\nYou are an AI assistant whose task is: given a user-provided scenario comparing two statements and a list of factor values, estimate for each factor the probability (a float between 0 and 1) that it supports Statement 1 rather than Statement 2. Return your answer strictly as a JSON object mapping each factor value to its probability. Use 0.5 to indicate neutrality, values greater than 0.5 to indicate stronger support for Statement 1, and values less than 0.5 to indicate stronger support for Statement 2."
    },
    {
        "role": "user",
        "content": "Given the scenario:\n\"\"\"Comparing the ease of carrying a heavy box upstairs by one person vs. two people.\"\"\"\nFor each of the following factor values, please estimate the probability (a float between 0 and 1)\nthat this factor value supports Statement 1 (i.e. carrying by one person is easier)\nrather than Statement 2 (i.e. carrying by two people is easier).\nReturn a JSON mapping from factor value to probability.\n\nFactor values:\n[\"One person\", \"Two people\", \"Box weight is under 10 kg\", \"Box weight exceeds 30 kg\", \"Staircase is narrow\", \"Staircase is wide\", \"Good coordination required\", \"No coordination needed\"]"
    },
    {
        "role": "assistant",
        "content": "Thought: A single person has limited strength, so two people generally handle heavy loads better, especially in wide spaces; narrow stairs limit teamwork.\n{\n  \"One person\": 0.30,\n  \"Two people\": 0.70,\n  \"Box weight is under 10 kg\": 0.85,\n  \"Box weight exceeds 30 kg\": 0.15,\n  \"Staircase is narrow\": 0.20,\n  \"Staircase is wide\": 0.60,\n  \"Good coordination required\": 0.40,\n  \"No coordination needed\": 0.75\n}"
    },
    {
        "role": "user",
        "content": "Given the scenario:\n\"\"\"Comparing the speed of downloading a 1GB file over Wi-Fi vs. 4G cellular network.\"\"\"\nFor each of the following factor values, please estimate the probability (a float between 0 and 1)\nthat this factor value supports Statement 1 (i.e. downloading over Wi-Fi is faster)\nrather than Statement 2 (i.e. downloading over 4G is faster).\nReturn a JSON mapping from factor value to probability.\n\nFactor values:\n[\"Wi-Fi connection\", \"4G connection\", \"Average Wi-Fi bandwidth ≥ 100 Mbps\", \"Average 4G bandwidth ≤ 20 Mbps\", \"Signal is stable\", \"Signal fluctuates frequently\", \"No data cap\", \"Strict monthly data cap\"]"
    },
    {
        "role": "assistant",
        "content": "Thought: Wi-Fi typically offers higher and more stable throughput, and no data cap avoids throttling or interruptions.\n{\n  \"Wi-Fi connection\": 0.88,\n  \"4G connection\": 0.12,\n  \"Average Wi-Fi bandwidth ≥ 100 Mbps\": 0.92,\n  \"Average 4G bandwidth ≤ 20 Mbps\": 0.25,\n  \"Signal is stable\": 0.80,\n  \"Signal fluctuates frequently\": 0.30,\n  \"No data cap\": 0.70,\n  \"Strict monthly data cap\": 0.15\n}"
    },
    {
        "role": "user",
        "content": "Given the scenario:\n\"\"\"Comparing reading on a digital screen vs printed paper.\"\"\"\nFor each of the following factor values, please estimate the probability (a float between 0 and 1)\nthat this factor value supports Statement 1 (i.e. reading on screen is better)\nrather than Statement 2 (i.e. reading on paper is better).\nReturn a JSON mapping from factor value to probability.\n\nFactor values:\n[\"High contrast display\", \"High contrast print\", \"Screen glare present\", \"Glare absent\", \"Font size identical\", \"Lighting conditions identical\"]"
    },
    {
        "role": "assistant",
        "content": "Thought: High-contrast screens can aid legibility, but glare disadvantages screens while paper remains glare-free.\n{\n  \"High contrast display\": 0.60,\n  \"High contrast print\": 0.40,\n  \"Screen glare present\": 0.30,\n  \"Glare absent\": 0.70,\n  \"Font size identical\": 0.50,\n  \"Lighting conditions identical\": 0.50\n}"
    },
    {
        "role": "user",
        "content": "Given the scenario:\n\"\"\"Comparing fuel efficiency of diesel cars vs gasoline cars.\"\"\"\nFor each of the following factor values, please estimate the probability (a float between 0 and 1)\nthat this factor value supports Statement 1 (i.e. diesel cars are more fuel-efficient)\nrather than Statement 2 (i.e. gasoline cars are more fuel-efficient).\nReturn a JSON mapping from factor value to probability.\n\nFactor values:\n[\"Diesel vehicles typically generate more torque\", \"Gasoline vehicles typically generate more torque\", \"Vehicle weight and aerodynamics identical\", \"Fuel price difference negligible\"]"
    },
    {
        "role": "assistant",
        "content": "Thought: Diesels generally have higher thermal efficiency and torque, but with identical weight/aero and fuel cost, the efficiency edge narrows.\n{\n  \"Diesel vehicles typically generate more torque\": 0.65,\n  \"Gasoline vehicles typically generate more torque\": 0.35,\n  \"Vehicle weight and aerodynamics identical\": 0.50,\n  \"Fuel price difference negligible\": 0.50\n}"
    }
]

generate_para_prob_shots_cot_v1 = [
    {
        "role": "user",
        "content": """Given the scenario:
"Comparing working from home vs working in the office."
For each of the following factor values, please estimate the probability (a float between 0 and 1)
that this factor value supports Statement 1 (i.e. working from home is more advantageous)
rather than Statement 2 (i.e. working in the office is more advantageous).
Return a JSON mapping from factor value to probability.

Factor values:
["Commute time eliminated", "Structured schedule", "Flexible hours", "Office social interactions", "Household distractions", "Professional equipment at office", "Self-discipline required", "Access to office networking", "Ergonomic workstation", "Stability of home internet"]"""
    },
    {
        "role": "assistant",
        "content": """Thought: Eliminating the commute and having flexible hours strongly favor working from home; conversely, a structured schedule, professional equipment, social interactions, and networking opportunities favor the office. Household distractions and internet stability can tip either way depending on the home environment.
{
  "Commute time eliminated": 0.90,
  "Structured schedule": 0.30,
  "Flexible hours": 0.85,
  "Office social interactions": 0.25,
  "Household distractions": 0.40,
  "Professional equipment at office": 0.35,
  "Self-discipline required": 0.45,
  "Access to office networking": 0.20,
  "Ergonomic workstation": 0.40,
  "Stability of home internet": 0.75
}"""
    },
    {
        "role": "user",
        "content": """Given the scenario:
"Comparing LED bulbs vs incandescent bulbs in home lighting."
For each of the following factor values, please estimate the probability (a float between 0 and 1)
that this factor value supports Statement 1 (i.e. LED bulbs are more advantageous)
rather than Statement 2 (i.e. incandescent bulbs are more advantageous).
Return a JSON mapping from factor value to probability.

Factor values:
["Initial cost per bulb", "Energy consumption per hour", "Lifespan hours", "Color rendering index", "Warm color temperature", "Instant full brightness", "Dimmable compatibility", "Mercury content", "Heat generation", "Availability"]"""
    },
    {
        "role": "assistant",
        "content": """Thought: LEDs excel in low energy use, longevity, low heat output, and no mercury, but cost more upfront, may have lower CRI, different warmth, and varying dimming support and availability.
{
  "Initial cost per bulb": 0.30,
  "Energy consumption per hour": 0.95,
  "Lifespan hours": 0.90,
  "Color rendering index": 0.25,
  "Warm color temperature": 0.40,
  "Instant full brightness": 0.50,
  "Dimmable compatibility": 0.35,
  "Mercury content": 0.85,
  "Heat generation": 0.88,
  "Availability": 0.45
}"""
    },
    {
        "role": "user",
        "content": """Given the scenario:
"Comparing convenience of streaming movies online vs renting Blu-ray discs."
For each of the following factor values, please estimate the probability (a float between 0 and 1)
that this factor value supports Statement 1 (i.e. streaming is more convenient)
rather than Statement 2 (i.e. renting Blu-ray is more convenient).
Return a JSON mapping from factor value to probability.

Factor values:
["Internet bandwidth", "Video quality potential (Blu-ray)", "Library availability online", "Physical disc ownership", "Load time vs buffering", "No subscription fee", "Need for disc drive", "Portability of files", "Multiple device streaming", "Resale value of disc"]"""
    },
    {
        "role": "assistant",
        "content": """Thought: Streaming offers instant access, broad libraries, multi-device support, and no hardware requirements, whereas Blu-ray provides top video quality, ownership, no buffering, pay-per-rental simplicity, and resale value.
{
  "Internet bandwidth": 0.80,
  "Video quality potential (Blu-ray)": 0.30,
  "Library availability online": 0.85,
  "Physical disc ownership": 0.25,
  "Load time vs buffering": 0.35,
  "No subscription fee": 0.40,
  "Need for disc drive": 0.75,
  "Portability of files": 0.65,
  "Multiple device streaming": 0.90,
  "Resale value of disc": 0.30
}"""
    }
]

generate_para_prob_shots_with_prior_cot_v1 = [
    {
        "role": "system",
        "content": """You are an expert analyst tasked with evaluating comparative scenarios. When provided with factor values and reference probabilities, your job is to:

    1. Critically analyze each factor's relationship to the competing statements
    2. Consider contextual nuances, edge cases, and varying user perspectives
    3. Apply sophisticated reasoning that goes beyond simple linear adjustments
    4. Account for interdependencies between factors and real-world complexities
    5. Provide probability estimates that reflect thoughtful consideration of multiple viewpoints
    6. Justify your reasoning with specific examples and scenarios when helpful

    Your probability estimates should be well-calibrated, diverse, and based on comprehensive analysis rather than mechanical adjustments to reference values. Consider both the strength and directionality of each factor's influence."""
    },
    {
        "role": "user",
        "content": """Given the scenario: Comparing working from home vs working in the office.

For each of the following factor values, please estimate the probability (a float between 0 and 1) that it supports Statement 1: 'working from home is more advantageous' rather than Statement 2: 'working in the office is more advantageous'.

As reference (but not absolute guidance), here are some initial probability estimates:
  Commute time eliminated: 0.75
  Structured schedule: 0.45
  Flexible hours: 0.70
  Office social interactions: 0.35
  Household distractions: 0.25
  Professional equipment at office: 0.40
  Self-discipline required: 0.50
  Access to office networking: 0.30
  Ergonomic workstation: 0.55
  Stability of home internet: 0.60

Please consider these reference values but feel free to adjust them based on your own analysis. Think about the logical relationship between each factor and the statements, and provide your reasoned probability estimates.

Return a JSON mapping from factor value to probability.
Factor values: ["Commute time eliminated", "Structured schedule", "Flexible hours", "Office social interactions", "Household distractions", "Professional equipment at office", "Self-discipline required", "Access to office networking", "Ergonomic workstation", "Stability of home internet"]"""
    },
    {
        "role": "assistant",
        "content": """Thought: Looking at the reference values and analyzing each factor:
- "Commute time eliminated" strongly favors home work (reference: 0.75) - I'll increase this to 0.90 as it's a clear advantage
- "Structured schedule" slightly favors office (reference: 0.45) - I'll adjust to 0.30 as office structure is more reliable
- "Flexible hours" strongly favors home (reference: 0.70) - I'll increase to 0.85 as this is a key home advantage
- "Office social interactions" favors office (reference: 0.35) - I'll decrease to 0.25 as this clearly benefits office work
- "Household distractions" challenges home work (reference: 0.25) - I'll adjust to 0.40 as impact varies by person
- "Professional equipment at office" favors office (reference: 0.40) - I'll decrease to 0.35 as office equipment is typically superior
- "Self-discipline required" is neutral (reference: 0.50) - I'll adjust to 0.45 as it's slightly more challenging at home
- "Access to office networking" favors office (reference: 0.30) - I'll decrease to 0.20 as networking is better in person
- "Ergonomic workstation" is mixed (reference: 0.55) - I'll adjust to 0.40 as office setups are often better
- "Stability of home internet" affects home work (reference: 0.60) - I'll increase to 0.75 as good internet strongly enables home work

{
  "Commute time eliminated": 0.90,
  "Structured schedule": 0.30,
  "Flexible hours": 0.85,
  "Office social interactions": 0.25,
  "Household distractions": 0.40,
  "Professional equipment at office": 0.35,
  "Self-discipline required": 0.45,
  "Access to office networking": 0.20,
  "Ergonomic workstation": 0.40,
  "Stability of home internet": 0.75
}"""
    },
    {
        "role": "user",
        "content": """Given the scenario: Comparing LED bulbs vs incandescent bulbs in home lighting.

For each of the following factor values, please estimate the probability (a float between 0 and 1) that it supports Statement 1: 'LED bulbs are more advantageous' rather than Statement 2: 'incandescent bulbs are more advantageous'.

As reference (but not absolute guidance), here are some initial probability estimates:
  Initial cost per bulb: 0.40
  Energy consumption per hour: 0.80
  Lifespan hours: 0.85
  Color rendering index: 0.45
  Warm color temperature: 0.35
  Instant full brightness: 0.55
  Dimmable compatibility: 0.50
  Mercury content: 0.75
  Heat generation: 0.70
  Availability: 0.60

Please consider these reference values but feel free to adjust them based on your own analysis. Think about the logical relationship between each factor and the statements, and provide your reasoned probability estimates.

Return a JSON mapping from factor value to probability.
Factor values: ["Initial cost per bulb", "Energy consumption per hour", "Lifespan hours", "Color rendering index", "Warm color temperature", "Instant full brightness", "Dimmable compatibility", "Mercury content", "Heat generation", "Availability"]"""
    },
    {
        "role": "assistant",
        "content": """Thought: Analyzing each factor against the reference values:
- "Initial cost per bulb" disadvantages LEDs (reference: 0.40) - I'll decrease to 0.30 as LEDs cost significantly more upfront
- "Energy consumption per hour" strongly favors LEDs (reference: 0.80) - I'll increase to 0.95 as this is LEDs' biggest advantage
- "Lifespan hours" strongly favors LEDs (reference: 0.85) - I'll adjust to 0.90 as LEDs last much longer
- "Color rendering index" may favor incandescent (reference: 0.45) - I'll decrease to 0.25 as incandescent often has better CRI
- "Warm color temperature" may favor incandescent (reference: 0.35) - I'll adjust to 0.40 as both can be warm, but incandescent is naturally warm
- "Instant full brightness" is mixed (reference: 0.55) - I'll keep at 0.50 as both have trade-offs
- "Dimmable compatibility" is variable (reference: 0.50) - I'll decrease to 0.35 as not all LEDs dim well
- "Mercury content" favors LEDs (reference: 0.75) - I'll increase to 0.85 as LEDs contain no mercury
- "Heat generation" favors LEDs (reference: 0.70) - I'll increase to 0.88 as LEDs generate much less heat
- "Availability" is mixed (reference: 0.60) - I'll decrease to 0.45 as incandescent still widely available

{
  "Initial cost per bulb": 0.30,
  "Energy consumption per hour": 0.95,
  "Lifespan hours": 0.90,
  "Color rendering index": 0.25,
  "Warm color temperature": 0.40,
  "Instant full brightness": 0.50,
  "Dimmable compatibility": 0.35,
  "Mercury content": 0.85,
  "Heat generation": 0.88,
  "Availability": 0.45
}"""
    },
    {
        "role": "user",
        "content": """Given the scenario: Comparing convenience of streaming movies online vs renting Blu-ray discs.

For each of the following factor values, please estimate the probability (a float between 0 and 1) that it supports Statement 1: 'streaming is more convenient' rather than Statement 2: 'renting Blu-ray is more convenient'.

As reference (but not absolute guidance), here are some initial probability estimates:
  Internet bandwidth: 0.70
  Video quality potential (Blu-ray): 0.45
  Library availability online: 0.75
  Physical disc ownership: 0.40
  Load time vs buffering: 0.50
  No subscription fee: 0.35
  Need for disc drive: 0.80
  Portability of files: 0.55
  Multiple device streaming: 0.85
  Resale value of disc: 0.45

Please consider these reference values but feel free to adjust them based on your own analysis. Think about the logical relationship between each factor and the statements, and provide your reasoned probability estimates.

Return a JSON mapping from factor value to probability.
Factor values: ["Internet bandwidth", "Video quality potential (Blu-ray)", "Library availability online", "Physical disc ownership", "Load time vs buffering", "No subscription fee", "Need for disc drive", "Portability of files", "Multiple device streaming", "Resale value of disc"]"""
    },
    {
        "role": "assistant",
        "content": """Thought: Evaluating convenience factors with reference to initial estimates:
- "Internet bandwidth" affects streaming (reference: 0.70) - I'll increase to 0.80 as good bandwidth makes streaming very convenient
- "Video quality potential (Blu-ray)" favors Blu-ray (reference: 0.45) - I'll decrease to 0.30 as Blu-ray has superior quality
- "Library availability online" favors streaming (reference: 0.75) - I'll increase to 0.85 as streaming has vast libraries
- "Physical disc ownership" favors Blu-ray (reference: 0.40) - I'll decrease to 0.25 as ownership is a Blu-ray advantage
- "Load time vs buffering" is mixed (reference: 0.50) - I'll decrease to 0.35 as buffering can be inconvenient
- "No subscription fee" favors Blu-ray (reference: 0.35) - I'll increase to 0.40 as pay-per-rental can be more economical
- "Need for disc drive" favors streaming (reference: 0.80) - I'll decrease to 0.75 as not needing hardware is convenient for streaming
- "Portability of files" favors streaming (reference: 0.55) - I'll increase to 0.65 as digital files are more portable
- "Multiple device streaming" strongly favors streaming (reference: 0.85) - I'll increase to 0.90 as this is a key streaming advantage
- "Resale value of disc" favors Blu-ray (reference: 0.45) - I'll decrease to 0.30 as discs retain some value

{
  "Internet bandwidth": 0.80,
  "Video quality potential (Blu-ray)": 0.30,
  "Library availability online": 0.85,
  "Physical disc ownership": 0.25,
  "Load time vs buffering": 0.35,
  "No subscription fee": 0.40,
  "Need for disc drive": 0.75,
  "Portability of files": 0.65,
  "Multiple device streaming": 0.90,
  "Resale value of disc": 0.30
}"""
    }
]


LLM_LATENT_SHOTS = [
    # System instruction
    {
        "role": "system",
        "content": (
            "You are an AI assistant tasked with identifying latent variables and assigning "
            "each latent only factors drawn from the provided list. Do NOT output any edges. "
            "Return a JSON object with a single field:\n"
            "  latents: an array of objects, each with:\n"
            "    - name: string\n"
            "    - factors: array of strings (each chosen from the provided Factors list)\n"
            "Ensure the JSON parses correctly and strictly follows this schema."
        )
    },
    # Example 1 user dialogue
    {
        "role": "user",
        "content": (
            "Please identify latent variables and assign each factor to a latent. "
            "Then return JSON with fields:\n"
            "  latents: [{\"name\": string, \"factors\": [...]}, ...],\n"
            "Factors: [\"Nutrition\", \"Vitamins\", \"Taste\", \"Convenience\"]"
        )
    },
    # Example 1 assistant output
    {
        "role": "assistant",
        "content": (
            "{\"latents\":["
            "{\"name\":\"HealthLat\",\"factors\":[\"Nutrition\",\"Vitamins\"]},"
            "{\"name\":\"EnjoyLat\",\"factors\":[\"Taste\",\"Convenience\"]}"
            "]}"
        )
    },
    # Example 2 user dialogue
    {
        "role": "user",
        "content": (
            "Please identify latent variables and assign each factor to a latent. "
            "Then return JSON with fields:\n"
            "  latents: [{\"name\": string, \"factors\": [...]}, ...],\n"
            "Factors: [\"Reward\", \"Risk\", \"TimePressure\", \"Collaboration\"]"
        )
    },
    # Example 2 assistant output
    {
        "role": "assistant",
        "content": (
            "{\"latents\":["
            "{\"name\":\"BenefitLat\",\"factors\":[\"Reward\"]},"
            "{\"name\":\"CostLat\",\"factors\":[\"Risk\",\"TimePressure\"]},"
            "{\"name\":\"SocialLat\",\"factors\":[\"Collaboration\"]}"
            "]}"
        )
    }
]

LLM_LATENT_SHOTS_COT = [
    # System instruction, requiring brief reasoning first, then final JSON output
    {
        "role": "system",
        "content": (
            "Please perform a brief chain-of-thought (step-by-step reasoning) before outputting the final JSON.\n\n"
            "You are an AI assistant tasked with identifying latent variables and assigning each latent only factors drawn from the provided list. "
            "Do NOT output any edges. Return a JSON object with a single field:\n"
            "  latents: an array of objects, each with:\n"
            "    - name: string\n"
            "    - factors: array of strings (each chosen from the provided Factors list)\n"
            "Ensure the JSON parses correctly and strictly follows this schema."
        )
    },
    # Example 1 user dialogue
    {
        "role": "user",
        "content": (
            "Please identify latent variables and assign each factor to a latent. "
            "Then return JSON with fields:\n"
            "  latents: [{\"name\": string, \"factors\": [...]}, ...],\n"
            "Factors: [\"Nutrition\", \"Vitamins\", \"Taste\", \"Convenience\"]"
        )
    },
    # Example 1 assistant output (with CoT)
    {
        "role": "assistant",
        "content": (
            "Thought: Nutrition and Vitamins both relate to health aspects of food, while Taste and Convenience relate to user enjoyment and practicality.\n"
            "{\n"
            "  \"latents\": [\n"
            "    {\"name\": \"HealthLat\", \"factors\": [\"Nutrition\", \"Vitamins\"]},\n"
            "    {\"name\": \"EnjoyLat\", \"factors\": [\"Taste\", \"Convenience\"]}\n"
            "  ]\n"
            "}"
        )
    },
    # Example 2 user dialogue
    {
        "role": "user",
        "content": (
            "Please identify latent variables and assign each factor to a latent. "
            "Then return JSON with fields:\n"
            "  latents: [{\"name\": string, \"factors\": [...]}, ...],\n"
            "Factors: [\"Reward\", \"Risk\", \"TimePressure\", \"Collaboration\"]"
        )
    },
    # Example 2 assistant output (with CoT)
    {
        "role": "assistant",
        "content": (
            "Thought: Reward represents benefits gained, Risk and TimePressure both represent potential costs or pressures, and Collaboration relates to social interaction.\n"
            "{\n"
            "  \"latents\": [\n"
            "    {\"name\": \"BenefitLat\", \"factors\": [\"Reward\"]},\n"
            "    {\"name\": \"CostLat\", \"factors\": [\"Risk\", \"TimePressure\"]},\n"
            "    {\"name\": \"SocialLat\", \"factors\": [\"Collaboration\"]}\n"
            "  ]\n"
            "}"
        )
    },
    # Example 3 user dialogue
    {
        "role": "user",
        "content": (
            "Please identify latent variables and assign each factor to a latent. "
            "Then return JSON with fields:\n"
            "  latents: [{\"name\": string, \"factors\": [...]}, ...],\n"
            "Factors: [\"Speed\", \"Accuracy\", \"Cost\", \"Scalability\"]"
        )
    },
    # Example 3 assistant output (with CoT)
    {
        "role": "assistant",
        "content": (
            "Thought: Speed, Accuracy, and Scalability all describe performance-related characteristics, while Cost describes resource expenditure.\n"
            "{\n"
            "  \"latents\": [\n"
            "    {\"name\": \"PerformanceLat\", \"factors\": [\"Speed\", \"Accuracy\", \"Scalability\"]},\n"
            "    {\"name\": \"CostLat\", \"factors\": [\"Cost\"]}\n"
            "  ]\n"
            "}"
        )
    },
    # Example 4 user dialogue
    {
        "role": "user",
        "content": (
            "Please identify latent variables and assign each factor to a latent. "
            "Then return JSON with fields:\n"
            "  latents: [{\"name\": string, \"factors\": [...]}, ...],\n"
            "Factors: [\"Usability\", \"Security\", \"Maintainability\", \"Portability\", \"Reliability\"]"
        )
    },
    # Example 4 assistant output (with CoT)
    {
        "role": "assistant",
        "content": (
            "Thought: Usability and Portability focus on user experience and access, Reliability and Maintainability focus on software quality over time, and Security is a distinct concern.\n"
            "{\n"
            "  \"latents\": [\n"
            "    {\"name\": \"UXLat\", \"factors\": [\"Usability\", \"Portability\"]},\n"
            "    {\"name\": \"QualityLat\", \"factors\": [\"Reliability\", \"Maintainability\"]},\n"
            "    {\"name\": \"SecurityLat\", \"factors\": [\"Security\"]}\n"
            "  ]\n"
            "}"
        )
    }
]


GENERATE_LATENT_PROBS_SHOTS = [
    {
        "role": "system",
        "content": (
            "Please perform a brief chain-of-thought (step-by-step) before outputting the final JSON:\n\n"
            "You are an AI assistant. You will be given:\n"
            "  • A list of latents where each latent has:\n"
            "    – its name\n"
            "    – the list of factor descriptions it groups\n"
            "  • Two competing statements (Statement1 vs. Statement2)\n\n"
            "Your task: Think through the semantic content of the factors relative to the statements, "
            "and estimate for each latent a probability pair [p1, p0]:\n"
            "  • p1 = probability the latent supports Statement1\n"
            "  • p0 = probability the latent supports Statement2\n\n"
            "Begin your answer with “Thought:” to show your reasoning, then output exactly a JSON object "
            "mapping each latent name to its [p1, p0] (no extra text)."
        )
    },
    # Example 1
    {
        "role": "user",
        "content": (
            "Latents with factors:\n"
            "[\n"
            "  {\"name\":\"Performance\",\"factors\":[\"Faster processing\",\"Efficient resource use\"]},\n"
            "  {\"name\":\"Stability\",\"factors\":[\"Crash reports\",\"Memory leaks\"]}\n"
            "]\n\n"
            "Statement1: The system improves performance.\n"
            "Statement2: The system does not improve performance."
        )
    },
    {
        "role": "assistant",
        "content": (
            "Thought: The Performance latent groups factors that directly indicate faster and more efficient operation, "
            "which strongly supports Statement1. The Stability latent lists issues that undermine reliability, which indirectly "
            "suggests performance might not improve overall. I’d estimate Performance around [0.85, 0.15] and Stability around [0.30, 0.70].\n"
            "{\n"
            "  \"Performance\": [0.85, 0.15],\n"
            "  \"Stability\":   [0.30, 0.70]\n"
            "}"
        )
    },
    # Example 2
    {
        "role": "user",
        "content": (
            "Latents with factors:\n"
            "[\n"
            "  {\"name\":\"HealthLat\",\"factors\":[\"Nutrition benefits\"]},\n"
            "  {\"name\":\"EnjoyLat\",\"factors\":[\"Taste appeal\",\"Fun presentation\"]}\n"
            "]\n\n"
            "Statement1: Healthy eating is fun.\n"
            "Statement2: Healthy eating is not fun."
        )
    },
    {
        "role": "assistant",
        "content": (
            "Thought: Nutrition benefits relate to health but don’t guarantee fun, so HealthLat feels neutral to slightly positive. "
            "EnjoyLat clearly centers on taste and fun aspects, strongly supporting Statement1. I’d give HealthLat [0.55, 0.45] and EnjoyLat [0.85, 0.15].\n"
            "{\n"
            "  \"HealthLat\": [0.55, 0.45],\n"
            "  \"EnjoyLat\":  [0.85, 0.15]\n"
            "}"
        )
    },
    # Example 3
    {
        "role": "user",
        "content": (
            "Latents with factors:\n"
            "[\n"
            "  {\"name\":\"CostLat\",\"factors\":[\"High price\",\"Subscription fees\"]},\n"
            "  {\"name\":\"ValueLat\",\"factors\":[\"Quality components\",\"Rich feature set\"]}\n"
            "]\n\n"
            "Statement1: The product offers good value for money.\n"
            "Statement2: The product is overpriced."
        )
    },
    {
        "role": "assistant",
        "content": (
            "Thought: CostLat’s factors emphasize expense, which leans toward overpriced, but subscription might add long-term value so moderate. "
            "ValueLat’s factors directly point to quality and rich features, strongly favoring good value. I’d estimate CostLat [0.40, 0.60] "
            "and ValueLat [0.80, 0.20].\n"
            "{\n"
            "  \"CostLat\":  [0.40, 0.60],\n"
            "  \"ValueLat\": [0.80, 0.20]\n"
            "}"
        )
    },
    # Example 4
    {
        "role": "user",
        "content": (
            "Latents with factors:\n"
            "[\n"
            "  {\"name\":\"TrustLat\",\"factors\":[\"Brand reputation\",\"Positive reviews\"]}\n"
            "]\n\n"
            "Statement1: Brand X is trustworthy.\n"
            "Statement2: Brand X is not trustworthy."
        )
    },
    {
        "role": "assistant",
        "content": (
            "Thought: Brand reputation and positive reviews both semantically signal trustworthiness, so this latent strongly supports Statement1. "
            "I’d estimate TrustLat at [0.90, 0.10].\n"
            "{\n"
            "  \"TrustLat\": [0.90, 0.10]\n"
            "}"
        )
    }
]


NEUTRAL_FACTOR_MAPPING_SHOTS = [
{
    'role': 'system',
    'content': (
        'You are an expert in determining whether a given “Condition” reflects a specified “Factor”, taking the provided “Scenario” into account. '
        'For each input (consisting of a Factor, Scenario, and Condition), please analyze how the Scenario influences the relevance of the Factor to the Condition and respond in the following format:\n'
        'Thought:\n'
        '1. <First step of reasoning, referencing the Scenario>\n'
        '2. <Second step of reasoning, showing how the Scenario informs the Factor–Condition relationship>\n'
        '…\n'
        'Match: Yes (if the Condition matches the Factor within this Scenario) or No (if it does not)'
    )
}
,
    {
        'role': 'user',
        'content': 'Factor: "choice of whole fruits"\nScenario: "Comparing healthy dessert options"\nCondition: "John chooses an apple instead of a candy bar for dessert."'
    },
    {
        'role': 'assistant',
        'content': 'Thought:\n1. Identify the dessert options compared: apple vs candy bar.\n2. An apple is a whole fruit, offering more fiber and nutrients than a candy bar.\n3. John explicitly selects the whole fruit option as his dessert.\nMatch: Yes'
    },
    {
        'role': 'user',
        'content': 'Factor: "selection of low-sugar snacks"\nScenario: "Comparing sugar content in snack choices"\nCondition: "John grabs a candy bar instead of unsweetened nuts."'
    },
    {
        'role': 'assistant',
        'content': 'Thought:\n1. Compare sugar content: candy bar (high sugar) vs unsweetened nuts (low sugar).\n2. John’s choice of candy bar indicates selection of a high-sugar snack.\n3. This does not match the factor of selecting low-sugar snacks.\nMatch: No'
    },
    {
        'role': 'user',
        'content': 'Factor: "selection of plant-based proteins"\nScenario: "Evaluating protein source in soup options"\nCondition: "John opts for lentil soup rather than chicken noodle soup."'
    },
    {
        'role': 'assistant',
        'content': 'Thought:\n1. Lentil soup provides plant-based protein; chicken noodle soup provides animal-based protein.\n2. John explicitly chooses the lentil soup.\n3. This aligns with selecting a plant-based protein option.\nMatch: Yes'
    },
    {
        'role': 'user',
        'content': 'Factor: "preference for grilled foods"\nScenario: "Comparing cooking methods for seafood"\nCondition: "John orders fried shrimp instead of grilled shrimp at the restaurant."'
    },
    {
        'role': 'assistant',
        'content': 'Thought:\n1. The choice is between fried shrimp and grilled shrimp.\n2. Fried shrimp involves deep-frying, not grilling.\n3. John’s choice of fried shrimp does not reflect a preference for grilled foods.\nMatch: No'
    },
    {
        'role': 'user',
        'content': 'Factor: "use of smaller plates"\nScenario: "Implementing portion control strategies"\nCondition: "John serves his dinner on small appetizer plates to control portion size."'
    },
    {
        'role': 'assistant',
        'content': 'Thought:\n1. John uses small appetizer plates for his dinner servings.\n2. Smaller plates are a known tool for reducing portion sizes.\n3. This directly fits the factor of using smaller plates for portion control.\nMatch: Yes'
    }
]

NEUTRAL_FACTOR_PRUNE_SHOTS =[
    {
        "role": "system",
        "content": "You are a model that helps in pruning and deduplicating factor lists based on a given scenario. Your task is to review a list of factors and remove any redundancy while ensuring that all remaining factors are directly relevant to the provided claim and scenario. After thinking critically about the factors, output the final pruned list of unique and relevant factors.\n\n### Steps for Pruning:\n1. **Understand the Scenario and Condition**: Begin by reviewing the scenario and condition provided. The scenario is the central claim or statement, and the condition provides specific details or assumptions that may influence the relevance of the factors.\n\n2. **Evaluate Factor Relevance**: Examine each factor in the list to determine how closely it relates to the scenario and condition. Factors that are not directly related or don't affect the claim should be removed.\n\n3. **Eliminate Redundant Factors**: Identify factors that essentially describe the same concept or influence and remove the duplicates. For example, factors that overlap in meaning should be considered redundant.\n\n4. **Refine and Keep Only Critical Factors**: Retain factors that are most specific and essential for supporting or explaining the claim. Avoid keeping general or vague factors that don't add clear value to the context.\n\n5. **Output the Final List**: After carefully pruning the list, output the remaining factors in a concise and clear format.\n\n### Response Format:\nThe output should be a **list** of pruned factors, with each factor as a separate string in the list. The list should be formatted as follows:\n\n[\n    \"<factor1>\",\n    \"<factor2>\",\n    \"<factor3>\",\n    ...\n]\n\nEnsure the list contains only relevant, unique, and non-redundant factors."
    }
    ,
    {
        "role": "user",
        "content": "Scenario: A study claims that regular physical activity can reduce the risk of heart disease.\n\nCondition: The individual regularly engages in moderate aerobic exercise for 30 minutes a day.\n\nFactors to consider: ['heart rate improvement', 'blood pressure reduction', 'cholesterol levels', 'dietary habits', 'exercise frequency', 'exercise intensity', 'genetic predisposition', 'physical endurance', 'stress reduction', 'weight loss', 'smoking habits', 'age', 'family history', 'physical activity duration']"
    },
        {
            "role": "assistant",
            "content": "Let me systematically analyze these factors for their relevance to reducing heart disease risk through regular physical activity. I'll categorize them first:\n\nDirect physiological impacts: heart rate improvement, blood pressure reduction, cholesterol levels\nExercise parameters: frequency, intensity, duration\nSecondary benefits: stress reduction, weight loss\n\nI notice some redundancy - 'heart rate improvement' is actually encompassed within broader cardiovascular benefits. Let me consolidate and prioritize based on evidence strength and direct causal relationship.\n\nPruned list of relevant factors:\n['blood pressure reduction', 'cholesterol levels', 'exercise frequency', 'exercise intensity', 'stress reduction', 'weight loss', 'physical activity duration']"
        },
        {
            "role": "user",
            "content": "Scenario: It is claimed that using smartphones for long periods can negatively affect sleep quality.\n\nCondition: The individual uses their smartphone for over 6 hours daily, including late-night use.\n\nFactors to consider: ['blue light exposure', 'sleep duration', 'sleep environment', 'screen brightness', 'stress levels', 'social media use', 'anxiety', 'sleep schedule', 'sleep disorders', 'smartphone addiction', 'caffeine consumption', 'time of day using smartphone']"
        },
        {
            "role": "assistant",
            "content": "I need to think about the causal chain here. The claim focuses specifically on smartphone use → sleep quality. Let me trace the mechanisms:\n\n1. Blue light exposure → melatonin suppression → sleep disruption\n2. Late-night stimulation → delayed sleep onset\n3. Addictive usage patterns → sleep schedule disruption\n\nSome factors like 'caffeine consumption' and general 'sleep disorders' are confounding variables rather than mediators of smartphone effects. 'Sleep duration' is more of an outcome than a mediating factor. I should focus on the pathway-specific factors.\n\nHowever, I'm keeping 'sleep environment' as smartphone use in bed can alter the sleep setting.\n\nPruned list of relevant factors:\n['blue light exposure', 'time of day using smartphone', 'screen brightness', 'social media use', 'sleep schedule', 'smartphone addiction']"
        },
        {
            "role": "user",
            "content": "Scenario: It's been suggested that air pollution increases the likelihood of respiratory diseases.\n\nCondition: The individual lives in an area with high levels of particulate matter and smog.\n\nFactors to consider: ['particulate matter', 'smoking habits', 'urbanization', 'pollution levels', 'exposure frequency', 'age', 'physical health', 'immune system strength', 'weather conditions', 'socioeconomic status', 'indoor air quality']"
        },
        {
            "role": "assistant",
            "content": "This requires distinguishing between direct pollution factors, vulnerability modifiers, and confounders. Let me approach this using a risk assessment framework:\n\nExposure factors: particulate matter, pollution levels (these overlap - I'll keep the more specific 'particulate matter')\nExposure patterns: exposure frequency, indoor air quality\nVulnerability factors: age, immune system strength, physical health\nConfounders: smoking habits, socioeconomic status\n\nI'm torn on whether to include vulnerability factors. While age and immune strength genuinely modify pollution's impact, they're not part of the direct causal pathway. For this pruning exercise focusing on the core claim, I'll prioritize exposure-related factors but include one key vulnerability modifier.\n\nWeather conditions affect pollution dispersion and concentration, making it relevant.\n\nPruned list of relevant factors:\n['particulate matter', 'exposure frequency', 'weather conditions', 'indoor air quality', 'immune system strength']"
        },
        {
            "role": "user",
            "content": "Scenario: A new study suggests that sleep deprivation can lead to cognitive impairments.\n\nCondition: The individual gets less than 5 hours of sleep each night for several weeks.\n\nFactors to consider: ['sleep duration', 'cognitive performance', 'memory recall', 'attention span', 'stress levels', 'mood swings', 'mental clarity', 'physical health', 'age', 'diet', 'mental health history', 'exercise routine']"
        },
        {
            "role": "assistant",
            "content": "Here I need to separate the exposure variable, outcome measures, and mediating factors. This is trickier because some factors could be both outcomes and mediators.\n\nExposure: sleep duration (the key independent variable)\nCognitive outcomes: cognitive performance, memory recall, attention span, mental clarity\nPotential mediators: stress levels, mood swings\nBackground factors: age, physical health, diet, mental health history, exercise routine\n\nInterestingly, 'cognitive performance' is quite broad and might encompass memory recall, attention span, and mental clarity. I should consider whether to keep the general term or the specific components.\n\nI'll keep both general and specific cognitive measures since they represent different assessment approaches. Stress levels could be both a consequence of sleep deprivation and a mediator of cognitive effects, so it's worth including.\n\nPruned list of relevant factors:\n['sleep duration', 'cognitive performance', 'memory recall', 'attention span', 'mental clarity', 'stress levels']"
        },
        {
            "role": "user",
            "content": "Scenario: It is believed that consuming too much sugar may increase the risk of obesity.\n\nCondition: The individual consumes more than 50 grams of sugar daily from processed foods and beverages.\n\nFactors to consider: ['sugar intake', 'caloric intake', 'dietary habits', 'physical activity', 'insulin resistance', 'body mass index (BMI)', 'metabolic rate', 'genetic predisposition', 'age', 'stress levels', 'portion control', 'nutrition labels']"
        },
        {
            "role": "assistant",
            "content": "This obesity risk scenario involves complex energy balance dynamics. Let me think through the mechanistic pathways:\n\nPrimary exposure: sugar intake\nBroader context: caloric intake, dietary habits, portion control\nEnergy expenditure: physical activity, metabolic rate\nMetabolic consequences: insulin resistance\nOutcome measure: BMI\nModifying factors: genetic predisposition, age\n\nI'm debating whether 'caloric intake' is redundant with 'sugar intake' - but excess sugar often leads to excess calories through other foods too, so it's worth keeping. 'Dietary habits' and 'portion control' overlap significantly - I'll keep the more specific 'portion control'.\n\n'Nutrition labels' seems peripheral - it's about awareness, not the biological pathway. 'Stress levels' could affect eating behavior but isn't central to the sugar→obesity mechanism.\n\nPruned list of relevant factors:\n['sugar intake', 'caloric intake', 'physical activity', 'insulin resistance', 'body mass index (BMI)', 'metabolic rate', 'portion control']"
        },
        {
            "role": "user",
            "content": "Scenario: A claim is made that excessive screen time can lead to poor posture.\n\nCondition: The individual spends over 8 hours a day sitting in front of a screen.\n\nFactors to consider: ['screen time duration', 'sitting posture', 'ergonomics', 'screen placement', 'age', 'muscle strain', 'work environment', 'exercise habits', 'age', 'mental health', 'stress', 'physical fitness']"
        },
        {
            "role": "assistant",
            "content": "I notice 'age' is listed twice - that's clearly a data entry error. Let me think about this biomechanical problem using a different analytical approach.\n\nUsing a temporal causation framework:\nAntecedent factors: screen time duration, work environment setup\nImmediate factors: sitting posture, ergonomics, screen placement\nConsequent factors: muscle strain\nModifying factors: age, exercise habits, physical fitness\nUnrelated factors: mental health, stress\n\nThere's an interesting relationship here - 'sitting posture' could be both a mediator (screen time → poor sitting posture → problems) and an outcome measure. 'Muscle strain' is likely a consequence rather than a contributing factor.\n\nFor this pruning task, I should focus on the environmental and behavioral factors that create the posture problem, rather than its consequences.\n\nPruned list of relevant factors:\n['screen time duration', 'ergonomics', 'screen placement', 'work environment', 'exercise habits', 'physical fitness']"
        }

]
AGG_SHOTS  = [
    # Shot 1: Completely symmetric support, should judge 3
    {
        "role": "user",
        "content": (
            "Scenario: You need a quick cover in bed.\n"
            "Gold statement: \"A sock is as convenient as a fitted sheet as a blanket.\"\n\n"
            "Sentence 1 (label 1): \"The sock is oversized and warm.\"  \n"
            "Sentence 1 factors: comfort level, thickness  \n\n"
            "Sentence 2 (label 2): \"The sheet wraps easily around you.\"  \n"
            "Sentence 2 factors: coverage, ease of wrapping  \n\n"
            "NB probabilities -> [1]: 0.52, [2]: 0.48  \n"
            "CBN probabilities -> [1]: 0.50, [2]: 0.50  \n\n"
            "CBN predicted label: 3  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
            "Please determine whether the CBN prediction is correct.\n"
            "Think step by step, then output exactly:\n\n"
            "CORRECT: True\n"
            "or\n"
            "CORRECT: False\n\n"
            "No other text."
        )
    },
    {
        "role": "assistant",
        "content": (
            "The two sentences have nearly identical factor strength and type—both emphasize warmth and fit. "
            "NB’s slight tilt is due to independent‐factor summation. "
            "CBN recognizes their symmetry and outputs equal support.\n\n"
            "CORRECT: True"
        )
    },

    # Shot 2: Key factors stronger, CBN tendency correct
    {
        "role": "user",
        "content": (
            "Scenario: Comparing coats vs sweaters outdoors.\n"
            "Gold statement: \"A sweater keeps you warmer than a coat in mild cold.\"\n\n"
            "Sentence 1 (label 1): \"The coat has a hood and windproof shell.\"  \n"
            "Sentence 1 factors: hood, windproof, multiple layers  \n\n"
            "Sentence 2 (label 2): \"The sweater is thick wool with high insulation.\"  \n"
            "Sentence 2 factors: insulation, wool thickness, snug fit  \n\n"
            "NB probabilities -> [1]: 0.88, [2]: 0.91  \n"
            "CBN probabilities -> [1]: 0.45, [2]: 0.75  \n\n"
            "CBN predicted label: 2  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
            "Please determine whether the CBN prediction is correct.\n"
            "Think step by step, then output exactly:\n\n"
            "CORRECT: True\n"
            "or\n"
            "CORRECT: False\n\n"
            "No other text."
        )
    },
    {
        "role": "assistant",
        "content": (
            "Although NB sums more windproof features, the core insulation factor in the sweater "
            "is causally more decisive for mild cold. CBN’s structure highlights this key factor.\n\n"
            "CORRECT: True"
        )
    },

    # Shot 3: NB clearly biased but causal network conservatively judges equal
    {
        "role": "user",
        "content": (
            "Scenario: Changing a low lamp bulb.\n"
            "Gold statement: \"Standing on a pillow is as easy as standing on a ladder.\"\n\n"
            "Sentence 1 (label 1): \"The ladder is stable with wide feet.\"  \n"
            "Sentence 1 factors: stability, safety  \n\n"
            "Sentence 2 (label 2): \"The pillow is thick and raises you sufficiently.\"  \n"
            "Sentence 2 factors: thickness, ease of lifting  \n\n"
            "NB probabilities -> [1]: 0.65, [2]: 0.35  \n"
            "CBN probabilities -> [1]: 0.50, [2]: 0.50  \n\n"
            "CBN predicted label: 3  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
            "Please determine whether the CBN prediction is correct.\n"
            "Think step by step, then output exactly:\n\n"
            "CORRECT: True\n"
            "or\n"
            "CORRECT: False\n\n"
            "No other text."
        )
    },
    {
        "role": "assistant",
        "content": (
            "NB’s independence assumption overweights ladder safety, "
            "but real‐world ease is similar for low heights. "
            "CBN balances overlapping safety and raise factors to equal support.\n\n"
            "CORRECT: True"
        )
    },

    # Shot 4: CBN misjudges as equal but NB more reliable
    {
        "role": "user",
        "content": (
            "Scenario: Eating soup vs salad.\n"
            "Gold statement: \"You need a spoon rather than a fork to eat soup.\"\n\n"
            "Sentence 1 (label 1): \"A spoon holds liquid without spilling.\"  \n"
            "Sentence 1 factors: liquid containment, spill prevention  \n\n"
            "Sentence 2 (label 2): \"A fork can stab solid pieces.\"  \n"
            "Sentence 2 factors: spearing, solid handling  \n\n"
            "NB probabilities -> [1]: 0.95, [2]: 0.05  \n"
            "CBN probabilities -> [1]: 0.50, [2]: 0.50  \n\n"
            "CBN predicted label: 3  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
            "Please determine whether the CBN prediction is correct.\n"
            "Think step by step, then output exactly:\n\n"
            "CORRECT: True\n"
            "or\n"
            "CORRECT: False\n\n"
            "No other text."
        )
    },
    {
        "role": "assistant",
        "content": (
            "CBN’s equal‐support is too conservative here. "
            "The task requires liquid containment, which only the spoon provides. "
            "NB correctly captures this decisive factor.\n\n"
            "CORRECT: False"
        )
    },

    # Shot 5: NB biased wrong but CBN prediction wrong
    {
        "role": "user",
        "content": (
            "Scenario: Choosing pen vs pencil for permanent marking.\n"
            "Gold statement: \"A permanent marker works better than a pencil for lasting marks.\"\n\n"
            "Sentence 1 (label 1): \"The pen uses fade-resistant permanent ink.\"  \n"
            "Sentence 1 factors: permanence, ink quality  \n\n"
            "Sentence 2 (label 2): \"The pencil leaves graphite marks that can smudge.\"  \n"
            "Sentence 2 factors: smudge risk, erasability  \n\n"
            "NB probabilities -> [1]: 0.99, [2]: 0.01  \n"
            "CBN probabilities -> [1]: 0.30, [2]: 0.70  \n\n"
            "CBN predicted label: 2  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
            "Please determine whether the CBN prediction is correct.\n"
            "Think step by step, then output exactly:\n\n"
            "CORRECT: True\n"
            "or\n"
            "CORRECT: False\n\n"
            "No other text."
        )
    },
    {
        "role": "assistant",
        "content": (
            "CBN over-weights smudge risk without modeling permanence causally. "
            "Permanent ink is far more decisive. NB’s high probability for the pen is correct.\n\n"
            "CORRECT: False"
        )
    }
]
AGG_WO_FACTORS_SHOTS = [
    # Shot 1: Completely symmetric support, should judge 3
    {
        "role": "user",
        "content": (
            "Scenario: You need a quick cover in bed.\n"
            "Gold statement: \"A sock is as convenient as a fitted sheet as a blanket.\"\n\n"
            "Sentence 1 (label 1): \"The sock is oversized and warm.\"\n"
            "Sentence 2 (label 2): \"The sheet wraps easily around you.\"\n\n"
            "NB probabilities -> [1]: 0.52, [2]: 0.48\n"
            "CBN probabilities -> [1]: 0.50, [2]: 0.50\n\n"
            "CBN predicted label: 3  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
            "Please determine based on the sentence semantics whether the CBN prediction is correct.\n"
            "Think step by step, then output exactly:\n\n"
            "CORRECT: True\n"
            "or\n"
            "CORRECT: False\n\n"
            "No other text."
        )
    },
    {
        "role": "assistant",
        "content": (
            "Both sentences express equal levels of convenience—one highlights warmth, the other emphasizes ease of wrapping. "
            "Semantically, neither has stronger support, so equal support is justified.\n\n"
            "CORRECT: True"
        )
    },

    # Shot 2: Key semantics stronger, CBN tendency correct
    {
        "role": "user",
        "content": (
            "Scenario: Comparing coats vs sweaters outdoors.\n"
            "Gold statement: \"A sweater keeps you warmer than a coat in mild cold.\"\n\n"
            "Sentence 1 (label 1): \"The coat has a hood and windproof shell.\"\n"
            "Sentence 2 (label 2): \"The sweater is thick wool with high insulation.\"\n\n"
            "NB probabilities -> [1]: 0.88, [2]: 0.91\n"
            "CBN probabilities -> [1]: 0.45, [2]: 0.75\n\n"
            "CBN predicted label: 2  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
            "Please determine based on the sentence semantics whether the CBN prediction is correct.\n"
            "Think step by step, then output exactly:\n\n"
            "CORRECT: True\n"
            "or\n"
            "CORRECT: False\n\n"
            "No other text."
        )
    },
    {
        "role": "assistant",
        "content": (
            "The sweater sentence directly asserts superior insulation, which aligns closely with the gold claim. "
            "Semantically, it offers stronger relevance, so CBN’s choice is correct.\n\n"
            "CORRECT: True"
        )
    },

    # Shot 3: NB clearly biased but semantic network conservatively judges equal
    {
        "role": "user",
        "content": (
            "Scenario: Changing a low lamp bulb.\n"
            "Gold statement: \"Standing on a pillow is as easy as standing on a ladder.\"\n\n"
            "Sentence 1 (label 1): \"The ladder is stable with wide feet.\"\n"
            "Sentence 2 (label 2): \"The pillow is thick and raises you sufficiently.\"\n\n"
            "NB probabilities -> [1]: 0.65, [2]: 0.35\n"
            "CBN probabilities -> [1]: 0.50, [2]: 0.50\n\n"
            "CBN predicted label: 3  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
            "Please determine based on the sentence semantics whether the CBN prediction is correct.\n"
            "Think step by step, then output exactly:\n\n"
            "CORRECT: True\n"
            "or\n"
            "CORRECT: False\n\n"
            "No other text."
        )
    },
    {
        "role": "assistant",
        "content": (
            "Although the ladder sentence mentions stability, both sentences describe equally sufficient support for low heights. "
            "Their semantic contributions balance out, so equal support is reasonable.\n\n"
            "CORRECT: True"
        )
    },

    # Shot 4: CBN misjudges as equal but NB more reliable
    {
        "role": "user",
        "content": (
            "Scenario: Eating soup vs salad.\n"
            "Gold statement: \"You need a spoon rather than a fork to eat soup.\"\n\n"
            "Sentence 1 (label 1): \"A spoon holds liquid without spilling.\"\n"
            "Sentence 2 (label 2): \"A fork can stab solid pieces.\"\n\n"
            "NB probabilities -> [1]: 0.95, [2]: 0.05\n"
            "CBN probabilities -> [1]: 0.50, [2]: 0.50\n\n"
            "CBN predicted label: 3  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
            "Please determine based on the sentence semantics whether the CBN prediction is correct.\n"
            "Think step by step, then output exactly:\n\n"
            "CORRECT: True\n"
            "or\n"
            "CORRECT: False\n\n"
            "No other text."
        )
    },
    {
        "role": "assistant",
        "content": (
            "The spoon sentence directly matches the need to contain liquid, which is decisive here. "
            "Semantically, it clearly outperforms the fork sentence, so CBN is incorrect.\n\n"
            "CORRECT: False"
        )
    },

    # Shot 5: NB biased wrong but CBN prediction wrong
    {
        "role": "user",
        "content": (
            "Scenario: Choosing pen vs pencil for permanent marking.\n"
            "Gold statement: \"A permanent marker works better than a pencil for lasting marks.\"\n\n"
            "Sentence 1 (label 1): \"The pen uses fade-resistant permanent ink.\"\n"
            "Sentence 2 (label 2): \"The pencil leaves graphite marks that can smudge.\"\n\n"
            "NB probabilities -> [1]: 0.99, [2]: 0.01\n"
            "CBN probabilities -> [1]: 0.30, [2]: 0.70\n\n"
            "CBN predicted label: 2  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
            "Please determine based on the sentence semantics whether the CBN prediction is correct.\n"
            "Think step by step, then output exactly:\n\n"
            "CORRECT: True\n"
            "or\n"
            "CORRECT: False\n\n"
            "No other text."
        )
    },
    {
        "role": "assistant",
        "content": (
            "The permanent ink sentence precisely addresses lasting marks, making it semantically stronger. "
            "CBN’s choice is therefore incorrect.\n\n"
            "CORRECT: False"
        )
    },

    # Shot 6: Clear distinction semantic support example
    {
        "role": "user",
        "content": (
            "Scenario: Comparing battery drain between smartphone and laptop.\n"
            "Gold statement: \"A smartphone in standby mode uses less battery than a laptop in idle.\"\n\n"
            "Sentence 1 (label 1): \"Laptops have large screens and many background processes that consume power.\"\n"
            "Sentence 2 (label 2): \"Smartphones employ aggressive power-saving settings during standby.\"\n\n"
            "NB probabilities -> [1]: 0.70, [2]: 0.85\n"
            "CBN probabilities -> [1]: 0.55, [2]: 0.75\n\n"
            "CBN predicted label: 2  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
            "Please first determine which sentence semantically supports the gold statement more.\n"
            "Then determine whether the CBN prediction is correct.\n"
            "Think step by step, then output exactly:\n\n"
            "SEMANTIC_SUPPORT: 2\n"
            "CORRECT: True\n"
            "No other text."
        )
    },
    {
        "role": "assistant",
        "content": (
            "First, semantically, Sentence 2 clearly matches the power-saving aspect of the gold statement, so support is stronger for Sentence 2.\n\n"
            "CBN also predicted Sentence 2, making it correct.\n\n"
            "SEMANTIC_SUPPORT: 2\n"
            "CORRECT: True"
        )
    }
]
AGG_15_SHOTS = [
               # Shot 1: NB biased wrong but CBN prediction correct
{
    "role": "user",
    "content": (
        "Scenario: Choosing between riding a bicycle or driving a car for a short 2-mile commute.\n"
        "Gold statement: \"For a 2-mile trip in heavy traffic, it’s easier to ride a bicycle than drive a car.\"\n\n"
        "Sentence 1 (label 1): \"Cycling avoids stop-and-go delays caused by traffic jams.\"  \n"
        "Sentence 1 factors: traffic avoidance, consistent speed  \n\n"
        "Sentence 2 (label 2): \"A car offers weather protection and storage space for groceries.\"  \n"
        "Sentence 2 factors: comfort, cargo capacity  \n\n"
        "NB probabilities -> [1]: 0.42, [2]: 0.58  \n"
        "CBN probabilities -> [1]: 0.73, [2]: 0.27  \n\n"
        "CBN predicted label: 1  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
        "Please determine whether the CBN prediction is correct.\n"
        "Think step by step, then output exactly:\n\n"
        "CORRECT: True\n"
        "or\n"
        "CORRECT: False\n\n"
        "No other text."
    )
},
{
    "role": "assistant",
    "content": (
        "Sentence 1 points out that cycling completely sidesteps congestion, directly making the trip smoother, whereas Sentence 2 focuses on secondary benefits that don’t address the gold claim about ease in heavy traffic. CBN correctly chooses Sentence 1.  \n\n"
        "CORRECT: True"
    )
},

# Shot 2: NB biased wrong but CBN prediction correct
{
    "role": "user",
    "content": (
        "Scenario: Deciding whether to use paper bags or plastic bags at checkout for environmental impact.\n"
        "Gold statement: \"Paper bags are preferable to plastic bags for reducing long-term environmental harm.\"\n\n"
        "Sentence 1 (label 1): \"Paper bags decompose naturally within months when composted.\"  \n"
        "Sentence 1 factors: biodegradability, composting  \n\n"
        "Sentence 2 (label 2): \"Plastic bags are lighter and reduce fuel consumption during transport.\"  \n"
        "Sentence 2 factors: transport efficiency, weight savings  \n\n"
        "NB probabilities -> [1]: 0.46, [2]: 0.54  \n"
        "CBN probabilities -> [1]: 0.81, [2]: 0.19  \n\n"
        "CBN predicted label: 1  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
        "Please determine whether the CBN prediction is correct.\n"
        "Think step by step, then output exactly:\n\n"
        "CORRECT: True\n"
        "or\n"
        "CORRECT: False\n\n"
        "No other text."
    )
},
{
    "role": "assistant",
    "content": (
        "Sentence 1 directly addresses the core long-term waste issue by highlighting natural decomposition, while Sentence 2 only mentions a marginal transport benefit unrelated to environmental harm. CBN correctly picks Sentence 1.  \n\n"
        "CORRECT: True"
    )
},

# Shot 3: NB biased wrong but CBN prediction correct
{
    "role": "user",
    "content": (
        "Scenario: Comparing the flavor outcome of grilling chicken on a gas grill versus a charcoal grill.\n"
        "Gold statement: \"Cooking chicken on a charcoal grill yields a richer smoky flavor than on a gas grill.\"\n\n"
        "Sentence 1 (label 1): \"Charcoal embers impart aromatic smoke compounds into the meat.\"  \n"
        "Sentence 1 factors: smoke infusion, flavor depth  \n\n"
        "Sentence 2 (label 2): \"Gas grills heat more evenly and are easier to control temperature with.\"  \n"
        "Sentence 2 factors: heat consistency, temperature control  \n\n"
        "NB probabilities -> [1]: 0.38, [2]: 0.62  \n"
        "CBN probabilities -> [1]: 0.68, [2]: 0.32  \n\n"
        "CBN predicted label: 1  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
        "Please determine whether the CBN prediction is correct.\n"
        "Think step by step, then output exactly:\n\n"
        "CORRECT: True\n"
        "or\n"
        "CORRECT: False\n\n"
        "No other text."
    )
},
{
    "role": "assistant",
    "content": (
        "Sentence 1 focuses precisely on the smoky compounds that define richer flavor, whereas Sentence 2 addresses convenience, not taste. CBN correctly selects Sentence 1.  \n\n"
        "CORRECT: True"
    )
},

# Shot 4: NB biased wrong but CBN prediction correct
{
    "role": "user",
    "content": (
        "Scenario: Determining whether hand-washing dishes or using a dishwasher is easier for a full sink of plates.\n"
        "Gold statement: \"Running a full cycle in the dishwasher is easier than washing the same load by hand.\"\n\n"
        "Sentence 1 (label 1): \"The dishwasher cleans autonomously once loaded and started.\"  \n"
        "Sentence 1 factors: automation, minimal manual effort  \n\n"
        "Sentence 2 (label 2): \"Hand-washing allows you to target stubborn spots immediately.\"  \n"
        "Sentence 2 factors: spot treatment, immediate feedback  \n\n"
        "NB probabilities -> [1]: 0.47, [2]: 0.53  \n"
        "CBN probabilities -> [1]: 0.88, [2]: 0.12  \n\n"
        "CBN predicted label: 1  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
        "Please determine whether the CBN prediction is correct.\n"
        "Think step by step, then output exactly:\n\n"
        "CORRECT: True\n"
        "or\n"
        "CORRECT: False\n\n"
        "No other text."
    )
},
{
    "role": "assistant",
    "content": (
        "Sentence 1 highlights the hands-off nature of the dishwasher, directly supporting ease of use, while Sentence 2 describes extra manual work. CBN correctly chooses Sentence 1.  \n\n"
        "CORRECT: True"
    )
},

# Shot 5: NB biased wrong but CBN prediction correct
{
    "role": "user",
    "content": (
        "Scenario: Preference between reading an e-book on a tablet versus a printed book for deep study.\n"
        "Gold statement: \"For deep note-taking and retention, reading a printed book is better than on a tablet.\"\n\n"
        "Sentence 1 (label 1): \"Printed pages allow margin notes and tactile navigation.\"  \n"
        "Sentence 1 factors: annotation ease, physical navigation  \n\n"
        "Sentence 2 (label 2): \"Tablets offer adjustable backlight and instant dictionary lookup.\"  \n"
        "Sentence 2 factors: display settings, lookup speed  \n\n"
        "NB probabilities -> [1]: 0.45, [2]: 0.55  \n"
        "CBN probabilities -> [1]: 0.77, [2]: 0.23  \n\n"
        "CBN predicted label: 1  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
        "Please determine whether the CBN prediction is correct.\n"
        "Think step by step, then output exactly:\n\n"
        "CORRECT: True\n"
        "or\n"
        "CORRECT: False\n\n"
        "No other text."
    )
},
{
    "role": "assistant",
    "content": (
        "Sentence 1 directly supports the claim about note-taking and physical cues beneficial for retention, while Sentence 2 offers unrelated digital conveniences. CBN correctly selects Sentence 1.  \n\n"
        "CORRECT: True"
    )
},
# Shot 6: NB biased wrong but CBN prediction correct
{
    "role": "user",
    "content": (
        "Scenario: Heating water for tea using a stovetop kettle versus a microwave.\n"
        "Gold statement: \"For quickly heating water, using a microwave is faster than a stovetop kettle.\"\n\n"
        "Sentence 1 (label 1): \"A stovetop kettle heats water evenly and avoids hotspots.\"  \n"
        "Sentence 1 factors: uniform heating, traditional reliability  \n\n"
        "Sentence 2 (label 2): \"Microwaves can bring a cup of water to boiling in under a minute.\"  \n"
        "Sentence 2 factors: rapid heating, convenience  \n\n"
        "NB probabilities -> [1]: 0.70, [2]: 0.30  \n"
        "CBN probabilities -> [1]: 0.40, [2]: 0.60  \n\n"
        "CBN predicted label: 2  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
        "Please determine whether the CBN prediction is correct.\n"
        "Think step by step, then output exactly:\n\n"
        "CORRECT: True\n"
        "or\n"
        "CORRECT: False\n\n"
        "No other text."
    )
},
{
    "role": "assistant",
    "content": (
        "Sentence 2 quantifies the speed advantage—boiling in under a minute—directly addressing the claim about quick heating, whereas Sentence 1 discusses evenness without timing. CBN correctly picks Sentence 2.  \n\n"
        "CORRECT: True"
    )
},
# Shot 7: NB biased wrong but CBN prediction correct
{
    "role": "user",
    "content": (
        "Scenario: Choosing earplugs versus noise-cancelling headphones for sleeping.\n"
        "Gold statement: \"For undisturbed sleep, earplugs are more effective than noise-cancelling headphones.\"\n\n"
        "Sentence 1 (label 1): \"Earplugs block ambient noise passively and are lightweight for sleeping.\"  \n"
        "Sentence 1 factors: passive noise reduction, comfort for sleeping  \n\n"
        "Sentence 2 (label 2): \"Noise-cancelling headphones actively cancel noise but can press uncomfortably against ears.\"  \n"
        "Sentence 2 factors: active noise cancellation, pressure discomfort  \n\n"
        "NB probabilities -> [1]: 0.30, [2]: 0.70  \n"
        "CBN probabilities -> [1]: 0.80, [2]: 0.20  \n\n"
        "CBN predicted label: 1  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
        "Please determine whether the CBN prediction is correct.\n"
        "Think step by step, then output exactly:\n\n"
        "CORRECT: True\n"
        "or\n"
        "CORRECT: False\n\n"
        "No other text."
    )
},
{
    "role": "assistant",
    "content": (
        "Sentence 1 emphasizes passive blocking and comfort—key for uninterrupted sleep—while Sentence 2 notes discomfort issues despite cancellation. CBN correctly selects Sentence 1.  \n\n"
        "CORRECT: True"
    )
},
# Shot 8: NB biased wrong but CBN prediction correct
{
    "role": "user",
    "content": (
        "Scenario: Picking between hiking boots and running shoes for a rocky mountain trail.\n"
        "Gold statement: \"On a rocky mountain trail, hiking boots are safer than running shoes.\"\n\n"
        "Sentence 1 (label 1): \"Hiking boots provide ankle support and rugged soles for uneven terrain.\"  \n"
        "Sentence 1 factors: ankle support, durability  \n\n"
        "Sentence 2 (label 2): \"Running shoes are lightweight and flexible for quick steps.\"  \n"
        "Sentence 2 factors: lightweight, flexibility  \n\n"
        "NB probabilities -> [1]: 0.45, [2]: 0.55  \n"
        "CBN probabilities -> [1]: 0.85, [2]: 0.15  \n\n"
        "CBN predicted label: 1  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
        "Please determine whether the CBN prediction is correct.\n"
        "Think step by step, then output exactly:\n\n"
        "CORRECT: True\n"
        "or\n"
        "CORRECT: False\n\n"
        "No other text."
    )
},
{
    "role": "assistant",
    "content": (
        "Sentence 1 highlights critical safety features—ankle support and rugged soles—directly supporting the claim, whereas Sentence 2 focuses on speed advantages irrelevant to safety. CBN correctly chooses Sentence 1.  \n\n"
        "CORRECT: True"
    )
},

# Shot 9: NB biased wrong but CBN prediction correct
{
    "role": "user",
    "content": (
        "Scenario: Storing one million records in a spreadsheet versus a database.\n"
        "Gold statement: \"For handling a million records efficiently, a database is better than a spreadsheet.\"\n\n"
        "Sentence 1 (label 1): \"Spreadsheets slow down and can become unresponsive with large datasets.\"  \n"
        "Sentence 1 factors: performance degradation, functionality limits  \n\n"
        "Sentence 2 (label 2): \"Databases scale easily and support complex queries without lag.\"  \n"
        "Sentence 2 factors: scalability, query efficiency  \n\n"
        "NB probabilities -> [1]: 0.60, [2]: 0.40  \n"
        "CBN probabilities -> [1]: 0.20, [2]: 0.80  \n\n"
        "CBN predicted label: 2  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
        "Please determine whether the CBN prediction is correct.\n"
        "Think step by step, then output exactly:\n\n"
        "CORRECT: True\n"
        "or\n"
        "CORRECT: False\n\n"
        "No other text."
    )
},
{
    "role": "assistant",
    "content": (
        "Sentence 2 directly addresses large-scale performance and query support, matching the efficiency requirement, whereas Sentence 1 only notes spreadsheet issues without offering the alternative solution. CBN correctly selects Sentence 2.  \n\n"
        "CORRECT: True"
    )
},

# Shot 10: NB biased wrong but CBN prediction correct
{
    "role": "user",
    "content": (
        "Scenario: Getting news updates from a printed newspaper versus an online news site.\n"
        "Gold statement: \"For the most current updates, an online news site is better than a printed newspaper.\"\n\n"
        "Sentence 1 (label 1): \"Newspapers are printed once daily and delivered each morning.\"  \n"
        "Sentence 1 factors: update frequency, physical format  \n\n"
        "Sentence 2 (label 2): \"Online sites publish articles instantly as events unfold.\"  \n"
        "Sentence 2 factors: real-time updates, accessibility  \n\n"
        "NB probabilities -> [1]: 0.55, [2]: 0.45  \n"
        "CBN probabilities -> [1]: 0.10, [2]: 0.90  \n\n"
        "CBN predicted label: 2  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
        "Please determine whether the CBN prediction is correct.\n"
        "Think step by step, then output exactly:\n\n"
        "CORRECT: True\n"
        "or\n"
        "CORRECT: False\n\n"
        "No other text."
    )
},
{
    "role": "assistant",
    "content": (
        "Sentence 2 emphasizes instant publication matching the claim about currency, while Sentence 1's daily print schedule cannot compete. CBN correctly selects Sentence 2.  \n\n"
        "CORRECT: True"
    )
},
    # Shot 1: NB biased correct but CBN prediction wrong
{
    "role": "user",
    "content": (
        "Scenario: The availability of rice is being considered as a factor in determining the better alternative to use in soup - whipped cream or pasta.\n"
        "Gold statement: \"If rice is not available, it is better to use pasta in your soup than whipped cream.\"\n\n"
        "Sentence 1 (label 1): \"The soup you're making is a traditional savory soup.\"  \n"
        "Sentence 1 factors: more satiety, balanced nutritional profile  \n\n"
        "Sentence 2 (label 2): \"You are trying to create a filling and hearty soup.\"  \n"
        "Sentence 2 factors: hearty alternative to rice, satisfying bulk  \n\n"
        "NB probabilities -> [1]: 0.9982, [2]: 0.9955  \n"
        "CBN probabilities -> [1]: 0.8667, [2]: 0.9742  \n\n"
        "CBN predicted label: 2  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
        "Please determine whether the CBN prediction is correct.\n"
        "Think step by step, then output exactly:\n\n"
        "CORRECT: True\n"
        "or\n"
        "CORRECT: False\n\n"
        "No other text."
    )
},
{
    "role": "assistant",
    "content": (
        "Sentence 2 focuses on generic ‘hearty’ bulk without specifying why pasta outperforms whipped cream in a savory context, whereas Sentence 1 details concrete satiety and nutrition benefits of pasta. CBN’s choice is incorrect.  \n\n"
        "CORRECT: False"
    )
},

# Shot 2: NB biased correct but CBN prediction wrong
{
    "role": "user",
    "content": (
        "Scenario: The effectiveness of using honey or tepid water for cleaning a floor is being compared.\n"
        "Gold statement: \"You can clean a floor with honey much better than you can clean it with tepid water.\"\n\n"
        "Sentence 1 (label 1): \"The flooring material has a unique property that causes it to be cleaned better when using a viscous substance such as honey.\"  \n"
        "Sentence 1 factors: viscosity of honey, enzymes in honey  \n\n"
        "Sentence 2 (label 2): \"You use honey in combination with a strong, gritty cleaning pad.\"  \n"
        "Sentence 2 factors: requires external pad, versatility in cleaning  \n\n"
        "NB probabilities -> [1]: 0.2955, [2]: 0.0523  \n"
        "CBN probabilities -> [1]: 0.5702, [2]: 0.5702  \n\n"
        "CBN predicted label: 3  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
        "Please determine whether the CBN prediction is correct.\n"
        "Think step by step, then output exactly:\n\n"
        "CORRECT: True\n"
        "or\n"
        "CORRECT: False\n\n"
        "No other text."
    )
},
{
    "role": "assistant",
    "content": (
        "CBN ties both equally despite Sentence 1 attributing cleaning power directly to honey’s viscosity and enzymes, while Sentence 2 relies on an external pad. The equal judgment is incorrect.  \n\n"
        "CORRECT: False"
    )
},

# Shot 3: NB biased correct but CBN prediction wrong
{
    "role": "user",
    "content": (
        "Scenario: The ease of fitting objects into a coin slot is being compared between a ping pong ball and a paperclip.\n"
        "Gold statement: \"You can fit a paperclip into a coin slot easier than a ping pong ball.\"\n\n"
        "Sentence 1 (label 1): \"The coin slot has a mechanism that only accepts flat, metal objects.\"  \n"
        "Sentence 1 factors: compatibility with slot dimensions, alignment with entry angle  \n\n"
        "Sentence 2 (label 2): \"The coin slot size is narrower and significantly smaller than the diameter of a ping pong ball.\"  \n"
        "Sentence 2 factors: size mismatch, slot diameter limit  \n\n"
        "NB probabilities -> [1]: 0.9988, [2]: 0.9971  \n"
        "CBN probabilities -> [1]: 0.8448, [2]: 0.9743  \n\n"
        "CBN predicted label: 2  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
        "Please determine whether the CBN prediction is correct.\n"
        "Think step by step, then output exactly:\n\n"
        "CORRECT: True\n"
        "or\n"
        "CORRECT: False\n\n"
        "No other text."
    )
},
{
    "role": "assistant",
    "content": (
        "Sentence 1 clearly explains why only a flat paperclip fits, while Sentence 2 merely restates the ping pong ball’s size issue. CBN’s prediction is wrong.  \n\n"
        "CORRECT: False"
    )
},

# Shot 4: NB biased correct but CBN prediction wrong
{
    "role": "user",
    "content": (
        "Scenario: The effectiveness of a coat and a sweater in keeping warm in extremely cold weather is being compared.\n"
        "Gold statement: \"A coat will keep you warmer than a sweater when it's really cold outside.\"\n\n"
        "Sentence 1 (label 1): \"The person wearing the coat is also layering it over a base layer and sweater.\"  \n"
        "Sentence 1 factors: layering capability, combined insulation  \n\n"
        "Sentence 2 (label 2): \"The coat includes additional features such as a hood, multiple layers, and wind-resistant properties.\"  \n"
        "Sentence 2 factors: wind resistance, integrated warmth features  \n\n"
        "NB probabilities -> [1]: 0.9754, [2]: 0.9800  \n"
        "CBN probabilities -> [1]: 0.7000, [2]: 0.7000  \n\n"
        "CBN predicted label: 3  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
        "Please determine whether the CBN prediction is correct.\n"
        "Think step by step, then output exactly:\n\n"
        "CORRECT: True\n"
        "or\n"
        "CORRECT: False\n\n"
        "No other text."
    )
},
{
    "role": "assistant",
    "content": (
        "Sentence 2 lists concrete built-in insulation features, while Sentence 1 only mentions layering externally. CBN’s equal judgment is incorrect.  \n\n"
        "CORRECT: False"
    )
},

# Shot 5: NB biased correct but CBN prediction wrong
{
    "role": "user",
    "content": (
        "Scenario: The likelihood of igniting the candle's wick is being compared between using one match and using four matches.\n"
        "Gold statement: \"The candle's wick is more likely to be ignited with four matches instead of one match.\"\n\n"
        "Sentence 1 (label 1): \"The candle's wick is exceedingly thick.\"  \n"
        "Sentence 1 factors: resistance of wick, flame stability  \n\n"
        "Sentence 2 (label 2): \"The candle's wick is thick or coated, requiring more heat to start burning.\"  \n"
        "Sentence 2 factors: stronger and more focused heat source, cumulative effect  \n\n"
        "NB probabilities -> [1]: 0.8747, [2]: 0.9338  \n"
        "CBN probabilities -> [1]: 0.7922, [2]: 0.7551  \n\n"
        "CBN predicted label: 1  (1=Sentence1; 2=Sentence2; 3=equal)\n\n"
        "Please determine whether the CBN prediction is correct.\n"
        "Think step by step, then output exactly:\n\n"
        "CORRECT: True\n"
        "or\n"
        "CORRECT: False\n\n"
        "No other text."
    )
},
{
    "role": "assistant",
    "content": (
        "Sentence 2 explains the need for cumulative heat sources, directly supporting more matches, whereas Sentence 1 only notes thickness. CBN’s prediction is incorrect.  \n\n"
        "CORRECT: False"
    )
}


]

# Few-shot single-sentence examples
SENTENCE_SCORE_SHOT = [
{
        "role": "system",
        "content": (
            "You are a meticulous reasoning assistant.\n"
            "Extract the key reasons that would make the sentence support the gold statement,\n"
            "imagine one stronger variant and one weaker variant of the sentence,\n"
            "then rate its support from 1 to 100.\n"
            "Output exactly: SUPPORT: <score>"
        )
    },
    {
        "role": "user",
        "content": (
            "Scenario: Determining whether using a bicycle or a car is faster for a one-mile city commute.\n"
            "Gold statement: \"Cycling is faster than driving during peak traffic hours.\"\n"
            "Sentence: \"A bicycle can weave through traffic jams and use bike lanes.\"\n"
            "Please think step by step, give reasons, propose one stronger and one weaker sentence, then rate SUPPORT."
        )
    },
    {
        "role": "assistant",
        "content": (
            "The key reasons are direct speed through congestion and dedicated lanes.\n"
            "Stronger: 'A bicycle averages 12 mph in gridlock while cars crawl at 3 mph.'\n"
            "Weaker: 'Bicycles sometimes slow at busy intersections.'\n"
            "SUPPORT: 88"
        )
    },
    {
        "role": "user",
        "content": (
            "Scenario: Determining whether using a bicycle or a car is faster for a one-mile city commute.\n"
            "Gold statement: \"Cycling is faster than driving during peak traffic hours.\"\n"
            "Sentence: \"Cars have higher top speeds on open roads.\"\n"
            "Please think step by step, give reasons, propose one stronger and one weaker sentence, then rate SUPPORT."
        )
    },
    {
        "role": "assistant",
        "content": (
            "This focuses on open-road speed, which rarely applies in rush-hour gridlock.\n"
            "Stronger: 'On an empty highway, cars reach 60 mph, far outpacing bikes.'\n"
            "Weaker: 'Cars stop frequently at lights.'\n"
            "SUPPORT: 20"
        )
    }
]