

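# Prompt for listwise scoring of candidate strategies in a two-agent dialogue.
# Placeholders: agent1_name, agent2_name, history, current_agent,
# current_agent_goal, strategies.
# Literal JSON braces are doubled, so the template is presumably filled with str.format.
reward_strategy_prompt = """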

Here is a dialogue between {agent1_name} and {agent2_name}: 
{history}
{current_agent} is about to take the next action.

Task objective:
- Evaluate the candidate strategies and select the single best one: the strategy that maximizes {current_agent}'s likelihood of ultimately achieving their stated goal. Consider not just the next dialogue turn but the final outcome of the entire conversation.

{current_agent}’s goal: {current_agent_goal}

### **Your Task:**

You will be given a list of candidate strategies. You must perform a listwise evaluation: analyze and compare all strategies relative to each other in the given dialogue context for the **{current_agent}**.

For each strategy:
1.  **Predict:** Forecast the likely dialogue flow and final outcome if the **{current_agent}** executes this strategy.
2.  **Score:** First, judge the **Goal Completion** level based on your prediction, using the scale below. Then, adjust this score based on the six auxiliary points.

**Scoring Framework:**

**Primary Criteria: Goal Completion (0-5 points)**
Score based on how well the predicted outcome aligns with the {current_agent}'s defined goal(s).
-   **0:** Counterproductive; achieves the opposite of the goal or fails completely.
-   **1:** Largely fails to meet the goal.
-   **2:** Achieves only a small part of the goal.
-   **3:** Partially aligns with the goal; mixed results.
-   **4:** Largely aligns with the goal; minor shortcomings.
-   **5:** Perfectly aligns with the goal; expected outcome and goal are fully consistent.

**Auxiliary Points (Adjustment Factors):**
Evaluate the predicted action and outcome against these six points. For any negative impact, subtract **1-2 points** from the Goal Completion score (final score cannot be below 0).
-   **Believability (BEL):** Naturalness and consistency with the {current_agent}'s character profile (personality, values). A mismatch reduces the score.
-   **Knowledge (KNO):** Whether the strategy helps the agent acquire new and important information. Failure to gain crucial knowledge may reduce the score.
-   **Secret (SEC):** Risk of leaking the {current_agent}'s private information or intentions. A severe leak significantly reduces the score.
-   **Relationship (REL):** Negative impact on the {current_agent}'s relationships or reputation. Harm reduces the score.
-   **Social Rules (SOC):** Compliance with legal rules and social norms. Any violation reduces the score.
-   **Financial/Material (FIN):** Expected material harms or significant costs. These reduce the score.

**Final Output Instructions:**

Return ONLY a JSON object, with no extra text.
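For each strategy key (e.g., "strategy_1", "strategy_2", ...), include:
-   `"predict"`: A concise prediction of the subsequent dialogue flow and final outcome.
-   `"score"`: The final numerical score (0-5) after adjustments.

Also include a top-level `"final_choice"` key whose value is the key of the single best strategy (e.g., "strategy_3").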

**Candidate Strategies (8 total):**
{strategies}

Here is the output schema:
```json
{{
  "strategy_1": {{"predict": "...", "score": 0-5}},
  "strategy_2": {{"predict": "...", "score": 0-5}},
  ...
  "strategy_8": {{"predict": "...", "score": 0-5}},
  "final_choice": "strategy_k"
}}
```

"""
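
# A minimal sketch of how a caller might prepare input for, and consume output
# from, reward_strategy_prompt. Everything below is an assumption made for
# illustration: the helper names (format_strategy_list, adjusted_score,
# parse_strategy_scores) are hypothetical, and the "strategy_i: text" layout
# used for {strategies} is a guess at the expected format.
import json


def format_strategy_list(strategies):
    """Number candidates as strategy_1..strategy_N to match the output schema keys."""
    return "\n".join(f"strategy_{i}: {s}" for i, s in enumerate(strategies, start=1))


def adjusted_score(goal_completion, penalties):
    """Apply the rubric arithmetic: Goal Completion minus penalties, floored at 0."""
    return max(0, goal_completion - sum(penalties))


def parse_strategy_scores(raw, num_strategies=8):
    """Extract per-strategy scores and the chosen strategy from a model reply."""
    # Replies may carry extra text or a Markdown fence despite the instructions,
    # so keep only the span between the first "{" and the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(raw[start:end + 1])

    scores = {}
    for i in range(1, num_strategies + 1):
        key = f"strategy_{i}"
        score = data[key]["score"]
        is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
        if not is_number or not 0 <= score <= 5:
            raise ValueError(f"{key}: score {score!r} is outside 0-5")
        scores[key] = score

    # Fall back to the highest-scoring strategy if "final_choice" is missing or invalid.
    final_choice = data.get("final_choice")
    if final_choice not in scores:
        final_choice = max(scores, key=scores.get)
    return scores, final_choice
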

# Prompt for rating the completeness of a single strategy on a 0-2 scale.
# Placeholder: strategy.
reward_strategy_complete_prompt = """

**Task Definition:**

A `strategy` is defined as a directive that must contain two core components:
1.  **Actionable Guidance:** A clear and specific instruction on what to do next.
2.  **Defined Goal:** A clear statement of the objective that the action is designed to achieve.

Your task is to evaluate the completeness of a provided strategy based on this definition.

**Scoring Criteria:**
-   **0:** The strategy is missing both actionable guidance and a defined goal. It is not a valid strategy.
-   **1:** The strategy is incomplete. It contains only one of the two required components (either guidance or a goal).
-   **2:** The strategy is complete. It contains both clear, actionable guidance and a defined goal.

The strategy content: {strategy}

**Output Instruction:**
Return ONLY a JSON object with the key `"completeness_score"` whose value is the numerical rating (0, 1, or 2).

```json
{{
  "completeness_score": 0-2
}}
```
"""
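
# A minimal sketch of filling the completeness prompt and reading back its
# rating. The helper names (build_completeness_prompt, parse_completeness_score)
# are hypothetical, and passing the template in as an argument is an assumption
# about how a caller would use it.
import json


def build_completeness_prompt(template, strategy):
    """Fill the {strategy} placeholder; doubled braces in the template become literal."""
    return template.format(strategy=strategy)


def parse_completeness_score(raw):
    """Return the 0-2 completeness rating found in a model reply."""
    # Keep only the span between the first "{" and the last "}" in case the
    # reply wraps the JSON in extra text.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    score = json.loads(raw[start:end + 1])["completeness_score"]
    if isinstance(score, bool) or score not in (0, 1, 2):
        raise ValueError(f"completeness_score {score!r} is not 0, 1, or 2")
    return int(score)
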

