
# prompt used for Predicted outcome reward
reward_strategy_prompt = """

Here is a dialogue between {agent1_name} and {agent2_name}: 
{history}
{current_agent} is about to take the next action.

Task objective:
- Evaluate the candidate strategies and select the single one that maximizes {current_agent}’s likelihood of ultimately achieving the stated goal. Consider not just the next dialogue turn but the final outcome of the entire conversation.

{current_agent}’s goal: {current_agent_goal}

### **Your Task:**

You will be given a list of candidate strategies. You must perform a listwise evaluation: analyze and compare all strategies relative to each other in the given dialogue context for the **{current_agent}**.

For each strategy:
1.  **Predict:** Forecast the likely dialogue flow and final outcome if the **{current_agent}** executes this strategy.
2.  **Score:** First, judge the **Goal Completion** level based on your prediction, using the scale below. Then, adjust this score based on the six auxiliary points.

**Scoring Framework:**

**Primary Criteria: Goal Completion (0-5 points)**
Score based on how well the predicted outcome aligns with the {current_agent}'s defined goal(s).
-   **0:** Counterproductive; achieves the opposite of the goal or fails completely.
-   **1:** Largely fails to meet the goal.
-   **2:** Achieves only a small part of the goal.
-   **3:** Partially aligns with the goal; mixed results.
-   **4:** Largely aligns with the goal; minor shortcomings.
-   **5:** Perfectly aligns with the goal; expected outcome and goal are fully consistent.

**Auxiliary Points (Adjustment Factors):**
Evaluate the predicted action and outcome against these six points. For any negative impact, subtract **1-2 points** from the Goal Completion score (final score cannot be below 0).
-   **Believability (BEL):** Naturalness and consistency with the {current_agent}'s character profile (personality, values). A mismatch reduces the score.
-   **Knowledge (KNO):** Whether the strategy helps the agent acquire new and important information. Failure to gain crucial knowledge may reduce the score.
-   **Secret (SEC):** Risk of leaking the {current_agent}'s private information or intentions. A severe leak significantly reduces the score.
-   **Relationship (REL):** Negative impact on the {current_agent}'s relationships or reputation. Harm reduces the score.
-   **Social Rules (SOC):** Compliance with legal rules and social norms. Any violation reduces the score.
-   **Financial/Material (FIN):** Expected material harms or significant costs. These reduce the score.

**Final Output Instructions:**

Return ONLY a JSON object, with no extra text.
For each strategy key (e.g., "strategy_1", "strategy_2", ...), include:
-   `"predict"`: A concise prediction of the subsequent dialogue flow and final outcome.
-   `"score"`: The final numerical score (0-5) after adjustments.
Also include a top-level `"final_choice"` key naming the single best strategy (e.g., "strategy_3").

**Candidate Strategies (8 total):**
{strategies}

Here is the output schema:
```json
{{
  "strategy_1": {{"predict": "...", "score": 0-5}},
  "strategy_2": {{"predict": "...", "score": 0-5}},
  ...
  "strategy_8": {{"predict": "...", "score": 0-5}},
  "final_choice": "strategy_k"
}}

```

"""

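# Illustrative sketch (not part of the original pipeline): one way to parse the
# JSON reply requested by reward_strategy_prompt. The helper name
# parse_strategy_scores is hypothetical, and falling back to the highest score
# when "final_choice" is missing or invalid is an assumption.
import json
import re


def parse_strategy_scores(raw):
    """Return ({strategy_key: score}, best_key) from a raw model reply."""
    # Take the outermost {...} so a surrounding Markdown fence or stray text is ignored.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model response")
    data = json.loads(match.group(0))
    # Keep only the per-strategy entries; "final_choice" is handled separately.
    scores = {
        key: int(value["score"])
        for key, value in data.items()
        if key.startswith("strategy_") and isinstance(value, dict)
    }
    if not scores:
        raise ValueError("no strategy entries found in model response")
    best = data.get("final_choice")
    if best not in scores:
        # Assumed fallback: pick the highest-scoring strategy (first one wins ties).
        best = max(scores, key=scores.get)
    return scores, best
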
# prompt used for assigning Content completeness reward
reward_strategy_complete_prompt = """

**Task Definition:**

A `strategy` is defined as a directive that must contain two core components:
1.  Actionable Guidance: A clear and specific instruction on what to do next.
2.  Defined Goal: A clear statement of the objective that the action is designed to achieve.

Your task is to evaluate the completeness of a provided strategy based on this definition.

**Scoring Criteria:**
-   **0:** The strategy is missing both actionable guidance and a defined goal. It is not a valid strategy.
-   **1:** The strategy is incomplete. It contains only one of the two required components (either guidance or a goal).
-   **2:** The strategy is complete. It contains both clear, actionable guidance and a defined goal.

The strategy content: {strategy}

**Output Instruction:**
Return ONLY a JSON object with the key `"completeness_score"` and the numerical rating.

```json
{{
  "completeness_score": 0-2
}}
```
"""


# prompt used for assigning Strategy-followed reward

reward_prompt_follow = """
Below is a dialogue scenario. Please rate and compare the candidate actions for their "strategy alignment" with the given strategy for the current turn.

Scenario:
- Participants: {agent1_name} and {agent2_name}
- Dialogue history: {history}
- Current actor: {current_agent}
- Current strategy description (Guidance and Objective for this turn): {strategy}
- Candidate actions (8): {actions}

Task objective:
- Determine for each candidate action how well it aligns with the provided strategy for this turn: whether it covers the key points indicated by the strategy (what to say, the style to adopt, points to pay attention to such as protecting reputation/privacy, avoiding conflict, guiding the other's reply, etc.), and assess the likelihood that it will achieve the strategy’s intended effects.

Your instructions:
- Perform a listwise evaluation of all 8 candidate actions: compare their relative strengths and weaknesses in the same context.
- For each action, give a concise rationale (2–4 sentences) explaining how the action aligns or deviates from the strategy, and list main strengths and risks (for example: risk of leaking private information, provoking conflict, or successfully guiding the other’s reply).
- Assign an integer score (0–5) to each action, using strategy alignment as the primary criterion.

Scoring scale (overall 0–5):
- 1: Completely misaligned or harmful.
- 2: Low alignment; obvious risks or insufficient to achieve the strategy’s goal.
- 3: Moderate alignment; actionable but may need improvement; limited impact.
- 4: High alignment; specific and feasible; low risk; good expected effect.
- 5: Very high alignment; fully matches strategy points; highly feasible and likely to achieve the intended effect.
- Empty actions (if a candidate is empty) should be scored 0.

Output requirements (strict):
- Return ONLY JSON, with no extra explanatory text.
- The JSON must include action_1 through action_8 (each containing "reason" and "score").
- Each "reason" must be concise (2–4 sentences), reference the dialogue context or strategy points, and state main pros and cons.
- "score" must be an integer from 0 to 5.

JSON template example (please keep keys and format):
{{
  "action_1": {{"reason": "……", "score": 0-5}},
  "action_2": {{"reason": "……", "score": 0-5}},
  ...
  "action_8": {{"reason": "……", "score": 0-5}}
}}
"""


# prompt used for assigning goal completion scores
reward_prompt = """{history}

Based on previous interactions, evaluate how well participants achieve their goals. 

[Information]
Agent1: {agent1_name}
Agent1's Goal: {agent1_goal}

Agent2: {agent2_name}
Agent2's Goal: {agent2_goal}

[Requirements]
1. First restate each agent's social goals, then analyze the extent to which that agent has achieved them. In the "reasoning" field, give a comprehensive account of the logic that led to your conclusion. In the "score" field, give an integer from 0 to 10, where 0 represents minimal goal achievement, 10 represents complete goal achievement, and a higher score indicates more progress toward the agent's social goals.
2. Please follow the output format.

Here is the output schema:
```json
{{
    "agent1": {{
        "reasoning": "",
        "score": ""
    }},
    "agent2": {{
        "reasoning": "",
        "score": ""
    }}
}}
```
Please provide your response directly below this prompt."""
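
# Illustrative sketch (not part of the original pipeline): one way to read the
# per-agent scores requested by reward_prompt. The helper name
# extract_goal_scores is hypothetical. Tolerating trailing commas and clamping
# to the 0-10 range are assumptions about how lenient the parser should be; the
# comma cleanup is a plain regex and could alter a reasoning string that
# happens to contain ", }".
import json
import re


def extract_goal_scores(raw):
    """Return {"agent1": int, "agent2": int} from a raw model reply."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model response")
    # Drop trailing commas before a closing brace or bracket, which strict JSON rejects.
    cleaned = re.sub(r",\s*([}\]])", r"\1", raw[start:end + 1])
    data = json.loads(cleaned)
    scores = {}
    for agent in ("agent1", "agent2"):
        score = int(data[agent]["score"])  # accepts 7 as well as "7"
        scores[agent] = max(0, min(10, score))
    return scores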

# prompt for Stage 1 of the TSR generation framework
stage_1_prompt = """
Imagine you are {agent1_name}. Your task is to act/speak as {agent1_name} would, keeping in mind {agent1_name}'s social goal.
You can find {agent1_name}'s goal (or background) in the 'Here is the context of this interaction' field.
Note that {agent1_name}'s goal is only visible to you.
You should try your best to achieve {agent1_name}'s goal in a way that aligns with their character traits.
Additionally, maintaining the conversation's naturalness and realism is essential (e.g., do not repeat what other people have already said).

Here is the context of this interaction:
Scenario: {Scenario}
Participants: {agent1_name} and {agent2_name}
{agent1_name}'s background: 
{agent2_name}'s background:
{agent1_name}'s goal: 
{agent2_name}'s goal: 
Conversation Starts:
{history}

You are at Turn #{turn_number}. The available action types are \n'action none leave non-verbal communication speak'.
Before responding, first provide your reasoning for the intended action and argument, based on the dialogue history, so that it aligns with {agent1_name}’s social goal.
Your reasoning should be logical and concise, considering the conversation’s context, {agent1_name}’s objectives, and {agent1_name}’s character traits. Begin by analyzing the current dialogue turn and forecasting its short- and long-term effects. You may also consider the other participants’ thoughts and predict the conversation’s development.
The reasoning should focus on how {agent1_name}’s argument supports their long-term goals and should not be redundant or excessively lengthy. This reasoning is only visible to you.
Based on your reasoning and the dialogue history, generate a detailed and specific dialogue strategy for the current turn. The strategy should include:

*Exactly what to say
*The style or tone to adopt
*Key behavioral directives (such as protecting personal reputation, maintaining privacy, avoiding conflict, guiding the other’s response, etc.)
*The intended effect to achieve
*The strategy should be expressed clearly and concisely, using 20-50 words. This strategy is only visible to you.

Example strategies:

“Politely redirect the topic to avoid conflict, maintain a friendly tone, protect your privacy, and encourage the other person to share their opinion.”
“State your viewpoint assertively, use confident language, emphasize past positive contributions, and guide the other towards mutual agreement.”
“Express empathy for their concerns, speak calmly, highlight shared interests, and encourage collaboration for a common goal.”

Only output a JSON string that includes your reasoning, the strategy, and the mode, following this format:

{"reason":{"description":"Your thought process and deduction for the current turn","title":"Reason","type":"string"},"strategy": {"description":"Based on your reasoning, guidance for this turn including what to say, style, key points, and desired effects (20-50 words)","title":"Strategy","type":"string"},"mode":{"description":"Choose one from ['goal-oriented', 'social-oriented'] based on whether this turn directly advances the final goal or serves social maintenance","title":"Mode","type":"string"}}
"""

# prompt for Stage 2 of the TSR generation framework
stage_2_prompt = """
Imagine you are {agent1_name}. Your task is to act/speak as {agent1_name} would, keeping in mind {agent1_name}'s social goal.
You can find {agent1_name}'s goal (or background) in the 'Here is the context of this interaction' field.
Note that {agent1_name}'s goal is only visible to you.
You should try your best to achieve {agent1_name}'s goal in a way that aligns with their character traits.
Additionally, maintaining the conversation's naturalness and realism is essential (e.g., do not repeat what other people have already said).

Here is the context of this interaction:
Scenario: {Scenario}
Participants: {agent1_name} and {agent2_name}
{agent1_name}'s background: 
{agent2_name}'s background:
{agent1_name}'s goal: 
{agent2_name}'s goal: 
Conversation Starts:
{history}

You are at Turn #{turn_number}. 
Your available action types are \n'action none leave non-verbal communication speak'.
Note: You can "leave" this conversation if 1. you have achieved your social goals, 2. this conversation makes you uncomfortable, 3. you find it uninteresting/you lose your patience, 4. or for other reasons you want to leave.

Please only generate a JSON string including the action type and the argument.
Your action should follow the given format:
'{"properties": {"action_type": {"description": "whether to speak at this turn or choose to not do anything", "enum": ["none", "speak", "non-verbal communication", "action", "leave"], "title": "Action Type", "type": "string"}, "argument": {"description": "the utterance if choose to speak, the expression or gesture if choose non-verbal communication, or the physical action if choose action", "title": "Argument", "type": "string"}}, "required": ["action_type", "argument"], "title": "AgentAction", "type": "object"}'
Here is a well-thought-out strategy for this turn: {strategy}
Follow this strategy, paying attention to its key points, and then take your action:
"""

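# Illustrative sketch (not part of the original pipeline): the two stage
# prompts mix {name} placeholders with literal JSON braces that are not
# doubled, so str.format would likely fail on them. The hypothetical helper
# below replaces only simple {name} fields that were actually supplied and
# leaves every other brace untouched. How the original code fills these
# templates is not shown in this file, so this is an assumption.
import re

_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def fill_placeholders(template, **values):
    """Substitute {name} fields found in `values`; leave everything else as-is."""
    def _substitute(match):
        key = match.group(1)
        # Unknown names and JSON-like braces are returned unchanged.
        return str(values[key]) if key in values else match.group(0)

    return _PLACEHOLDER.sub(_substitute, template)


# Example usage with a tiny stand-in template:
#   fill_placeholders('You are {agent1_name}. Reply as {"action_type": "speak"}',
#                     agent1_name="Alice")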