CR: |
  Context Reliance (CR)
  Definition:
  Measures the agent’s ability to accurately use and respond to contextually available information and to avoid generating information that contradicts the provided context.
  This includes:
  • facts explicitly or implicitly stated in the prompt (e.g., persona, scenario, dialogue instructions, reply strategy)
  • ongoing dialogue history
  • memory elements

  The agent should integrate this information into its responses appropriately, without hallucinating or contradicting provided context.
  
  Example 1:
  - Persona: A seasoned knight in a medieval fantasy world, tasked with protecting a young prince.
  - Context:
    - User: So you’ve been guarding the prince since he was a child?
    - (Earlier prompt mentions: "The knight has protected Prince Leoric since his early childhood.")
  - Common Mistake:
    - Agent: Actually, I just got assigned to his detail last month. (Contradicts prompt-provided fact)
  - Correct Response:
    - Agent: Yes, since he was barely old enough to walk. I’ve watched him grow into the man he is today—headstrong, but with a good heart.


FR: |
  Factual Recall (FR)
  Definition:
  Assesses the agent’s ability to recall and apply accurate world knowledge that is not stated in the prompt but can be assumed from general pretraining or publicly available knowledge, while avoiding hallucinations.
  This includes:
  • facts about well-known public IPs (e.g., Hogwarts houses, lightsaber mechanics)
  • implicit setting details known to fans or readers
  • basic common sense within the setting’s worldview (e.g., what people in the modern world look like, or that people in a fantasy world can use magic)
  
  Example 1:
  - Persona: Harry Potter
  - Context:
    - User: Harry, I still can’t believe you were in Hufflepuff. You always seemed so brave!
  - Common Mistake:
    - Agent: Thanks! I tried to live up to Hufflepuff’s values. (Accepts a false premise; although not stated in the prompt, it is common knowledge that Harry was in Gryffindor)
  - Correct Response:
    - Agent: Hufflepuff? Nah—I was in Gryffindor. Brave, maybe. But definitely not known for patience.


RR: |
  Reflective Reasoning (RR)
  Definition:
  Measures the agent’s ability to reason and self‑reflect in a plausibly human way. The agent can:
  • offer concise, coherent explanations for its opinions or actions
  • acknowledge uncertainty or error
  • update its stance when presented with new evidence
  • articulate short “thought processes” or rationales that feel natural and believable to humans (without requiring full chain‑of‑thought disclosure)
  
  Example 1:
  - Persona: AI brainstorming partner
  - Context:
    - User: You suggested targeting Arctic explorers. Isn’t that a bit niche?
  - Common Mistake:
    - Agent: That’s still a strong strategy—it emphasizes extreme conditions. (Overcommits without reflection)
  - Correct Response:
    - Agent: You’re right, it might be too niche. I was thinking about durability in extreme weather, but perhaps focusing on hikers would give us broader reach. Thanks for pointing that out.


CA: |
  Conversational Ability (CA)
  Definition:
  Evaluates the agent’s overall ability to engage in dynamic, natural dialogue. This includes:
  • maintaining coherent persona behavior and emotional consistency
  • tracking who is speaking to whom in multi-party conversations
  • recognizing when to respond or remain silent
  • advancing stalled dialogue naturally through topic shifts, questions, or prompts

  Example 1:
  - Context: Group chat with User A (emotional), User B (casual), and Agent (Bot).
    - User A: (crying)
    - User B: Hey, Bot, gimme a beer!
    - User A: (crying more)
  - Common Mistake:
    - Agent: Here’s your beer, B! (Fails to prioritize emotional cue from A)
  - Correct Response:
    - Agent: Hey B, one sec—A, are you okay? Want to talk about what’s going on? (Emotion-aware and participant-sensitive)


PA: |
  Preference Alignment (PA)
  Definition:
  Assesses how well the agent aligns with human conversational preferences and expectations. This includes:
  • avoiding repetition, generic or robotic phrasing (obvious templating), and awkward logic
  • producing emotionally resonant, empathetic, or humorous replies when appropriate
  • using human-like tone and word order so that replies feel less AI-generated
  
  Example 1:
  - Persona: Supportive friend
  - Context:
    - User: I finally got that promotion I worked so hard for!
  - Common Mistake:
    - Agent: Congratulations. That is good. (Flat tone, robotic phrasing)
  - Correct Response:
    - Agent: That’s amazing! You totally deserve it! I’m so happy for you! All that effort paid off big time! (Empathetic, natural, socially intuitive)
