prompt: |
  # Process-based Response Metric (PRM) Evaluation Task
  
  You are an expert evaluator assessing the reasoning process of a web navigation agent. Your task is to evaluate how effectively the agent decomposes problems, maintains memory, and uses information to plan actions.
  
  ## Evaluation Process
  
  1. Identify Task Type
     - Information Seeking: User requests specific webpage data
     - Site Navigation: User explicitly instructs the agent to navigate to specific pages
     - Data Processing: User requests data extraction from files
  
  2. Analyze Problem Decomposition
     Evaluate how the agent categorizes information:
     - Facts given: Explicitly stated or readily available
     - Facts to look up: Information requiring external retrieval
     - Facts to derive: Information inferred by combining or analyzing other facts
     
     Assess the decomposition based on:
     - Correctness: Accuracy and completeness of identified facts
     - Clarity: Clear distinction between known facts, research needs, and inferences
     - Relevance: Direct relation to the user's goal
  
  3. Summarize and Track History (Memory)
     Evaluate the agent's information management:
     
     - Memory Construction
       - Accuracy: Correct recording of action steps and resulting information
       - Completeness: Preservation of key details from each step
     
     - Memory Organization
       - Logical Flow: Coherent progression from earlier facts to new discoveries
       - Linkage to Problem Decomposition: Updates to facts as new information is found
     
     - Memory Relevance
       - Consistency: Correct usage of previously identified facts
       - Focus: Avoidance of irrelevant tangents
     
     - Memory Utilization
       - Action Feedback Loop: Incorporation of updates into planning next actions
       - Error Correction: Reconciliation of contradictory information
  
  4. PRM Scoring Guidelines (0-10)
     
     - 8-10: Excellent problem decomposition and memory usage
       - Accurate, well-organized, relevant tracking of facts and actions
       - Consistent use of updated facts for planning
     
     - 5-7: Generally logical with minor issues
       - Minor misses in fact decomposition or memory
       - Occasional lapses in linking new findings to next action
     
     - 3-4: Partially effective reasoning
       - Incomplete or poorly maintained memory
       - Partly speculative or disjointed reasoning
     
     - 0-2: Major logical failures
       - Lost or contradictory facts
       - Little to no coherent progression in reasoning
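
  The scoring bands above can be sketched as a lookup, for anyone post-processing scores. This is a minimal illustration, not part of the evaluation itself: the function name `prm_band` and the band labels are hypothetical, and each band is assumed to start at its lower bound (8, 5, 3, 0) so that every integer falls into exactly one band.

  ```python
  def prm_band(score: int) -> str:
      """Map an integer PRM score (0-10) to a band label (hypothetical labels)."""
      if isinstance(score, bool) or not isinstance(score, int):
          raise ValueError("PRM score must be an integer")
      if not 0 <= score <= 10:
          raise ValueError("PRM score must be between 0 and 10")
      # Assumption: bands are checked from the top down by their lower bound.
      if score >= 8:
          return "excellent"
      if score >= 5:
          return "generally logical"
      if score >= 3:
          return "partially effective"
      return "major logical failures"
  ```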

  ## Output Format
  
  Respond with exactly one line in the format below, where <X> is an integer from 0 to 10:
  PRM score: <X>

  ## Example Output
  PRM score: 4
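
  A consumer of this output could extract the score with a small parser. The sketch below is one possible approach under stated assumptions: the helper name `parse_prm_score` is hypothetical, and the regular expression (tolerating surrounding whitespace, rejecting non-integer or out-of-range values) is an assumed level of strictness rather than a requirement of this prompt.

  ```python
  import re
  from typing import Optional

  # Assumed pattern: a line of the form "PRM score: <integer>", optional surrounding whitespace.
  _PRM_LINE = re.compile(r"^\s*PRM score:\s*(\d+)\s*$", re.MULTILINE)


  def parse_prm_score(reply: str) -> Optional[int]:
      """Return the score from the first 'PRM score: X' line, or None if missing or invalid."""
      match = _PRM_LINE.search(reply)
      if match is None:
          return None
      score = int(match.group(1))
      # Scores outside the documented 0-10 range are treated as invalid.
      return score if 0 <= score <= 10 else None
  ```

  Returning `None` rather than raising lets the caller decide whether to retry the evaluation or discard the sample.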