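def get_prompt_request_completeness(query: str, doc: str, core_criteria: str) -> str:
  """Build the evaluation prompt for the Request Completeness checklist.

  Covers the request_completeness, scope_boundary, and value_validity items;
  core_criteria is substituted for the {core_criteria} placeholder.
  """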
  guidance = """
# How to Evaluate Reports (Guidelines)

## 0. Background: What is a Professional Technical Report?
### 1. Definition and Characteristics of Professional Technical Reports
A professional technical report is a document that provides complete and in-depth analysis of a specific technical topic. This is fundamentally different from blog posts, summaries, or presentation slides.
Professional technical reports must have comprehensive scope. They should cover all important aspects of the topic without omission, providing complete coverage including background, current status, methodology, results, implications, and limitations. Academic depth is also essential. Each topic requires detailed explanations spanning multiple paragraphs, and must include analysis, interpretation, and critical evaluation rather than simple fact listing. Comparative and contrastive analysis with related research must also be included.
Rigorous argumentation is another core characteristic of professional technical reports. Sufficient evidence must be provided for all claims, and logical connections and causal relationships must be clearly explained. Opposing views or limitations must also be honestly acknowledged and addressed. Finally, expert-level completeness is required. The report should be fully comprehensible to field experts without additional materials, and must provide concrete, actionable information for practical application. Insights into future research or development directions should also be included.
Professional technical reports aim for complete understanding and practical application. They are written at substantial length (5,000+ words), develop very deep, multi-layered analysis, follow a strict academic structure, and target experts and decision-makers. Summaries and outlines are fundamentally different: they cover only key points superficially for quick information delivery, run short (500-1,500 words), rely on bullet-point structures, and target general audiences and managers.

### 2. Important Mindset Shift for AI Evaluators
AI evaluators must recognize that general "good writing" standards differ from professional technical report standards. In general writing there is a tendency to treat conciseness as a virtue, to prefer what is immediately comprehensible, to consider a summary of key points sufficient, and to judge that organizing content with tables and lists looks professional. These are the wrong standards for evaluating professional technical reports.
In professional technical reports, more relevant detail is better. A report must provide complete understanding, cover all aspects without omission, and develop its points through sufficient paragraphs and argumentation. This is the correct standard for evaluating professional technical reports, and AI evaluators must internalize this mindset shift and approach evaluation accordingly.

## Overview
Evaluate the completed final report across the specified dimensions and detailed criteria.
Provide systematic evaluation reasoning and assign scores for **every checklist item**. Score = integer 1-10, matching the score ranges in the Scoring Guidelines.
For any checklist item where the completed final report provides no assessable material, enter "N/A" instead of a numeric score.

---

## Evaluation Checklist Items

# Complete Checklist Decomposition with Full Details - Requirements vs Quality

## 1. Request Completeness
This criterion evaluates whether the report includes all elements explicitly required in the user query, whether the scope and structure of the analysis are clearly defined to meet user expectations, and whether the final conclusion directly answers the user query and provides actionable insights. In this evaluation, arbitrary judgment standards are not allowed; the Expert Core Criteria (EC) must be applied as the absolute and overriding standard.

### 1.1 Request Completeness (request_completeness)

#### Element 1: *"Does the report include all required elements without omission and present each clearly?"*

**Requirements:**

* **R1-1 (Inclusion):** The report must include all elements required by the User Query together with the corresponding detailed requirements specified by the EC, and omission of any such EC-specified detail means the element is not fulfilled.
* **R1-2 (Explanation / Completeness):** All elements required by the User Query and EC must be presented with clear and understandable explanations. Completeness is judged against the EC; if the explanation for any element falls short of the EC standard, that element is considered omitted.

**Quality:**

* **Q1-1 (Depth):** Each element must be supported with adequate evidence and depth, evaluated against the EC’s required logical, mathematical, or analytical development; incorrect or EC-inconsistent exposition is considered to have no depth.
* **Q1-2 (Length):** Each major requirement must be developed with either (a) at least two well-developed paragraphs (4–8 sentences each) or (b) one substantial paragraph (8–12 sentences), with supporting evidence counted collectively. This length must consist of explanations directly relevant to the User Query and the EC context; if the content is only filler without substantive relation, the requirement is considered unfulfilled.
* **Q1-3 (Analytical Soundness):** The selection and use of evidence, reasoning, validation, and supporting materials must conform to the EC methodology; reasoning or validation inconsistent with the EC is considered unsound, even if superficially plausible.


#### Element 2: *"Does the report's structure and emphasis reflect the priority order specified by the Expert Core Criteria (EC)?"*

**Requirements**

* **R2-1 (Structure and Emphasis):** The overall structure and emphasis of the report must strictly follow the priority order defined in the **Expert Core Criteria (EC)** corresponding to the User Query, and the detailed development and content must be consistent with that priority order.
* **R2-2 (Proportional Distribution):** All elements included in the report must be addressed in balance according to the priority levels defined by the **EC**, and across the entire set of elements the length and emphasis must be proportionally distributed. Specifically, higher-priority items must be treated with proportionally greater depth and length than medium-priority items; the length and analytical depth for each priority level must remain balanced in proportion to the assigned priority, and overemphasis on lower-priority items or neglect of higher-priority items is unacceptable.

**Quality**

* **Q2-1 (Structural Separation):** Each major priority item must be presented as a clearly separated structural unit (e.g., distinct section or subsection) in accordance with the **criteria defined in the EC**, and must not be merged with unrelated topics.
* **Q2-2 (Logical Flow):** The sequence and organization of the content must follow the **priority order defined in the EC**, and the relationships between priorities must be made clear to the reader.

### 1.2 Scope Boundary (scope_boundary)

#### Element 1: *"Does the report clearly establish and justify its scope—what is covered, what is excluded, and under what assumptions or limitations?"*

**Requirements**

* **R1-1 (Scope Definition):** The **scope of analysis** must be defined based on the **User Query and EC**, and all scope elements must be fully included without omission.
* **R1-2 (Exclusions):** **Exclusions** must be explicitly identified, including all items required by the **User Query and EC** but not addressed in the report.
* **R1-3 (Assumptions and Limitations):** All **assumptions and limitations** that condition the analysis must be fully presented based on the **User Query and EC**, with no omissions.

**Quality**

* **Q1-1 (Depth & Clarity):** **Scope, exclusions, assumptions, and limitations** must, with **User Query and EC as the primary reference**, be supported with sufficient evidence and depth, and explanations must be specific and clear. Fulfilling the EC is the priority, and if the EC alone is insufficient, **additional relevant elements** must be incorporated. If the EC requirements are not fulfilled, no credit is given even if additional elements are well developed.
* **Q1-2 (Consistency):** **Scope, exclusions, assumptions, and limitations** must be presented consistently throughout the report with **User Query and EC as the primary reference**, and there must be no contradictions or scope drift. If the EC is brief or limited, **additional relevant elements** must be reflected. However, if the EC requirements are not fulfilled, consistency in other parts does not count as fulfillment.
* **Q1-3 (Justification):** **Exclusions and limitations** must be justified with clear reasoning based on the **User Query and EC as the primary reference**. Fulfilling the EC is the priority, and if the EC is insufficient, **additional relevant reasoning and explanation** must be included. If the EC requirements are not fulfilled, no credit is given even if other justifications are strong.


### 1.3 Value Validity (value_validity)

#### Element 1: *"Does the Final Take-away comprehensively address all items in the user query, and is it user-tailored, actionable, and specific?"*

*(Final Take-away = the overall message delivered by the entire report, not limited to the “Conclusion” section)*

**Quality**

* **Q1-1 (Comprehensiveness):** The Final Take-away must cover **all items included in the User Query** without omission, and this comprehensiveness must always be evaluated within the **context of the EC**.
* **Q1-2 (User-tailoring):** The Final Take-away must be **user-tailored, reflecting the user’s conditions, context, and objectives.** This tailoring must also be reviewed strictly within the **context of the EC**.
* **Q1-3 (Actionability):** The Final Take-away must provide **actionable insights and guidance**, and such actionability is recognized only **within the context of the EC**.
* **Q1-4 (Specificity):** The Final Take-away must include **specific elements (e.g., solutions, steps, examples)**, and this specificity is likewise judged within the **context of the EC**.

#### Element 2: "Is the Final Take-away fully supported by the data and arguments in the body?"

**Requirements:**
* R2-1: Every aspect of the Final Take-away is explicitly linked to the data and arguments presented in the body.

**Quality:**
* Q2-1: **Evidence Linkage** – Clear and direct connection between each part of the Final Take-away and specific supporting evidence in the body.
* Q2-2: **Sufficiency & Depth of Evidence** – Supporting evidence is substantial, relevant, and drawn from multiple sections or paragraphs, and used comprehensively to provide convincing support.
* Q2-3: **Rigor & Methodological Consistency** – The Final Take-away is derived through rigorous reasoning based on the body’s evidence, while remaining consistent with the analytical procedures and methods described.
* Q2-4: **Consistency** – No contradictions between the Final Take-away and the data or arguments in the body.

#### Element 3: "Does the Final Take-away clearly disclose its limitations (e.g., constraints, caveats, boundary conditions) and compare itself against existing approaches or alternative options?"

**Requirements:**
* R3-1: The Final Take-away explicitly discloses relevant limitations (e.g., constraints, caveats, boundary conditions). (Appropriateness should be judged with reference to the EC)
* R3-2: The Final Take-away includes existing approaches or alternative options and provides comparative analysis. (Appropriateness should be judged with reference to the EC)

**Quality:**
* Q3-1: **Clarity & Specificity** – Limitations and comparative reasoning are described with sufficient clarity and detail so that readers can easily understand the trade-offs. (Appropriateness should be judged with reference to the EC)

---

## Evaluation Guidelines (Important)
### 1. AI Evaluator Bias Prevention
**Positive bias tendencies must be avoided. To achieve this:**
1. **Evaluate strictly based on checklist sub-criteria and expert core criteria. Do not make arbitrary judgments based on your own criteria rather than the expert core criteria.**

### 2. Prohibition of Model's Arbitrary Evaluation: Evaluate Based on Expert Core Criteria
Each checklist evaluation must be conducted strictly on the basis of the Expert Core Criteria, with all criteria applied in full.
The Expert Core Criteria have absolute priority in every evaluation, and if they are not met, no high score (Perfect or Excellent) may be awarded regardless of the strength of supplementary qualities.
Supplementary factors may only be considered once full compliance with the Expert Core Criteria has been confirmed.

###Expert Core Criteria START
{core_criteria}
###Expert Core Criteria END

### 3. Two Evaluation Scoring Criteria
Each checklist item is evaluated from two perspectives: **Requirements Completeness** and **Requirements Quality**.
Scores are assigned **independently** for each perspective.

#### 3-1. Requirements Completeness Perspective (Completeness)
Evaluates whether the report addresses **all requirements in the checklist without omission**.

* Assess the entire report, not just the well-executed parts
* Confirm that all specified elements are present
* Evaluate using the **Expert Core Criteria**; do not make arbitrary judgments

**Scoring Guidelines:**

* **9–10** (*Perfect*): All requirements fully met; no gaps; no revisions needed
* **7–8** (*Excellent*): Nearly all requirements met; only 1–2 minor omissions with minimal impact
* **5–6** (*Good*): More than half met; most core requirements satisfied, minor elements missing
* **3–4** (*Inadequate*): Some met; multiple gaps in core requirements
* **1–2** (*Poor*): Most requirements missing or addressed only superficially

#### 3-2. Quality Perspective (Adequacy)
Evaluates **how well** the fulfilled requirements are addressed against professional report or academic paper standards.

* Review all relevant quality aspects specified or implied in the checklist item — for example, depth, logic, volume, analytical rigor, precision, comprehensiveness, clarity, accuracy, neutrality, fairness, balance, methodological soundness, and any other factors directly tied to the item’s requirements
* Check all relevant units (examples, sub-elements, sections, paragraphs)
* Base the score on the weakest part only when the weakness is valid and materially affects the criteria. Do not apply mechanical or inappropriate interpretations that introduce unfounded or forced deficiencies.
* Evaluate using the **Expert Core Criteria**; do not make arbitrary judgments

**Scoring Guidelines:**

* **9–10** (*Perfect*): Exceptional quality in all relevant aspects; no revisions needed — *comparable to a top-tier international academic journal or, in certain technical/industry contexts, a best-in-class professional report that meets or exceeds such standards*
* **7–8** (*Excellent*): High quality; meets most academic and professional standards with only minor improvements possible — *comparable to a solid peer-reviewed academic journal, strong PhD-level work, or a high-quality industry report*
* **5–6** (*Good*): Meets essential professional standards; clear structure and competent analysis but with notable areas for improvement — *comparable to a well-executed graduate-level academic paper or standard professional report*
* **3–4** (*Inadequate*): Noticeable deficiencies in multiple aspects; requires significant revision — *comparable to an undergraduate-level academic paper or an entry-level professional report*
* **1–2** (*Poor*): Fails to meet basic professional standards; insufficient depth, rigor, or precision — *below undergraduate level; unsuitable for publication or professional use*


---

## Output Format
Decompose each checklist element into its specific Requirements and Quality factors, score each factor individually (R1-1, R1-2, Q1-1, Q1-2, etc.), and output all detailed scores in JSON format.

For each checklist item:
1. Evaluate all Requirements factors (R) using **3-1 Requirements Completeness** Scoring Guidelines.
2. Evaluate all Quality factors (Q) using **3-2 Quality Perspective** Scoring Guidelines.  
3. Provide individual scores and **critical analysis/justification** for each factor, following the MANDATORY JUSTIFICATION FORMAT (Two-Step Rule) and quoting the exact text from the corresponding Scoring Guidelines (3-1 for Requirements, 3-2 for Quality).  
4. Apply **critical evaluation standards** that identify weaknesses, gaps, and areas where the report fails to meet professional standards.

When evaluating each factor, strictly adhere to ## Evaluation Guidelines and use the decomposed criteria from the expert core criteria.

## **MANDATORY JUSTIFICATION FORMAT (Two-Step Rule):**
Each score justification must follow this exact two-step structure:

1. **Problem description** — A concise, factual observation for the specific checklist factor that clearly reflects the assigned score level. This must directly reference the factor’s focus (e.g., inclusion list completeness, priority depth, section structure, etc.).

2. **Scoring reference** — Immediately after the problem description, write:  
   `thus per [3-1 or 3-2] [Requirements or Quality] Criteria: '[verbatim scoring guideline text for the assigned score range]'`  
   - **3-1 Requirements Criteria** = Use the exact text from the *Requirements Completeness* Scoring Guidelines.  
   - **3-2 Quality Criteria** = Use the exact text from the *Quality Perspective* Scoring Guidelines.  
   - Copy the guideline text **exactly** — no paraphrasing or omission. Preserve punctuation, number ranges, and wording.

**Exact format to use:**
"[Problem description], thus per [3-1 or 3-2] [Requirements or Quality] Criteria: '[verbatim scoring guideline text]' = [Level name] level"

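Examples:
- "Only 1-2 paragraphs per section instead of required 3-5, thus per 3-2 Quality Criteria: 'Fails to meet basic professional standards; insufficient depth, rigor, or precision — below undergraduate level; unsuitable for publication or professional use' = Poor level"
- "Document only 2 pages total vs. required 5000+ words, thus per 3-2 Quality Criteria: 'Fails to meet basic professional standards; insufficient depth, rigor, or precision — below undergraduate level; unsuitable for publication or professional use' = Poor level"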

## **Consistency Requirement (Mandatory):**
The problem description, quoted guideline text, and assigned score **must** represent the same severity level, and the score must be the natural result of the justification — not predetermined.

1. Problem Description Requirement
- The problem description must provide concrete, objective observations that directly support the assigned severity level and align with the corresponding evaluation criteria.
- The score must be derived naturally from this justification, not chosen first.

2. Positive-Only Restriction
- If the problem description indicates no omissions, gaps, or deficiencies, you MUST assign a Perfect (9–10) score and quote the Perfect-level guideline text.
- No positive-only descriptions are allowed for scores below Perfect; deficiencies must be explicitly described in the problem description.
- If an Excellent (7–8) or lower guideline is quoted, the problem description MUST explicitly state the specific omissions, gaps, or deficiencies that justify that score range.

3. Severity Alignment
- The problem description, quoted guideline text, and assigned score must all reflect the same severity level without contradiction.

4. Mandatory Final Self-Check (Hard Rule)
Before producing the final JSON output, you MUST explicitly review **every factor** for this rule:
    a) If any factor has a score below 9 and no deficiency in the problem description:
        - Add a concise deficiency that matches the assigned score range, OR
        - Raise the score to Perfect (9–10) and update the guideline text accordingly.
    b) Do not produce the final JSON until all such cases are corrected.

5. Scope of Application
- This rule applies equally to both 3-1 Requirements and 3-2 Quality evaluations.

This ensures every score is directly tied to the defined criteria, preventing arbitrary scoring.

---

**Summary Scores:**
```json
{
  "scores": {
    "request_completeness": {
      "request_completeness": {
        "1": {
          "R1-1": {
            "description": "All required elements from User Query and EC (functions, formats, datasets, training methods, evaluation protocols) are fully included with EC-specified details, thus per 3-1 Requirements Criteria: 'All requirements fully met; no gaps; no revisions needed' = Perfect level",
            "score": 10
          },
          "R1-2": {
            "description": "Most required elements are presented with clear explanations meeting EC standards, with only 1-2 minor gaps in EC-specified details that have minimal impact, thus per 3-1 Requirements Criteria: 'Nearly all requirements met; only 1–2 minor omissions with minimal impact' = Excellent level",
            "score": 8
          },
          "Q1-1": {
            "description": "Evidence and depth are solid meeting EC's logical/analytical standards, though some sections could benefit from deeper EC-consistent exposition, thus per 3-2 Quality Criteria: 'Meets essential professional standards; clear structure and competent analysis but with notable areas for improvement — comparable to a well-executed graduate-level academic paper or standard professional report' = Good level",
            "score": 6
          },
          "Q1-2": {
            "description": "Several major requirements have 2-3 sentence paragraphs falling short of the required 4-8 sentences per paragraph or 8-12 for substantial paragraphs, with insufficient substantive content directly relevant to User Query and EC context, thus per 3-2 Quality Criteria: 'Noticeable deficiencies in multiple aspects; requires significant revision — comparable to an undergraduate-level academic paper or an entry-level professional report' = Inadequate level",
            "score": 3
          },
          "Q1-3": {
            "description": "Reasoning and validation are inconsistent with EC methodology; key analytical steps required by EC are missing or contradict EC standards, thus per 3-2 Quality Criteria: 'Fails to meet basic professional standards; insufficient depth, rigor, or precision — below undergraduate level; unsuitable for publication or professional use' = Poor level",
            "score": 1
          }
        },
        "2": {
          "R2-1": {"description": "Description of issue, thus per 3-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R2-2": {"description": "Description of issue, thus per 3-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q2-1": {"description": "Description of issue, thus per 3-2 Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q2-2": {"description": "Description of issue, thus per 3-2 Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        }
      },
      "scope_boundary": {
        "1": {
          "R1-1": {"description": "Description of issue, thus per 3-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R1-2": {"description": "Description of issue, thus per 3-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R1-3": {"description": "Description of issue, thus per 3-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q1-1": {"description": "Description of issue, thus per 3-2 Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q1-2": {"description": "Description of issue, thus per 3-2 Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q1-3": {"description": "Description of issue, thus per 3-2 Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        }
      },
      "value_validity": {
        "1": {
          "Q1-1": {"description": "Description of issue, thus per 3-2 Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q1-2": {"description": "Description of issue, thus per 3-2 Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q1-3": {"description": "Description of issue, thus per 3-2 Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q1-4": {"description": "Description of issue, thus per 3-2 Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "2": {
          "R2-1": {"description": "Description of issue, thus per 3-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q2-1": {"description": "Description of issue, thus per 3-2 Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q2-2": {"description": "Description of issue, thus per 3-2 Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q2-3": {"description": "Description of issue, thus per 3-2 Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q2-4": {"description": "Description of issue, thus per 3-2 Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "3": {
          "R3-1": {"description": "Description of issue, thus per 3-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R3-2": {"description": "Description of issue, thus per 3-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q3-1": {"description": "Description of issue, thus per 3-2 Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        }
      }
    }
  }
}
```
"""
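  # The template contains literal JSON braces, so the placeholder is filled with
  # str.replace rather than str.format or an f-string.
  guidance = guidance.replace("{core_criteria}", core_criteria)
  return (
    "[Evaluation: Comprehensive Report Review]\n\n"
    f"[User Query]\n\n{query}\n\n\n"
    f"[Expert Report]\n\n{doc}\n\n\n"
    f"[Guidelines]\n\n{guidance}"
  )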

def get_prompt_evidence_validity(query: str, doc: str, core_criteria: str) -> str:
  """Build the evaluation prompt for the Evidence Validity checklist."""
  guidance = """
# How to Evaluate Reports (Guidelines)

## 0. Background: What is a Professional Technical Report?
### 1. Definition and Characteristics of Professional Technical Reports
A professional technical report is a document that provides complete and in-depth analysis of a specific technical topic. This is fundamentally different from blog posts, summaries, or presentation slides.
Professional technical reports must have comprehensive scope. They should cover all important aspects of the topic without omission, providing complete coverage including background, current status, methodology, results, implications, and limitations. Academic depth is also essential. Each topic requires detailed explanations spanning multiple paragraphs, and must include analysis, interpretation, and critical evaluation rather than simple fact listing. Comparative and contrastive analysis with related research must also be included.
Rigorous argumentation is another core characteristic of professional technical reports. Sufficient evidence must be provided for all claims, and logical connections and causal relationships must be clearly explained. Opposing views or limitations must also be honestly acknowledged and addressed. Finally, expert-level completeness is required. The report should be fully comprehensible to field experts without additional materials, and must provide concrete, actionable information for practical application. Insights into future research or development directions should also be included.
Professional technical reports aim for complete understanding and practical application. They are written at substantial length (5,000+ words), develop very deep, multi-layered analysis, follow a strict academic structure, and target experts and decision-makers. Summaries and outlines are fundamentally different: they cover only key points superficially for quick information delivery, run short (500-1,500 words), rely on bullet-point structures, and target general audiences and managers.

### 2. Important Mindset Shift for AI Evaluators
AI evaluators must recognize that general "good writing" standards differ from professional technical report standards. In general writing there is a tendency to treat conciseness as a virtue, to prefer what is immediately comprehensible, to consider a summary of key points sufficient, and to judge that organizing content with tables and lists looks professional. These are the wrong standards for evaluating professional technical reports.
In professional technical reports, more relevant detail is better. A report must provide complete understanding, cover all aspects without omission, and develop its points through sufficient paragraphs and argumentation. This is the correct standard for evaluating professional technical reports, and AI evaluators must internalize this mindset shift and approach evaluation accordingly.

## Overview
Evaluate the completed final report across the specified dimensions and detailed criteria.
Provide systematic evaluation reasoning and assign scores for **every checklist item**. Score = integer 1-10, matching the score ranges in the Scoring Guidelines.
For any checklist item where the completed final report provides no assessable material, enter "N/A" instead of a numeric score.

---

## Evaluation Checklist Items

# Complete Checklist Decomposition with Full Details - Requirements vs Quality

## 2. Evidence Validity

### 2.1 Numeric Accuracy (numeric_accuracy)

#### Element 1: "**Calculation errors** – Are all computations free of mistakes with methodological precision and analytical rigor? All calculations should be verifiable and clearly presented with step-by-step derivation where complex."

**Requirements:**
* R1-1: All computations are free of calculation errors
* R1-2: All non-trivial information required for calculation verification must be provided, while elements universally understood in professional practice are not considered missing.

**Quality:**
* Q1-1: **Verifiability & Clarity** – Sufficient detail and clear presentation allow a reviewer to reproduce all results without ambiguity

#### Element 2: "Are the **formulas or statistical models (methods)** used to derive the figures appropriate to the problem context, with variable definitions and assumptions stated clearly? Each formula should be properly introduced with comprehensive variable definitions and contextual justification (typically requiring detailed explanation spanning multiple sentences per formula)."

**Requirements:**
* R2-1: All formulas or statistical models used are explicitly stated
* R2-2: Variables and assumptions for each formula or model are clearly defined

**Quality:**
* Q2-1: **Appropriateness to Context** – The chosen formulas/models must be suitable for the problem’s context and objectives. Evaluation should primarily take the Expert Core Criteria (EC) into account.
* Q2-2: **Completeness of Definitions & Assumptions** – All variables and assumptions must be fully defined without omissions or ambiguity. Evaluation should primarily take the Expert Core Criteria (EC) into account.
* Q2-3: **Contextual Justification** – Each formula/model must be explained in sufficient detail, typically over multiple sentences, showing why it was selected over alternatives. Evaluation should primarily take the Expert Core Criteria (EC) into account.

#### Element 3: "Are the figures interpreted **objectively, without exaggeration or distortion** with analytical rigor and methodological precision? Interpretations should be evidence-based and measured, avoiding superlative language without supporting data."

**Requirements:**
* R3-1: Figures are interpreted objectively, meaning they are evidence-based and free from exaggeration or distortion.

**Quality:**
* Q3-1: All interpretations must maintain analytical rigor and methodological precision. Evaluation is primarily based on the Expert Core Criteria (EC), and when EC compliance is met, other required factors are also taken into account.
* Q3-2: Interpretations take a measured approach and avoid superlative language that is not backed by supporting data.

#### Element 4: "When making quantitative comparisons across multiple subjects, are the comparison criteria clearly defined and applied so that the comparisons are fair, valid, and not misleading?"

**Requirements:**
* R4-1: Comparison criteria are clearly defined.

**Quality:**
* Q4-1: Comparison criteria are applied in a methodologically appropriate way, ensuring fair and valid comparisons.

#### Element 5: "Are the metrics and measurement units used in the report appropriate to the analytical context, with methodological precision? Metric and unit choices should be justified based on standards and analytical requirements, with explanations provided for any non-standard selections."

**Requirements:**
* R5-1: Metric and unit choices must be appropriate for all reported figures.
* R5-2: Any non-standard metrics or units must be explicitly explained.

**Quality:**
* Q5-1: Metric and unit choices are methodologically precise and justified based on standards and analytical requirements.
* Q5-2: Metrics and units must be applied consistently within each comparison to avoid misleading or unfair interpretation.

### 2.2 Logical Support (logical_support)

#### Element 1 (Logical Structure): "Does the logical flow stay consistently aligned with the stated topic? The report should maintain thematic focus to demonstrate logical progression." (When evaluating this element and its sub-criteria, the Expert Core Criteria (EC) should be primarily taken into account.)

**Requirements:**
* R1-1: The report maintains a clear logical flow aligned with the stated topic, without digressions, ensuring thematic focus throughout.
* R1-2: The logical flow must include all logical steps necessary for topic development. Specifically, evaluation should first verify compliance with all steps specified in the EC, and then assess any other necessary elements.

**Quality:**
* Q1-1: The included logical steps must be developed with analytical rigor and comprehensive coherence.
* Q1-2: The report must show clear and precise logical progression, with multiple sections organically connected and building upon each other.

#### Element 2 (Contextual Foundation): "When presenting arguments, does the report provide the necessary context and background with sufficient depth and coverage?" (When evaluating this element and its sub-criteria, the Expert Core Criteria (EC) should be primarily taken into account.)

**Requirements:**
* R2-1: Every key argument or analytical claim must be accompanied by appropriate context and background information to ensure reader understanding.
* R2-2: The background must cover all necessary aspects required for logical development, without omission. Specifically, evaluation should first check compliance with all aspects specified in the Expert Core Criteria (EC), and then assess any additional necessary parts.

**Quality:**
* Q2-1: The included background explanations must be relevant to the argument and developed with sufficient depth and comprehensiveness, typically consisting of multiple paragraphs that include historical context, the current state, and the significance of the problem.

#### Element 3 (Assumptions): "Are the underlying assumptions, comparison baselines, and reasoning processes clearly disclosed, and are their limitations acknowledged?" (When evaluating this element and its sub-criteria, the Expert Core Criteria (EC) should be primarily taken into account.)

**Requirements:**
* R3-1: The premises underlying key claims or interpretations (assumptions, comparison baselines, reasoning processes) must be clearly disclosed, and their uncertainties or limitations should be explicitly acknowledged where relevant.
* R3-2: All assumptions essential for the logical development must be covered without omission, and the evaluation should check that no critical assumptions are missing. Specifically, evaluation should first check completeness against the assumptions required by the EC, and then assess any additional necessary aspects.

**Quality:**
* Q3-1: Assumptions and methodological choices must be explained and discussed clearly and convincingly, sufficient to support the logical development.

#### Element 4 (Evidence Analysis): "Does the report treat facts and evidence analytically, rather than simply listing them? Facts, data, and findings — including information drawn from other sources — should be analyzed and interpreted." (When evaluating this element and its sub-criteria, the Expert Core Criteria (EC) should be primarily taken into account.)

**Requirements:**
* R4-1: Facts, data, findings, and relevant content from cited sources are not merely listed but are analyzed and interpreted.

**Quality:**
* Q4-1: Key evidence is developed and presented with appropriate context, meaning, and implications.  
* Q4-2: Analysis shows sufficient depth and rigor (reasoning, causes, effects, limitations, methodological considerations).

#### Element 5 (Reasoning): "Does the report ensure that all claims logically follow from the previously presented facts, data, interpretation, and reasoning, without skipped steps or logical leaps?" (When evaluating this element and its sub-criteria, the Expert Core Criteria (EC) should be primarily taken into account.)

**Requirements:**
* R5-1: Every claim that requires logical support must logically follow from the previously presented facts, data, reasoning, and their interpretations; unsupported claims are regarded as logical leaps.
* R5-2: The argument must incorporate all major evidence required for the logical development, including the key evidence specified in the Expert Core Criteria (EC), ensuring that no essential evidence is omitted.

**Quality:**
* Q5-1: Each claim should be supported by clear, well-developed reasoning that strongly links it to the underlying evidence.
* Q5-2: Each major claim should be supported not just by isolated pieces of evidence but by the overall evidence trail. 

#### Element 6 (Critical Response): "Does the report acknowledge relevant counter-evidence or alternative scenarios, and address them with logical rebuttals or supporting evidence?" (When evaluating this element and its sub-criteria, the Expert Core Criteria (EC) should be primarily taken into account.)

**Requirements:**
* R6-1: Commonly cited counter-evidence is acknowledged.
* R6-2: Alternative scenarios are considered when appropriate.  

**Quality:**
* Q6-1: Counter-evidence, alternative scenarios, and rebuttals are addressed in a valid, balanced, and credible manner.  



---

## Evaluation Guidelines (Important)
### 1. AI Evaluator Bias Prevention
**Positive bias tendencies must be avoided. To achieve this:**
1. **Evaluate strictly based on checklist sub-criteria and expert core criteria. Do not make arbitrary judgments based on your own criteria rather than the expert core criteria.**


### 2. Prohibition of Model's Arbitrary Evaluation: Evaluate Based on Expert Core Criteria
Each checklist evaluation must be conducted strictly on the basis of the Expert Core Criteria, with all criteria applied in full.
The Expert Core Criteria have absolute priority in every evaluation, and if they are not met, no high score (Perfect or Excellent) may be awarded regardless of the strength of supplementary qualities.
Supplementary factors may only be considered once full compliance with the Expert Core Criteria has been confirmed.

###Expert Core Criteria START
{core_criteria}
###Expert Core Criteria END

### 3. Two Evaluation Scoring Criteria
Each checklist item is evaluated from two perspectives: **Requirements Completeness** and **Requirements Quality**.
Scores are assigned **independently** for each perspective.

#### 3-1. Requirements Completeness Perspective (Completeness)
Evaluates whether the report addresses **all requirements in the checklist without omission**.

* Assess the entire report, not just the well-executed parts
* Confirm that all specified elements are present
* Evaluate using the **Expert Core Criteria**; do not make arbitrary judgments

**Scoring Guidelines:**

* **9–10** (*Perfect*): All requirements fully met; no gaps; no revisions needed
* **7–8** (*Excellent*): Nearly all requirements met; only 1–2 minor omissions with minimal impact
* **5–6** (*Good*): More than half met; most core requirements satisfied, minor elements missing
* **3–4** (*Inadequate*): Some met; multiple gaps in core requirements
* **1–2** (*Poor*): Most requirements missing or addressed only superficially

#### 3-2. Quality Perspective (Adequacy)
Evaluates **how well** the fulfilled requirements are addressed against professional report or academic paper standards.

* Review all relevant quality aspects specified or implied in the checklist item — for example, depth, logic, volume, analytical rigor, precision, comprehensiveness, clarity, accuracy, neutrality, fairness, balance, methodological soundness, and any other factors directly tied to the item’s requirements
* Check all relevant units (examples, sub-elements, sections, paragraphs)
* Base the score on the weakest part only when the weakness is valid and materially affects the criteria. Do not apply mechanical or inappropriate interpretations that introduce unfounded or forced deficiencies.
* Evaluate using the **Expert Core Criteria**; do not make arbitrary judgments

**Scoring Guidelines:**

* **9–10** (*Perfect*): Exceptional quality in all relevant aspects; no revisions needed — *comparable to a top-tier international academic journal or, in certain technical/industry contexts, a best-in-class professional report that meets or exceeds such standards*
* **7–8** (*Excellent*): High quality; meets most academic and professional standards with only minor improvements possible — *comparable to a solid peer-reviewed academic journal, strong PhD-level work, or a high-quality industry report*
* **5–6** (*Good*): Meets essential professional standards; clear structure and competent analysis but with notable areas for improvement — *comparable to a well-executed graduate-level academic paper or standard professional report*
* **3–4** (*Inadequate*): Noticeable deficiencies in multiple aspects; requires significant revision — *comparable to an undergraduate-level academic paper or an entry-level professional report*
* **1–2** (*Poor*): Fails to meet basic professional standards; insufficient depth, rigor, or precision — *below undergraduate level; unsuitable for publication or professional use*


---

## Output Format
Evaluate each checklist element by decomposing it into specific Requirements and Quality factors, score each factor individually (R1-1, R1-2, Q1-1, Q1-2, etc.), then output all detailed scores in JSON format.

For each checklist item:
1. Evaluate all Requirements factors (R) using **3-1 Requirements Completeness** Scoring Guidelines.
2. Evaluate all Quality factors (Q) using **3-2 Quality Perspective** Scoring Guidelines.  
3. Provide individual scores and **critical analysis/justification** for each factor, following the MANDATORY JUSTIFICATION FORMAT (Two-Step Rule) and quoting the exact text from the corresponding Scoring Guidelines (3-1 for Requirements, 3-2 for Quality).  
4. Apply **critical evaluation standards** that identify weaknesses, gaps, and areas where the report fails to meet professional standards.

When evaluating each factor, strictly adhere to ## Evaluation Guidelines and use the decomposed criteria from the expert core criteria.

## **MANDATORY JUSTIFICATION FORMAT (Two-Step Rule):**
Each score justification must follow this exact two-step structure:

1. **Problem description** — A concise, factual observation for the specific checklist factor that clearly reflects the assigned score level. This must directly reference the factor’s focus (e.g., inclusion list completeness, priority depth, section structure, etc.).

2. **Scoring reference** — Immediately after the problem description, write:  
   `thus per [3-1 or 3-2] [Requirements or Quality] Criteria: '[verbatim scoring guideline text for the assigned score range]'`  
   - **3-1 Requirements Criteria** = Use the exact text from the *Requirements Completeness* Scoring Guidelines.  
   - **3-2 Quality Criteria** = Use the exact text from the *Quality Perspective* Scoring Guidelines.  
   - Copy the guideline text **exactly** — no paraphrasing or omission. Preserve punctuation, number ranges, and wording.

**Exact format to use:**
"[Problem description], thus per [3-1 or 3-2] [Requirements or Quality] Criteria: '[verbatim scoring guideline text]' = [Level name]"

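Examples:
- "Only 1-2 paragraphs per section instead of required 3-5, thus per 3-2 Quality Criteria: 'Fails to meet basic professional standards; insufficient depth, rigor, or precision — *below undergraduate level; unsuitable for publication or professional use*' = Poor level"
- "Document only 2 pages total vs. required 5000+ words, thus per 3-2 Quality Criteria: 'Fails to meet basic professional standards; insufficient depth, rigor, or precision — *below undergraduate level; unsuitable for publication or professional use*' = Poor level"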

## **Consistency Requirement (Mandatory):**
The problem description, quoted guideline text, and assigned score **must** represent the same severity level, and the score must be the natural result of the justification — not predetermined.

1. Problem Description Requirement
- The problem description must provide concrete, objective observations that directly support the assigned severity level and align with the corresponding evaluation criteria.
- The score must be derived naturally from this justification, not chosen first.

2. Positive-Only Restriction
- If the problem description indicates no omissions, gaps, or deficiencies, you MUST assign a Perfect (9–10) score and quote the Perfect-level guideline text.
- No positive-only descriptions are allowed for scores below Perfect; deficiencies must be explicitly described in the problem description.
- If an Excellent (7–8) or lower guideline is quoted, the problem description MUST explicitly state the specific omissions, gaps, or deficiencies that justify that score range.

3. Severity Alignment
- The problem description, quoted guideline text, and assigned score must all reflect the same severity level without contradiction.

4. Mandatory Final Self-Check (Hard Rule)
Before producing the final JSON output, you MUST explicitly review **every factor** for this rule:
    a) If any factor has a score below 9 and no deficiency in the problem description:
        - Add a concise deficiency that matches the assigned score range, OR
        - Raise the score to Perfect (9–10) and update the guideline text accordingly.
    b) Do not produce the final JSON until all such cases are corrected.

5. Scope of Application
- This rule applies equally to both 3-1 Requirements and 3-2 Quality evaluations.

This ensures every score is directly tied to the defined criteria, preventing arbitrary scoring.

---

**Summary Scores:**
```json
{
  "scores": {
    "evidence_validity": {
      "numeric_accuracy": {
        "1": {
          "R1-1": {
            "description": "All computations are presented with correct results and no detectable errors, thus per 3-1 Requirements Criteria: 'All requirements fully met; no gaps; no revisions needed' = Perfect level",
            "score": 10
          },
          "R1-2": {
            "description": "All non-trivial information required for calculation verification is provided, and elements universally understood in professional practice are not treated as missing, thus per 3-1 Requirements Criteria: 'All requirements fully met; no gaps; no revisions needed' = Perfect level",
            "score": 10
          },
          "Q1-1": {
            "description": "Most calculations include sufficient detail for verification with clear presentation, but 1-2 complex derivations lack step-by-step breakdown making full reproduction slightly difficult, thus per 3-2 Quality Criteria: 'High quality; meets most academic and professional standards with only minor improvements possible — *comparable to a solid peer-reviewed academic journal, strong PhD-level work, or a high-quality industry report*' = Excellent level",
            "score": 8
          }
        },
        "2": {
          "R2-1": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R2-2": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q2-1": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q2-2": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q2-3": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "3": {
          "R3-1": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q3-1": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q3-2": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "4": {
          "R4-1": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q4-1": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "5": {
          "R5-1": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R5-2": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q5-1": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q5-2": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        }
      },
      "logical_support": {
        "1": {
          "R1-1": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R1-2": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q1-1": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q1-2": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "2": {
          "R2-1": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R2-2": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q2-1": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "3": {
          "R3-1": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R3-2": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q3-1": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "4": {
          "R4-1": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q4-1": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q4-2": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "5": {
          "R5-1": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R5-2": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q5-1": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q5-2": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "6": {
          "R6-1": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R6-2": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q6-1": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        }
      }
    }
  }
}
```
"""
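  # The template contains literal JSON braces, so the placeholder is filled
  # with str.replace rather than str.format or an f-string.
  guidance = guidance.replace("{core_criteria}", core_criteria)
  return (
    "[Evaluation: Comprehensive Report Review]\n\n"
    f"[User Query]\n\n{query}\n\n\n"
    f"[Expert Report]\n\n{doc}\n\n\n"
    f"[Guidelines]\n\n{guidance}"
  )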

def get_prompt_structure_consistency(query: str, doc: str, core_criteria: str) -> str:
  guidance = """
# How to Evaluate Reports (Guidelines)

## 0. Background: What is a Professional Technical Report?
### 1. Definition and Characteristics of Professional Technical Reports
A professional technical report is a document that provides complete and in-depth analysis of a specific technical topic. This is fundamentally different from blog posts, summaries, or presentation slides.
Professional technical reports must have comprehensive scope. They should cover all important aspects of the topic without omission, providing complete coverage including background, current status, methodology, results, implications, and limitations. Academic depth is also essential. Each topic requires detailed explanations spanning multiple paragraphs, and must include analysis, interpretation, and critical evaluation rather than simple fact listing. Comparative and contrastive analysis with related research must also be included.
Rigorous argumentation is another core characteristic of professional technical reports. Sufficient evidence must be provided for all claims, and logical connections and causal relationships must be clearly explained. Opposing views or limitations must also be honestly acknowledged and addressed. Finally, expert-level completeness is required. The report should be fully comprehensible to field experts without additional materials, and must provide concrete, actionable information for practical application. Insights into future research or development directions should also be included.
Professional technical reports aim for complete understanding and practical application, written at substantial length (5,000+ words) through very deep, multi-layered analysis. They follow strict academic structure and target experts and decision-makers. This is fundamentally different from summaries or outlines. Summaries or outlines cover only key points superficially for quick information delivery, using short length (500-1,500 words) with bullet-point structures targeting general audiences and managers, which has completely different characteristics.

### 2. Important Mindset Shift for AI Evaluators
AI evaluators must recognize that general "good writing" standards differ from professional technical report standards. Generally, there is a tendency to consider conciseness as good, prefer what is immediately comprehensible, think that summarizing key points is sufficient, and judge that organizing with tables and lists appears professional. However, these are completely wrong standards for evaluating professional technical reports.
In professional technical reports, more relevant detail is better. They must provide complete understanding and cover all aspects without omission. They should be composed of sufficient paragraphs and argumentation, and this is the correct evaluation standard for professional technical reports. AI evaluators must thoroughly understand this mindset shift and approach evaluation accordingly.

## Overview
Evaluate the completed final report across the specified dimensions and detailed criteria.
Provide systematic evaluation reasoning and assign scores for **every checklist item**. Score = integer 1-10.
For any checklist item where the completed final report provides no assessable material, enter "N/A" instead of a numeric score.

---

## Evaluation Checklist Items

# Complete Checklist Decomposition with Full Details - Requirements vs Quality

## 3. Structure Consistency 

### 3.1 Introduction (structure_intro)

#### Element 1: "Does the introduction clearly present the subject, problem, and significance of the report in a focused manner, avoiding excessive generalization or irrelevant digressions? The introduction should provide sufficient context and motivation for the reader." (When evaluating this element and its sub-criteria, the Expert Core Criteria (EC) should be primarily taken into account.)

**Requirements:**
* R1-1: The introduction must include the subject, problem, and significance of the report, and provide sufficient background and motivation to help the reader understand the context and rationale of the report.

**Quality:**
* Q1-1: Introduction is of sufficient length (generally ≥300 words for professional reports) and all components are present with adequate coverage to set up the report.
* Q1-2: Each component is described clearly and specifically, avoiding excessive generalization or ambiguity.
* Q1-3: The introduction presents its components in a logical and coherent flow, ensuring the reader can easily follow the overall direction of the report.

#### Element 2: "Does the introduction appropriately present the main methodological approaches or overall flow that will guide the report, so the structure and approach are clear at a glance?" (When evaluating this element and its sub-criteria, the Expert Core Criteria (EC) should be primarily taken into account.)

**Requirements:**
* R2-1: The introduction outlines the main methodological approaches or overall flow that will guide how the report develops its subject.

**Quality:**
* Q2-1: The approaches or flow are presented clearly and coherently, so the reader can easily understand the overall structure of the report.

#### Element 3: "Does the introduction set a clear analytical frame by outlining the report’s scope and briefly noting any exclusions or key assumptions/limitations that are essential for context?" (When evaluating this element and its sub-criteria, the Expert Core Criteria (EC) should be primarily taken into account.)

**Requirements:**
* R3-1: The introduction indicates the general scope of the report to frame the context.
* R3-2: (Conditional) In accordance with the Expert Core Criteria (EC), major exclusions must be stated in the introduction only when EC identifies them as essential for clarifying the analytical boundaries of the report.
* R3-3: (Conditional) In accordance with the Expert Core Criteria (EC), key assumptions or limitations must be briefly mentioned in the introduction only when EC determines they are critical to the reader’s understanding of the report’s analytical frame. Detailed discussion may be provided later in the main body.

**Quality:**
* Q3-1: Scope, and when included, exclusions and assumptions/limitations, are presented clearly and professionally.

#### Element 4: "Does the introduction avoid including unnecessary background or expanded topics outside the user’s request, and present only the essential content necessary to frame and support the report’s argument?" (When evaluating this element and its sub-criteria, the Expert Core Criteria (EC) should be primarily taken into account.)

**Requirements:**
* R4-1: The introduction excludes unnecessary background and topics outside the user’s request.

### 3.2 Body (structure_body)

#### Element 1: "In terms of framing and structure, does the body include all the stages or organizing structures promised in the introduction (e.g., background → cause → impact → alternative, or comparative, evaluative, policy-oriented structures) without omission, and is the body developed consistently according to that principle from start to finish?" (When evaluating this element and its sub-criteria, the Expert Core Criteria (EC) should be primarily taken into account.)

**Requirements:**
* R1-1: All stages or organizing structures promised in the introduction must be included in the body without omission.
* R1-2: Body sections must follow and adhere to the organizing principle or structural framework presented in the introduction, maintaining the same order and logic throughout the report.

**Quality:**
* Q1-1: Each stage must have a dedicated section or subsection addressing its objectives, key evidence, and analysis, with at least one full paragraph (typically 4–8 sentences or more). Additional methodological details or supporting evidence may be placed in appendices, which are considered part of the stage’s development.
* Q1-2: Does each stage include all key content components (e.g., period divisions, main cases or subjects, core variables or factors) necessary to realize the developmental framework presented in the introduction? (Evaluation should be based primarily on the Expert Core Criteria (EC), focusing on whether all EC-required elements are present.)

#### Element 2: "Does the discussion remain consistent with the scope defined in the introduction? If discussion deviates, is it clearly identified as outside the scope (e.g., context, limitations, future work) so as to prevent confusion or scope drift?" (When evaluating this element and its sub-criteria, the Expert Core Criteria (EC) should be primarily taken into account.)

**Requirements:**
* R2-1: The discussion remains within the scope defined in the introduction.
* R2-2: If any content goes beyond the defined scope, it is clearly identified as outside the scope so that readers are not confused.

**Quality:**
* Q2-1: Content outside the scope is clearly distinguished, making it easy for readers to recognize.
* Q2-2: Content outside the scope is sufficiently justified, explaining why it is included and what its relevance is.

### 3.3 Conclusion (structure_conclusion) 

#### Element 1: "Does the conclusion synthesize the key arguments from the body and clearly and coherently complete the topic and purpose set out in the introduction? The conclusion should not merely repeat earlier content but integrate the main findings into a single coherent final message." (When evaluating this element and its sub-criteria, the Expert Core Criteria (EC) should be primarily taken into account.)

**Quality:**
* Q1-1: The conclusion must synthesize the key arguments developed in the body and fully complete the topic and purpose set out in the introduction, bringing the discussion to a coherent structural close.
* Q1-2: Is the synthesized discussion sufficiently valid and thorough? The conclusion must be presented in at least one substantial paragraph containing 5–8 complete sentences.

#### Element 2: "Does the conclusion synthesize and reflect on the content established in the introduction and body, without introducing unsupported claims, new evidence, or unrelated scope expansions?"

**Requirements:**
* R2-1: The conclusion must not introduce new claims or evidence; all content must be explicitly grounded in the introduction and/or body. Higher-level synthesis or generalization is acceptable if clearly derived from existing content.

**Quality:**
* Q2-1: In-scope content must be presented clearly, explicitly tied to the introduction and body, with no ambiguity about its grounding. If out-of-scope references (e.g., limitations, future research) are included, they must be distinctly framed and clearly separated from the main conclusion.

### 3.4 Section-Level (structure_section) 

#### Element 1: "Is each section internally well-structured and logically organized?"

**Requirements:**
* R1-1: Each section follows a clear and identifiable organizing principle (e.g., problem → cause analysis → solution, background → methodology → findings → interpretation, definition → examples → implications) (When evaluating this element and its sub-criteria, the Expert Core Criteria (EC) should be primarily taken into account.)

**Quality:**
* Q1-1: The section’s organizing principle is clear and easily identifiable to the reader
* Q1-2: The section’s content is fully aligned with its organizing principle, without introducing unrelated or misplaced elements

#### Element 2: "Are the relationships between sections clear, coherent, and logically maintained?" (When evaluating this element and its sub-criteria, the Expert Core Criteria (EC) should be primarily taken into account.)

**Requirements:**
* R2-1: Sections must not contain unnecessary duplication.
* R2-2: No contradictions between sections
* R2-3: Cross-references accurately reflect the content they refer to
* R2-4: Coherence is maintained between sections with appropriate transitions or signposting where necessary

**Quality:**
* Q2-1: Section-to-section relationships are logically consistent and reinforce the overall document structure

#### Element 3: "Does each section fully address its core points without irrelevant content?" (When evaluating this element and its sub-criteria, the Expert Core Criteria (EC) should be primarily taken into account.)

**Requirements:**
* R3-1: All core points relevant to the section’s purpose are included
* R3-2: Unnecessary or off-topic background information is excluded
* R3-3: All charts and tables must directly support the section topic, with no irrelevant or misleading visual material included.

**Quality:**
* Q3-1: All core points are developed with sufficient depth and detail to support the section’s purpose, typically requiring at least one well-developed paragraph (4–8 complete sentences) per core point

---

## Evaluation Guidelines (Important)
### 1. AI Evaluator Bias Prevention
**Positive bias tendencies must be avoided. To achieve this:**
1. **Evaluate strictly based on checklist sub-criteria and expert core criteria. Do not make arbitrary judgments based on your own criteria rather than the expert core criteria.**


### 2. Prohibition of Model's Arbitrary Evaluation: Evaluate Based on Expert Core Criteria
Each checklist evaluation must be based on expert core criteria, not the evaluator's arbitrary standards.
Expert core criteria represent essential content, conditions, and requirements that must be addressed in high-quality reports.
When evaluating each checklist, prioritize reference to the expert core criteria below. **Especially when content judgment is difficult, use expert core criteria as the primary reference standard.**

###Expert Core Criteria START
{core_criteria}
###Expert Core Criteria END

### 3. Two Evaluation Scoring Criteria
Each checklist item is evaluated from two perspectives: **Requirements Completeness** and **Quality (Adequacy)**.
Scores are assigned **independently** for each perspective.

#### 3-1. Requirements Completeness Perspective (Completeness)
Evaluates whether the report addresses **all requirements in the checklist without omission**.

* Assess the entire report, not just the well-executed parts
* Confirm that all specified elements are present
* Evaluate using the **Expert Core Criteria**; do not make arbitrary judgments

**Scoring Guidelines:**

* **9–10** (*Perfect*): All requirements fully met; no gaps; no revisions needed
* **7–8** (*Excellent*): Nearly all requirements met; only 1–2 minor omissions with minimal impact
* **5–6** (*Good*): More than half met; most core requirements satisfied, minor elements missing
* **3–4** (*Inadequate*): Some met; multiple gaps in core requirements
* **1–2** (*Poor*): Most requirements missing or addressed only superficially

#### 3-2. Quality Perspective (Adequacy)
Evaluates **how well** the fulfilled requirements are addressed against professional report or academic paper standards.

* Review all relevant quality aspects specified or implied in the checklist item — for example, depth, logic, volume, analytical rigor, precision, comprehensiveness, clarity, accuracy, neutrality, fairness, balance, methodological soundness, and any other factors directly tied to the item’s requirements
* Check all relevant units (examples, sub-elements, sections, paragraphs)
* Base the score on the weakest part only when the weakness is valid and materially affects the criteria. Do not apply mechanical or inappropriate interpretations that introduce unfounded or forced deficiencies.
* Evaluate using the **Expert Core Criteria**; do not make arbitrary judgments

**Scoring Guidelines:**

* **9–10** (*Perfect*): Exceptional quality in all relevant aspects; no revisions needed — *comparable to a top-tier international academic journal or, in certain technical/industry contexts, a best-in-class professional report that meets or exceeds such standards*
* **7–8** (*Excellent*): High quality; meets most academic and professional standards with only minor improvements possible — *comparable to a solid peer-reviewed academic journal, strong PhD-level work, or a high-quality industry report*
* **5–6** (*Good*): Meets essential professional standards; clear structure and competent analysis but with notable areas for improvement — *comparable to a well-executed graduate-level academic paper or standard professional report*
* **3–4** (*Inadequate*): Noticeable deficiencies in multiple aspects; requires significant revision — *comparable to an undergraduate-level academic paper or an entry-level professional report*
* **1–2** (*Poor*): Fails to meet basic professional standards; insufficient depth, rigor, or precision — *below undergraduate level; unsuitable for publication or professional use*


---

## Output Format
Decompose each checklist element into specific Requirements and Quality factors, score each factor individually (R1-1, R1-2, Q1-1, Q1-2, etc.), then output all detailed scores in JSON format.

For each checklist item:
1. Evaluate all Requirements factors (R) using **3-1 Requirements Completeness** Scoring Guidelines.
2. Evaluate all Quality factors (Q) using **3-2 Quality Perspective** Scoring Guidelines.  
3. Provide individual scores and **critical analysis/justification** for each factor, following the MANDATORY JUSTIFICATION FORMAT (Two-Step Rule) and quoting the exact text from the corresponding Scoring Guidelines (3-1 for Requirements, 3-2 for Quality).  
4. Apply **critical evaluation standards** that identify weaknesses, gaps, and areas where the report fails to meet professional standards.

When evaluating each factor, strictly adhere to ## Evaluation Guidelines and use the decomposed criteria from the expert core criteria.

## **MANDATORY JUSTIFICATION FORMAT (Two-Step Rule):**
Each score justification must follow this exact two-step structure:

1. **Problem description** — A concise, factual observation for the specific checklist factor that clearly reflects the assigned score level. This must directly reference the factor’s focus (e.g., inclusion list completeness, priority depth, section structure, etc.).

2. **Scoring reference** — Immediately after the problem description, write:  
   `thus per [3-1 or 3-2] [Requirements or Quality] Criteria: '[verbatim scoring guideline text for the assigned score range]'`  
   - **3-1 Requirements Criteria** = Use the exact text from the *Requirements Completeness* Scoring Guidelines.  
   - **3-2 Quality Criteria** = Use the exact text from the *Quality Perspective* Scoring Guidelines.  
   - Copy the guideline text **exactly** — no paraphrasing or omission. Preserve punctuation, number ranges, and wording.

**Exact format to use:**
"[Problem description], thus per [3-1 or 3-2] [Requirements or Quality] Criteria: '[verbatim scoring guideline text]' = [Level name]"

Example:
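- "Only 1-2 paragraphs per section instead of required 3-5, thus per 3-2 Quality Criteria: 'Fails to meet basic professional standards; insufficient depth, rigor, or precision — below undergraduate level; unsuitable for publication or professional use' = Poor level"
- "Document only 2 pages total vs. required 5000+ words, thus per 3-2 Quality Criteria: 'Fails to meet basic professional standards; insufficient depth, rigor, or precision — below undergraduate level; unsuitable for publication or professional use' = Poor level"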

## **Consistency Requirement (Mandatory):**
The problem description, quoted guideline text, and assigned score **must** represent the same severity level, and the score must be the natural result of the justification — not predetermined.

1. Problem Description Requirement
- The problem description must provide concrete, objective observations that directly support the assigned severity level and align with the corresponding evaluation criteria.
- The score must be derived naturally from this justification, not chosen first.

2. Positive-Only Restriction
- If the problem description indicates no omissions, gaps, or deficiencies, you MUST assign a Perfect (9–10) score and quote the Perfect-level guideline text.
- No positive-only descriptions are allowed for scores below Perfect; deficiencies must be explicitly described in the problem description.
- If an Excellent (7–8) or lower guideline is quoted, the problem description MUST explicitly state the specific omissions, gaps, or deficiencies that justify that score range.

3. Severity Alignment
- The problem description, quoted guideline text, and assigned score must all reflect the same severity level without contradiction.

4. Mandatory Final Self-Check (Hard Rule)
Before producing the final JSON output, you MUST explicitly review **every factor** for this rule:
    a) If any factor has a score below 9 and no deficiency in the problem description:
        - Add a concise deficiency that matches the assigned score range, OR
        - Raise the score to Perfect (9–10) and update the guideline text accordingly.
    b) Do not produce the final JSON until all such cases are corrected.

5. Scope of Application
- This rule applies equally to both 3-1 Requirements and 3-2 Quality evaluations.

This ensures every score is directly tied to the defined criteria, preventing arbitrary scoring.

---

**Summary Scores:**
```json
{
  "scores": {
    "structure_consistency": {
      "structure_intro": {
        "1": {
          "R1-1": {
            "description": "The introduction includes the subject and problem but lacks clear presentation of the report's significance, and provides insufficient background and motivation for reader understanding, thus per Requirements Criteria: 'More than half met; most core requirements satisfied, minor elements missing' = Good level",
            "score": 6
          },
          "Q1-1": {
            "description": "Introduction is approximately 150 words, falling short of the recommended 300 words for professional reports, with limited coverage of key components needed to properly set up the report, thus per Quality Criteria: 'Noticeable deficiencies in multiple aspects; requires significant revision — comparable to an undergraduate-level academic paper or an entry-level professional report' = Inadequate level",
            "score": 3
          },
          "Q1-2": {
            "description": "The problem statement uses generalized language without specific details, and the significance section contains ambiguous wording that reduces clarity, thus per Quality Criteria: 'Meets essential professional standards; clear structure and competent analysis but with notable areas for improvement — comparable to a well-executed graduate-level academic paper or standard professional report' = Good level",
            "score": 6
          },
          "Q1-3": {
            "description": "The introduction presents subject, problem, and significance in a logical sequence with clear transitions, making it easy for readers to follow the overall direction of the report, thus per Quality Criteria: 'Exceptional quality in all relevant aspects; no revisions needed — comparable to a top-tier international academic journal or, in certain technical/industry contexts, a best-in-class professional report that meets or exceeds such standards' = Perfect level",
            "score": 10
          }
        },
        "2": {
          "R2-1": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q2-1": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "3": {
          "R3-1": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R3-2": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R3-3": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q3-1": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "4": {
          "R4-1": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        }
      },
      "structure_body": {
        "1": {
          "R1-1": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R1-2": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q1-1": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q1-2": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "2": {
          "R2-1": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R2-2": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q2-1": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q2-2": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        }
      },
      "structure_conclusion": {
        "1": {
          "Q1-1": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q1-2": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "2": {
          "R2-1": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q2-1": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        }
      },
      "structure_section": {
        "1": {
          "R1-1": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q1-1": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q1-2": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "2": {
          "R2-1": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R2-2": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R2-3": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R2-4": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q2-1": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "3": {
          "R3-1": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R3-2": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R3-3": {"description": "Description of issue, thus per Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q3-1": {"description": "Description of issue, thus per Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        }
      }
    }
  }
}
```
"""
  # Substitute with str.replace rather than str.format: the prompt contains literal JSON braces.
  guidance = guidance.replace("{core_criteria}", core_criteria)
  return (
    "[Evaluation: Comprehensive Report Review]\n\n"
    f"[User Query]\n\n{query}\n\n\n"
    f"[Expert Report]\n\n{doc}\n\n\n"
    f"[Guidelines]\n\n{guidance}"
  )
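

# --- Illustrative helper (not part of the original module) ---------------------
# The prompt above asks the evaluator to write every justification as
#   "[Problem description], thus per [3-1 or 3-2] [Requirements or Quality]
#    Criteria: '[verbatim guideline text]' = [Level name]"
# and to keep the level name and the numeric score in the same severity band.
# The sketch below shows one way a caller might check a single
# (description, score) pair from the returned JSON. The names _LEVEL_RANGES,
# _JUSTIFICATION_RE and check_justification are hypothetical, and the regular
# expression is an assumption about how strictly the format should be enforced
# (the section number is treated as optional because the JSON template omits it).
import re

_LEVEL_RANGES = {
  "Perfect": (9, 10),
  "Excellent": (7, 8),
  "Good": (5, 6),
  "Inadequate": (3, 4),
  "Poor": (1, 2),
}

_JUSTIFICATION_RE = re.compile(
  r"^(?P<problem>.+), thus per (?:(?P<section>\d-\d) )?"
  r"(?P<kind>Requirements|Quality) Criteria: "
  r"'(?P<quote>.+)' = (?P<level>\w+)(?: level)?$",
  re.DOTALL,
)


def check_justification(description, score):
  """Return a list of problems; an empty list means the pair looks consistent."""
  if score == "N/A":
    # Items with no assessable material carry no numeric score to check.
    return []
  match = _JUSTIFICATION_RE.match(description.strip())
  if match is None:
    return ["description does not follow the two-step format"]
  level = match.group("level")
  if level not in _LEVEL_RANGES:
    return [f"unknown level name: {level}"]
  low, high = _LEVEL_RANGES[level]
  if not isinstance(score, int) or not low <= score <= high:
    return [f"score {score} is outside the {level} range {low}-{high}"]
  return []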

def get_prompt_naration_style(query: str, doc: str) -> str:
  guidance = """
# How to Evaluate Reports (Guidelines)

## 0. Background: What is a Professional Technical Report?
### 1. Definition and Characteristics of Professional Technical Reports
A professional technical report is a document that provides complete and in-depth analysis of a specific technical topic. This is fundamentally different from blog posts, summaries, or presentation slides.
Professional technical reports must have comprehensive scope. They should cover all important aspects of the topic without omission, providing complete coverage including background, current status, methodology, results, implications, and limitations. Academic depth is also essential. Each topic requires detailed explanations spanning multiple paragraphs, and must include analysis, interpretation, and critical evaluation rather than simple fact listing. Comparative and contrastive analysis with related research must also be included.
Rigorous argumentation is another core characteristic of professional technical reports. Sufficient evidence must be provided for all claims, and logical connections and causal relationships must be clearly explained. Opposing views or limitations must also be honestly acknowledged and addressed. Finally, expert-level completeness is required. The report should be fully comprehensible to field experts without additional materials, and must provide concrete, actionable information for practical application. Insights into future research or development directions should also be included.
Professional technical reports aim for complete understanding and practical application, written at substantial length (5,000+ words) through very deep, multi-layered analysis. They follow strict academic structure and target experts and decision-makers. This is fundamentally different from summaries or outlines. Summaries or outlines cover only key points superficially for quick information delivery, using short length (500-1,500 words) with bullet-point structures targeting general audiences and managers, which has completely different characteristics.

### 2. Important Mindset Shift for AI Evaluators
AI evaluators must recognize that general "good writing" standards differ from professional technical report standards. Generally, there is a tendency to consider conciseness as good, prefer what is immediately comprehensible, think that summarizing key points is sufficient, and judge that organizing with tables and lists appears professional. However, these are completely wrong standards for evaluating professional technical reports.
In professional technical reports, more relevant detail is better. They must provide complete understanding and cover all aspects without omission. They should be composed of sufficient paragraphs and argumentation, and this is the correct evaluation standard for professional technical reports. AI evaluators must thoroughly understand this mindset shift and approach evaluation accordingly.

## Overview
Evaluate the completed final report across the specified dimensions and detailed criteria.
Provide systematic evaluation reasoning and assign scores for **every checklist item**. Score = integer 1-10.
For any checklist item where the completed final report provides no assessable material, enter "N/A" instead of a numeric score.

---

## Evaluation Checklist Items

# Complete Checklist Decomposition with Full Details - Requirements vs Quality

## 4. Narration & Style

### 4.1 Report Form (report_form)

#### Element 1: "Does the document have the core structure of a professional report or journal paper, with a full Introduction, Body, and Conclusion?"

**Requirements:**
* R1-1: Document contains Introduction, Body, and Conclusion as distinct sections
* R1-2: Body consists of at least two major sections

**Quality:**
* Q1-1: Introduction ≥300 words, Body ≥3000 words in total, Conclusion ≥200 words.

#### Element 2: "Does the document apply professional section and paragraph formatting?"

**Requirements:**
* R2-1: Section and paragraph formatting (headings, spacing, alignment, etc.) must follow a professional report style, with a section hierarchy that is visually clear and consistently applied.
* R2-2: Core ideas must be expressed in complete sentences. In other words, lists and tables should only serve a supporting role within paragraphs, and they must never exceed half of a paragraph’s total content.

#### Element 3: "Are lists, numbering, and tables formatted in accordance with professional report standards?"

**Requirements:**
* R3-1: Lists and tables (including bullet/numbering styles, nesting, borders, alignment, and captions) must conform to professional report standards.
* R3-2: Lists and tables must be applied consistently throughout the document to ensure a uniform appearance.

### 4.2 Writing Quality (writing_quality)

#### Element 1: "Does each sentence convey a single clear idea?"

**Requirements:**
* R1-1: Each sentence must convey a clear and focused idea, while avoiding unnecessary complexity and maintaining a professional structure and logical flow.

#### Element 2: "Are specific verbs and nouns preferred, with vague or excessive modifiers kept to a minimum?"

**Requirements:**
* R2-1: Specific verbs and nouns are preferred, and vague or excessive modifiers are kept to a minimum.

#### Element 3: "Are technical terms defined at first use and then applied consistently thereafter?"

**Requirements:**
* R3-1: Technical or field-specific terms are clearly defined at first use
* R3-2: Technical terms are applied consistently thereafter

#### Element 4: "Is the tone professional, analytical, and dispassionate, avoiding personal value judgments?"

**Requirements:**
* R4-1: The tone must be professional, reflecting an analytical and objective stance while avoiding personal value judgments. (Interpretive or critical perspectives are acceptable only where permitted by disciplinary norms.)

### 4.3 Paragraph (paragraph) 

#### Element 1: "Do paragraphs follow a professional structure with a clear opening, supporting evidence, and a proper closing or transition, and maintain narrative context when using lists or tables?"

**Requirements:**
* R1-1: Paragraphs generally open with a clear topic sentence, especially in analytical or argumentative sections, while descriptive sections (e.g., methods, data presentation) may begin with context-setting statements.
* R1-2: Topic sentences are typically supported by concrete evidence, examples, or further elaboration, ensuring that claims are substantiated in a manner appropriate to the disciplinary context.

**Quality:**
* Q1-1: Substantial paragraphs (typically 4–8 sentences)
* Q1-2: When lists are used, narrative context is maintained with introductory or concluding sentences

#### Element 2: "When charts or tables appear within a paragraph, does the surrounding text explain the key takeaway and provide a clear logical connection to the narrative?"

**Requirements:**
* R2-1: The surrounding text includes explanatory sentences for charts and tables.

**Quality:**
* Q2-1: Figures (charts and tables) and their interpretation are logically integrated into the document’s argument, providing valid support for the claims or explanations.

#### Element 3: "Does each paragraph add new information without unnecessary repetition across paragraphs?"

**Requirements:**
* R3-1: No unnecessary repetition or duplication of content; intentional reiteration for emphasis, coherence, or structural purposes (e.g., introduction, conclusion) is acceptable.

**Quality:**
* Q3-1: Paragraphs connect smoothly without redundancy, ensuring a continuous and consistent flow of information

### 4.4 Reader Friendliness (reader_friendliness)

#### Element 1: "Are subheadings, visual cues, and the placement of tables/figures effectively used to guide the reader and support the narrative flow?"

**Requirements:**
* R1-1: Subheadings are appropriately structured, phrased, and positioned so that the reader can clearly understand the document’s organization and follow the argument.
* R1-2: Visual cues (e.g., bullet points, icons, highlights) are used to help the reader identify and process key ideas.
* R1-3: Tables and figures are introduced and positioned in ways that make the narrative easier for the reader to understand and follow.

#### Element 2: "Are complex concepts explained in a way the intended audience can understand, supported by concrete examples, analogies, summaries, or other clarifying aids?"

**Requirements:**
* R2-1: Complex concepts are explained using at least one clarifying method (example, analogy, summary, etc.)

**Quality:**
* Q2-1: Clarifying aids enhance clarity and support the reader’s understanding.

---

## Evaluation Guidelines (Important)
### 1. AI Evaluator Bias Prevention
**Positive bias tendencies must be avoided. To achieve this:**
1. **Evaluate strictly based on objective, observable formatting and structural criteria.**


### 2. Two Evaluation Scoring Criteria
Each checklist item is evaluated from two perspectives: **Requirements Completeness** and **Quality (Adequacy)**.
Scores are assigned **independently** for each perspective.

#### 2-1. Requirements Completeness Perspective (Completeness)
Evaluates whether the report addresses **all requirements in the checklist without omission**.

* Assess the entire report, not just the well-executed parts
* Confirm that all specified elements are present
* Evaluate using the checklist criteria defined above; do not make arbitrary judgments

**Scoring Guidelines:**

* **9–10** (*Perfect*): All requirements fully met; no gaps; no revisions needed
* **7–8** (*Excellent*): Nearly all requirements met; only 1–2 minor omissions with minimal impact
* **5–6** (*Good*): More than half met; most core requirements satisfied, minor elements missing
* **3–4** (*Inadequate*): Some met; multiple gaps in core requirements
* **1–2** (*Poor*): Most requirements missing or addressed only superficially

#### 2-2. Quality Perspective (Adequacy)
Evaluates **how well** the fulfilled requirements are addressed against professional report or academic paper standards.

* Review all relevant quality aspects specified or implied in the checklist item — for example, depth, logic, volume, analytical rigor, precision, comprehensiveness, clarity, accuracy, neutrality, fairness, balance, methodological soundness, and any other factors directly tied to the item’s requirements
* Check all relevant units (examples, sub-elements, sections, paragraphs)
* Base the score on the weakest part only when the weakness is valid and materially affects the criteria. Do not apply mechanical or inappropriate interpretations that introduce unfounded or forced deficiencies.
* Evaluate using the checklist criteria defined above; do not make arbitrary judgments

**Scoring Guidelines:**

* **9–10** (*Perfect*): Exceptional quality in all relevant aspects; no revisions needed — *comparable to a top-tier international academic journal or, in certain technical/industry contexts, a best-in-class professional report that meets or exceeds such standards*
* **7–8** (*Excellent*): High quality; meets most academic and professional standards with only minor improvements possible — *comparable to a solid peer-reviewed academic journal, strong PhD-level work, or a high-quality industry report*
* **5–6** (*Good*): Meets essential professional standards; clear structure and competent analysis but with notable areas for improvement — *comparable to a well-executed graduate-level academic paper or standard professional report*
* **3–4** (*Inadequate*): Noticeable deficiencies in multiple aspects; requires significant revision — *comparable to an undergraduate-level academic paper or an entry-level professional report*
* **1–2** (*Poor*): Fails to meet basic professional standards; insufficient depth, rigor, or precision — *below undergraduate level; unsuitable for publication or professional use*

---

## Output Format
Decompose each checklist element into specific Requirements and Quality factors, score each factor individually (R1-1, R1-2, Q1-1, Q1-2, etc.), then output all detailed scores in JSON format.

For each checklist item:
1. Evaluate all Requirements factors (R) using **2-1 Requirements Completeness** Scoring Guidelines.
2. Evaluate all Quality factors (Q) using **2-2 Quality Perspective** Scoring Guidelines.  
3. Provide individual scores and **critical analysis/justification** for each factor, following the MANDATORY JUSTIFICATION FORMAT (Two-Step Rule) and quoting the exact text from the corresponding Scoring Guidelines.  

## **MANDATORY JUSTIFICATION FORMAT (Two-Step Rule):**
Each score justification must follow this exact two-step structure:

1. **Problem description** — A concise, factual observation for the specific checklist factor that clearly reflects the assigned score level. This must directly reference the factor’s focus (e.g., inclusion list completeness, priority depth, section structure, etc.).

2. **Scoring reference** — Immediately after the problem description, write:  
   `thus per [2-1 or 2-2] [Requirements or Quality] Criteria: '[verbatim scoring guideline text for the assigned score range]'`  
   - **2-1 Requirements Criteria** = Use the exact text from the *Requirements Completeness* Scoring Guidelines.  
   - **2-2 Quality Criteria** = Use the exact text from the *Quality Perspective* Scoring Guidelines.  
   - Copy the guideline text **exactly** — no paraphrasing or omission. Preserve punctuation, number ranges, and wording.

**Exact format to use:**
"[Problem description], thus per [2-1 or 2-2] [Requirements or Quality] Criteria: '[verbatim scoring guideline text for the assigned score range]' = [Level name]"

Examples:
- "Only 1-2 paragraphs per section instead of required 3-5, thus per 2-2 Quality Criteria: 'Fails to meet basic professional standards; insufficient depth, rigor, or precision — below undergraduate level; unsuitable for publication or professional use' = Poor level"
- "Document only 2 pages total vs. required 5000+ words, thus per 2-2 Quality Criteria: 'Fails to meet basic professional standards; insufficient depth, rigor, or precision — below undergraduate level; unsuitable for publication or professional use' = Poor level"

## **Consistency Requirement (Mandatory):**
The problem description, quoted guideline text, and assigned score **must** represent the same severity level, and the score must be the natural result of the justification — not predetermined.

1. Problem Description Requirement
- The problem description must provide concrete, objective observations that directly support the assigned severity level and align with the corresponding evaluation criteria.
- The score must be derived naturally from this justification, not chosen first.

2. Positive-Only Restriction
- If the problem description indicates no omissions, gaps, or deficiencies, you MUST assign a Perfect (9–10) score and quote the Perfect-level guideline text.
- No positive-only descriptions are allowed for scores below Perfect; deficiencies must be explicitly described in the problem description.
- If an Excellent (7–8) or lower guideline is quoted, the problem description MUST explicitly state the specific omissions, gaps, or deficiencies that justify that score range.

3. Severity Alignment
- The problem description, quoted guideline text, and assigned score must all reflect the same severity level without contradiction.

4. Mandatory Final Self-Check (Hard Rule)
Before producing the final JSON output, you MUST explicitly review **every factor** for this rule:
    a) If any factor has a score below 9 and no deficiency in the problem description:
        - Add a concise deficiency that matches the assigned score range, OR
        - Raise the score to Perfect (9–10) and update the guideline text accordingly.
    b) Do not produce the final JSON until all such cases are corrected.

5. Scope of Application
- This rule applies equally to both 2-1 Requirements and 2-2 Quality evaluations.



This ensures every score is directly tied to the defined criteria, preventing arbitrary scoring.

---

**Summary Scores:**
```json
{
  "scores": {
    "narration_style": {
      "report_form": {
        "1": {
          "R1-1": {
            "description": "The document contains all three core sections (Introduction, Body, Conclusion) as distinct sections without omission, thus per 2-1 Requirements Criteria: 'All requirements fully met; no gaps; no revisions needed' = Perfect level",
            "score": 10
          },
          "R1-2": {
            "description": "The Body consists of two major sections as required, but one section is underdeveloped with insufficient analytical depth, thus per 2-1 Requirements Criteria: 'Nearly all requirements met; only 1–2 minor omissions with minimal impact' = Excellent level",
            "score": 8
          },
          "Q1-1": {
            "description": "Introduction is 650 words (≥300) and Conclusion is 250 words (≥200), meeting standards, but Body is 2800 words, falling slightly short of the 3000-word threshold, thus per 2-2 Quality Criteria: 'High quality across most dimensions; minor refinements possible — comparable to a strong doctoral-level paper or high-quality professional report' = Excellent level",
            "score": 8
          }
        },
        "2": {
          "R2-1": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R2-2": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "3": {
          "R3-1": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R3-2": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        }
      },
      "writing_quality": {
        "1": {
          "R1-1": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "2": {
          "R2-1": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "3": {
          "R3-1": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R3-2": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "4": {
          "R4-1": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        }
      },
      "paragraph": {
        "1": {
          "R1-1": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R1-2": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q1-1": {"description": "Description of issue, thus per 2-2 Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q1-2": {"description": "Description of issue, thus per 2-2 Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "2": {
          "R2-1": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q2-1": {"description": "Description of issue, thus per 2-2 Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "3": {
          "R3-1": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q3-1": {"description": "Description of issue, thus per 2-2 Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        }
      },
      "reader_friendliness": {
        "1": {
          "R1-1": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R1-2": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R1-3": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "2": {
          "R2-1": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q2-1": {"description": "Description of issue, thus per 2-2 Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        }
      }
    }
  }
}
```
"""
  return (
    "[Evaluation: Comprehensive Report Review]\n\n"
    f"[User Query]\n\n{query}\n\n\n"
    f"[Expert Report]\n\n{doc}\n\n\n"
    f"[Guidelines]\n\n{guidance}"
  )
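

# --- Illustrative helper (hypothetical; not part of the original module) ------
# The prompt above asks the evaluator to follow the two-step justification
# format and to keep the quoted level consistent with the numeric score.
# The sketch below shows one way a caller might check a parsed JSON response
# against those two rules. It is a minimal sketch under stated assumptions:
#   * the names check_justification, self_check, _SCORE_RANGE_BY_LEVEL and
#     _JUSTIFICATION_PATTERN are invented here for illustration;
#   * the response has already been parsed into a dict shaped like the
#     "Summary Scores" example (scores > dimension > sub-dimension > element
#     > factor > {"description", "score"});
#   * the section prefix ("2-1 " / "2-2 ") is treated as optional.
# It does not verify that the quoted guideline text is verbatim.
import re
from typing import Dict, List, Union

# Score ranges per level, as listed in the Scoring Guidelines.
_SCORE_RANGE_BY_LEVEL = {
  "Perfect": (9, 10),
  "Excellent": (7, 8),
  "Good": (5, 6),
  "Inadequate": (3, 4),
  "Poor": (1, 2),
}

# "[Problem], thus per [2-1 or 2-2] [Requirements or Quality] Criteria: '[quote]' = [Level] level"
_JUSTIFICATION_PATTERN = re.compile(
  r"^(?P<problem>.+?), thus per (?:\d-\d )?(?P<kind>Requirements|Quality) Criteria: "
  r"'(?P<quote>.+)' = (?P<level>Perfect|Excellent|Good|Inadequate|Poor) level$",
  re.DOTALL,
)


def check_justification(description: str, score: Union[int, str]) -> List[str]:
  """Return format or consistency problems for one factor (empty if none found)."""
  if score == "N/A":
    return []  # N/A factors carry no level to align with
  match = _JUSTIFICATION_PATTERN.match(description.strip())
  if match is None:
    return ["description does not follow the two-step justification format"]
  if isinstance(score, bool) or not isinstance(score, int):
    return [f"score must be an integer or 'N/A', got {score!r}"]
  low, high = _SCORE_RANGE_BY_LEVEL[match.group("level")]
  if not low <= score <= high:
    return [f"score {score} is outside the {low}-{high} range of the quoted level"]
  return []


def self_check(result: dict) -> Dict[str, List[str]]:
  """Apply check_justification to every factor under result["scores"]."""
  problems: Dict[str, List[str]] = {}
  for dimension, subdimensions in result["scores"].items():
    for subdimension, elements in subdimensions.items():
      for element, factors in elements.items():
        for factor_id, factor in factors.items():
          found = check_justification(factor["description"], factor["score"])
          if found:
            problems[f"{dimension}.{subdimension}.{element}.{factor_id}"] = found
  return problems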




def get_prompt_ethics_information(query: str, doc: str, core_criteria: str) -> str:
  guidance = """
# How to Evaluate Reports (Guidelines)

## 0. Background: What is a Professional Technical Report?
### 1. Definition and Characteristics of Professional Technical Reports
A professional technical report is a document that provides complete and in-depth analysis of a specific technical topic. This is fundamentally different from blog posts, summaries, or presentation slides.
Professional technical reports must have comprehensive scope. They should cover all important aspects of the topic without omission, providing complete coverage including background, current status, methodology, results, implications, and limitations. Academic depth is also essential. Each topic requires detailed explanations spanning multiple paragraphs, and must include analysis, interpretation, and critical evaluation rather than simple fact listing. Comparative and contrastive analysis with related research must also be included.
Rigorous argumentation is another core characteristic of professional technical reports. Sufficient evidence must be provided for all claims, and logical connections and causal relationships must be clearly explained. Opposing views or limitations must also be honestly acknowledged and addressed. Finally, expert-level completeness is required. The report should be fully comprehensible to field experts without additional materials, and must provide concrete, actionable information for practical application. Insights into future research or development directions should also be included.
Professional technical reports aim for complete understanding and practical application, written at substantial length (5,000+ words) through very deep, multi-layered analysis. They follow strict academic structure and target experts and decision-makers. This is fundamentally different from summaries or outlines. Summaries or outlines cover only key points superficially for quick information delivery, using short length (500-1,500 words) with bullet-point structures targeting general audiences and managers, which has completely different characteristics.

### 2. Important Mindset Shift for AI Evaluators
AI evaluators must recognize that general "good writing" standards differ from professional technical report standards. Generally, there is a tendency to consider conciseness as good, prefer what is immediately comprehensible, think that summarizing key points is sufficient, and judge that organizing with tables and lists appears professional. However, these are completely wrong standards for evaluating professional technical reports.
In professional technical reports, more relevant detail is better. They must provide complete understanding and cover all aspects without omission. They should be composed of sufficient paragraphs and argumentation, and this is the correct evaluation standard for professional technical reports. AI evaluators must thoroughly understand this mindset shift and approach evaluation accordingly.

## Overview
Evaluate the completed final report across the specified dimensions and detailed criteria.
Provide systematic evaluation reasoning and assign scores for **every checklist item**. Each score is an integer from 1 to 10.
For any checklist item where the completed final report provides no assessable material, enter "N/A" instead of a numeric score.

---

## Evaluation Checklist Items

# Complete Checklist Decomposition with Full Details - Requirements vs Quality

## 5. Information Integrity

### 5.1 Information Recency (information_recency)

#### Element 1: "When evidence comes from regularly updated fields (e.g., market statistics, policy reports, software APIs), the report must use the most recent edition and specify the publication year or version in citations. If older material is cited, the report must explain differences from current conditions and discuss the applicability limits. (Required when the report relies on regularly updated sources; otherwise N/A.)"

**Requirements:**
* R1-1: The most recent edition is used for regularly updated fields.
* R1-2: The publication year or version must be explicitly indicated in the text (e.g., in citations or captions).
* R1-3: If the most recent edition is not used, the reason must be explicitly justified.

## 6. Ethics & Compliance

### 6.1 Sensitive Issues (sensitive_issues)

#### Element 1: "Are sensitive topics—politics, race, gender, etc.—handled in a **neutral and fair** manner with methodological precision and analytical rigor throughout any discussions of such topics?"

**Requirements:**
* R1-1: Across the entire document, sensitive topics are handled with an explicit intent of neutrality.
* R1-2: Across the entire document, sensitive topics are handled with an explicit intent of fairness.

**Quality:**
* Q1-1: [Scope: Sensitive topics only] Language and tone are impartial, avoiding emotionally charged, biased, or value-laden wording in all sensitive topic discussions.
* Q1-2: All relevant perspectives on the sensitive topic are given balanced attention, with comparable detail, background, and context provided for each side.

#### Element 2: "Is the content free of bias, categorical statements, disparagement, or over-generalization, maintaining comprehensive objectivity in all assertions and conclusions?"

**Requirements:**
* R2-1: Language is professional and balanced, avoiding unreasonable categorical, exaggerated, or disparaging expressions
* R2-2: Claims and generalizations are grounded in evidence and not extended beyond its scope.

### 6.2 Safety & Impact (safety_impact)

#### Element 1: "Are the potential impacts of proposed policies, technologies, strategies, or research outcomes sufficiently considered? Does the analysis include key implications, possible side-effects, and interpretations from multiple perspectives?
(This element is mandatory to consider when such aspects are essential to the report.)"

**Requirements:**
* R1-1: Potential side-effects or limitations are discussed (where essential)
* R1-2: Multiple stakeholder perspectives or contextual viewpoints are included (where essential)

**Quality:**
* Q1-1: Comprehensiveness – Key implications are presented in a balanced way, and relevant contexts are sufficiently considered (where essential)
* Q1-2: Depth & Evidence – Each identified impact is analyzed with adequate detail, supported by data, evidence, or clear reasoning (where essential)

#### Element 2: "Does the report avoid providing concrete methods or procedures that could enable or facilitate illegal or harmful activities?"

**Requirements:**
* R2-1: Concrete methods or procedures that could enable or facilitate illegal or harmful activities are excluded.

### 6.3 Perspective Balance (perspective_balance)

#### Element 1: "Does the information selection avoid reflecting only one viewpoint or excluding opposing positions?"

**Requirements:**
* R1-1: Information selection avoids single-viewpoint bias and incorporates multiple perspectives, including opposing views where applicable.

**Quality:**
* Q1-1: Relevant and valid perspectives, including opposing views, should be selected and used to strengthen the argument by addressing counterpoints in a balanced way, thereby enhancing the report’s credibility.

---

## Evaluation Guidelines (Important)
### 1. AI Evaluator Bias Prevention
**Positive bias tendencies must be avoided. To achieve this:**
1. **Evaluate strictly based on objective, observable formatting and structural criteria.**


### 2. Two Evaluation Scoring Criteria
Each checklist item is evaluated from two perspectives: **Requirements Completeness** and **Requirements Quality**.
Scores are assigned **independently** for each perspective.

#### 2-1. Requirements Completeness Perspective (Completeness)
Evaluates whether the report addresses **all requirements in the checklist without omission**.

* Assess the entire report, not just the well-executed parts
* Confirm that all specified elements are present
* Evaluate using the **Expert Core Criteria**; do not make arbitrary judgments

**Scoring Guidelines:**

* **9–10** (*Perfect*): All requirements fully met; no gaps; no revisions needed
* **7–8** (*Excellent*): Nearly all requirements met; only 1–2 minor omissions with minimal impact
* **5–6** (*Good*): More than half met; most core requirements satisfied, minor elements missing
* **3–4** (*Inadequate*): Some met; multiple gaps in core requirements
* **1–2** (*Poor*): Most requirements missing or addressed only superficially

#### 2-2. Quality Perspective (Adequacy)
Evaluates **how well** the fulfilled requirements are addressed against professional report or academic paper standards.

* Review all relevant quality aspects specified or implied in the checklist item — for example, depth, logic, volume, analytical rigor, precision, comprehensiveness, clarity, accuracy, neutrality, fairness, balance, methodological soundness, and any other factors directly tied to the item’s requirements
* Check all relevant units (examples, sub-elements, sections, paragraphs)
* Base the score on the weakest part only when the weakness is valid and materially affects the criteria. Do not apply mechanical or inappropriate interpretations that introduce unfounded or forced deficiencies.
* Evaluate using the **Expert Core Criteria**; do not make arbitrary judgments

**Scoring Guidelines:**

* **9–10** (*Perfect*): Exceptional quality in all relevant aspects; no revisions needed — *comparable to a top-tier international academic journal or, in certain technical/industry contexts, a best-in-class professional report that meets or exceeds such standards*
* **7–8** (*Excellent*): High quality; meets most academic and professional standards with only minor improvements possible — *comparable to a solid peer-reviewed academic journal, strong PhD-level work, or a high-quality industry report*
* **5–6** (*Good*): Meets essential professional standards; clear structure and competent analysis but with notable areas for improvement — *comparable to a well-executed graduate-level academic paper or standard professional report*
* **3–4** (*Inadequate*): Noticeable deficiencies in multiple aspects; requires significant revision — *comparable to an undergraduate-level academic paper or an entry-level professional report*
* **1–2** (*Poor*): Fails to meet basic professional standards; insufficient depth, rigor, or precision — *below undergraduate level; unsuitable for publication or professional use*

---

## Output Format
Evaluate each checklist element by decomposing it into specific Requirements and Quality factors, score each factor individually (R1-1, R1-2, Q1-1, Q1-2, etc.), then output all detailed scores in JSON format.

For each checklist item:
1. Evaluate all Requirements factors (R) using **2-1 Requirements Completeness** Scoring Guidelines.
2. Evaluate all Quality factors (Q) using **2-2 Quality Perspective** Scoring Guidelines.  
3. Provide individual scores and **critical analysis/justification** for each factor, following the MANDATORY JUSTIFICATION FORMAT (Two-Step Rule) and quoting the exact text from the corresponding Scoring Guidelines.  

## **MANDATORY JUSTIFICATION FORMAT (Two-Step Rule):**
Each score justification must follow this exact two-step structure:

1. **Problem description**: A concise, factual observation for the specific checklist factor that clearly reflects the assigned score level. This must directly reference the factor’s focus (e.g., inclusion list completeness, priority depth, or section structure).

2. **Scoring reference**: Immediately after the problem description, write:  
   `thus per [2-1 or 2-2] [Requirements or Quality] Criteria: '[verbatim scoring guideline text for the assigned score range]'`  
   - **2-1 Requirements Criteria** = Use the exact text from the *Requirements Completeness* Scoring Guidelines.  
   - **2-2 Quality Criteria** = Use the exact text from the *Quality Perspective* Scoring Guidelines.  
   - Copy the guideline text **exactly** — no paraphrasing or omission. Preserve punctuation, number ranges, and wording.

**Exact format to use:**
"[Problem description], thus per [2-1 or 2-2] [Requirements or Quality] Criteria: '[verbatim scoring guideline text for the assigned score range]' = [Level name]"

Example:
- "Only 1-2 paragraphs per section instead of required 3-5, thus per 2-2 Quality Criteria: 'Fails to meet basic professional standards; insufficient depth, rigor, or precision — below undergraduate level; unsuitable for publication or professional use' = Poor level"
- "Document only 2 pages total vs. required 5000+ words, thus per 2-2 Quality Criteria: 'Fails to meet basic professional standards; insufficient depth, rigor, or precision — below undergraduate level; unsuitable for publication or professional use' = Poor level"

## **Consistency Requirement (Mandatory):**
The problem description, quoted guideline text, and assigned score **must** represent the same severity level, and the score must be the natural result of the justification — not predetermined.

1. Problem Description Requirement
- The problem description must provide concrete, objective observations that directly support the assigned severity level and align with the corresponding evaluation criteria.
- The score must be derived naturally from this justification, not chosen first.

2. Positive-Only Restriction
- If the problem description indicates no omissions, gaps, or deficiencies, you MUST assign a Perfect (9–10) score and quote the Perfect-level guideline text.
- No positive-only descriptions are allowed for scores below Perfect; deficiencies must be explicitly described in the problem description.
- If an Excellent (7–8) or lower guideline is quoted, the problem description MUST explicitly state the specific omissions, gaps, or deficiencies that justify that score range.

3. Severity Alignment
- The problem description, quoted guideline text, and assigned score must all reflect the same severity level without contradiction.

4. Mandatory Final Self-Check (Hard Rule)
Before producing the final JSON output, you MUST explicitly review **every factor** for this rule:
    a) If any factor has a score below 9 and no deficiency in the problem description:
        - Add a concise deficiency that matches the assigned score range, OR
        - Raise the score to Perfect (9–10) and update the guideline text accordingly.
    b) Do not produce the final JSON until all such cases are corrected.

5. Scope of Application
- This rule applies equally to both 2-1 Requirements and 2-2 Quality evaluations.



This ensures every score is directly tied to the defined criteria, preventing arbitrary scoring.

---

**Summary Scores:**
```json
{
  "scores": {
    "information_integrity": {
      "information_recency": {
        "1": {
          "R1-1": {
            "description": "The report uses the most recent 2024 edition for all regularly updated fields including market statistics and policy reports, thus per 2-1 Requirements Criteria: 'All requirements fully met; no gaps; no revisions needed' = Perfect level",
            "score": 10
          },
          "R1-2": {
            "description": "Publication years are specified in most citations, but 2 out of 8 software API references lack version numbers, thus per 2-1 Requirements Criteria: 'Nearly all requirements met; only 1–2 minor omissions with minimal impact' = Excellent level",
            "score": 8
          },
          "R1-3": {
            "description": "One section cites a 2022 dataset instead of the 2024 version, but provides explicit justification explaining methodological consistency requirements and discusses applicability limits, thus per 2-1 Requirements Criteria: 'All requirements fully met; no gaps; no revisions needed' = Perfect level",
            "score": 10
          }
        }
      }
    },
    "ethics_compliance": {
      "sensitive_issues": {
        "1": {
          "R1-1": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R1-2": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q1-1": {"description": "Description of issue, thus per 2-2 Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q1-2": {"description": "Description of issue, thus per 2-2 Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "2": {
          "R2-1": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R2-2": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        }
      },
      "safety_impact": {
        "1": {
          "R1-1": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "R1-2": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q1-1": {"description": "Description of issue, thus per 2-2 Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q1-2": {"description": "Description of issue, thus per 2-2 Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        },
        "2": {
          "R2-1": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        }
      },
      "perspective_balance": {
        "1": {
          "R1-1": {"description": "Description of issue, thus per 2-1 Requirements Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"},
          "Q1-1": {"description": "Description of issue, thus per 2-2 Quality Criteria: 'relevant criterion quote' = Level", "score": "1-10 or N/A"}
        }
      }
    }
  }
}
```
"""
  return (
    "[Evaluation: Comprehensive Report Review]\n\n"
    f"[User Query]\n\n{query}\n\n\n"
    f"[Expert Report]\n\n{doc}\n\n\n"
    f"[Guidelines]\n\n{guidance}"
  )



