PROMPT_COMPAS = """
You are an expert in explainable AI and criminal justice risk assessment. Your task is to evaluate the quality of a SHAP explanation that describes why a person may be predicted to **recommit a crime**.

Each explanation is a list of features in the following format:
<featureID> <feature_name> : <feature_value> = <feature relevance score>

Your goal is to determine how **reasonable and high-quality** the explanation is, based on the SHAP scores and your domain knowledge.

### Understanding SHAP scores:
- A **positive SHAP score (> 0)** means the feature increases the person’s risk of recidivism, contributing to a higher predicted risk.
- A **negative SHAP score (< 0)** means the feature decreases the risk, contributing to a lower predicted risk.
- A **SHAP score of 0** means the feature has no impact on the prediction.
- The **magnitude** of the SHAP score reflects the **strength of the feature’s influence** on the model's decision — larger absolute values imply greater impact.

### Your task:
Assign a **quality score from 1 to 5**:
- **5:** Excellent explanation — all important features have appropriate SHAP scores, and no suspicious or unjustified values.
- **4:** Good explanation — mostly reasonable, with at most minor issues in some features.
- **3:** Moderate quality — some questionable or poorly aligned SHAP scores, but overall still partially plausible.
- **2:** Poor quality — several features have inappropriate or suspicious SHAP scores.
- **1:** Very low quality — the explanation is clearly flawed, with major issues in multiple key features.

Also, list **AT MOST three feature IDs** whose relevance scores are **unjustified or suspicious**, based on the feature's value and known importance.

Do not consider the model's prediction. Focus only on whether the explanation is plausible and grounded.

### Output format:
<score><space><comma-separated list of incorrect feature IDs>

Examples:
- Excellent explanation: `5`
- Good explanation with minor issues: `4 5`
- Low quality with clear issues: `2 1,6`
- Very low quality with major issues: `1 2,4,7`

If there are no suspicious features, leave the second part empty (just the score). DO NOT include any additional text or explanations in your response.

### Explanation to evaluate:
"""

PROMPT_ADULT = """
You are an expert in explainable AI and socioeconomic modeling. Your task is to evaluate the quality of a SHAP explanation that describes why a person is predicted to **earn more than $50K per year** based on the Adult Income dataset.

Each explanation is a list of features in the following format:
<featureID> <feature_name> : <feature_value> = <feature relevance score>

Your goal is to determine how **reasonable and high-quality** the explanation is, based on the SHAP scores and your domain knowledge of income prediction.

### Understanding SHAP scores:
- A **positive SHAP score (> 0)** means the feature increases the predicted likelihood that the person earns >$50K.
- A **negative SHAP score (< 0)** means the feature decreases the likelihood of earning >$50K.
- A **SHAP score of 0** means the feature has no impact on the prediction.
- The **magnitude** of the SHAP score reflects the **strength of the feature’s influence** — larger absolute values imply greater impact.

### Your task:
Assign a **quality score from 1 to 5**:
- **5:** Excellent explanation — all important features have appropriate SHAP scores, and no suspicious or unjustified values.
- **4:** Good explanation — mostly reasonable, with at most minor issues in some features.
- **3:** Moderate quality — some questionable or poorly aligned SHAP scores, but overall still partially plausible.
- **2:** Poor quality — several features have inappropriate or suspicious SHAP scores.
- **1:** Very low quality — the explanation is clearly flawed, with major issues in multiple key features.

Also, list **AT MOST three feature IDs** whose relevance scores are **unjustified or suspicious**, based on the feature's value and known importance.

Do not consider the model's final prediction. Focus only on whether the explanation is plausible and grounded.

### Output format:
<score><space><comma-separated list of incorrect feature IDs>

Examples:
- Excellent explanation: `5`
- Good explanation with minor issues: `4 5`
- Low quality with clear issues: `2 1,6`
- Very low quality with major issues: `1 2,4,7`

If there are no suspicious features, leave the second part empty (just the score). **DO NOT include any additional text or explanations in your response.**

### Explanation to evaluate:
"""

PROMPT_CREDITCARD = """
You are an expert in explainable AI and credit risk assessment. Your task is to evaluate the quality of a SHAP explanation that describes why a person may be predicted to **default on their credit card payment next month**.

Each explanation is a list of features in the following format:
<featureID> <feature_name> : <feature_value> = <feature relevance score>

Your goal is to determine how **reasonable and high-quality** the explanation is, based on the SHAP scores and your domain knowledge.

### Understanding SHAP scores:
- A **positive SHAP score (> 0)** means the feature increases the person’s risk of default, contributing to a higher predicted risk.
- A **negative SHAP score (< 0)** means the feature decreases the risk, contributing to a lower predicted risk.
- A **SHAP score of 0** means the feature has no impact on the prediction.
- The **magnitude** of the SHAP score reflects the **strength of the feature’s influence** on the model's decision — larger absolute values imply greater impact.

### Your task:
Assign a **quality score from 1 to 5**:
- **5:** Excellent explanation — all important features have appropriate SHAP scores, and no suspicious or unjustified values.
- **4:** Good explanation — mostly reasonable, with at most minor issues in some features.
- **3:** Moderate quality — some questionable or poorly aligned SHAP scores, but overall still partially plausible.
- **2:** Poor quality — several features have inappropriate or suspicious SHAP scores.
- **1:** Very low quality — the explanation is clearly flawed, with major issues in multiple key features.

Also, list **AT MOST three feature IDs** whose relevance scores are **unjustified or suspicious**, based on the feature's value and known importance.

Do not consider the model's prediction. Focus only on whether the explanation is plausible and grounded.

### Output format:
<score><space><comma-separated list of incorrect feature IDs>

Examples:
- Excellent explanation: `5`
- Good explanation with minor issues: `4 5`
- Low quality with clear issues: `2 1,6`
- Very low quality with major issues: `1 2,4,7`

If there are no suspicious features, leave the second part empty (just the score). DO NOT include any additional text or explanations in your response.

### Explanation to evaluate:
"""

PROMPT_CHURN = """
You are an expert in explainable AI and customer behavior analytics in the telecommunications sector. Your task is to evaluate the quality of a SHAP explanation that describes why a telecom customer was predicted to **churn**.

Each explanation is a list of features in the following format:
<featureID> <feature_name> : <feature_value> = <feature relevance score>

Your goal is to determine how **reasonable and high-quality** the explanation is, based on the SHAP scores and your domain knowledge.

### Understanding SHAP scores:
- A **positive SHAP score (> 0)** means the feature contributes to the prediction that the customer **will churn**.
- A **negative SHAP score (< 0)** means the feature contributes to the prediction that the customer **will not churn**.
- A **SHAP score of 0** means the feature has no influence on the decision.
- The magnitude reflects the strength of the feature’s contribution.

### Your task:
Assign a **quality score from 1 to 5**:
- **5:** Excellent explanation — all important features have appropriate SHAP scores, and no suspicious or unjustified values.
- **4:** Good explanation — mostly reasonable, with at most minor issues in some features.
- **3:** Moderate quality — some questionable or poorly aligned SHAP scores, but overall still partially plausible.
- **2:** Poor quality — several features have inappropriate or suspicious SHAP scores.
- **1:** Very low quality — the explanation is clearly flawed, with major issues in multiple key features.

Also, list **AT MOST three feature IDs** whose relevance scores are **unjustified or suspicious**, based on the feature's value and known importance.

Do not consider the model's prediction. Focus only on whether the explanation is plausible and grounded.

### Output format:
<score><space><comma-separated list of incorrect feature IDs>

Examples:
- Excellent explanation: `5`
- Good explanation with minor issues: `4 5`
- Low quality with clear issues: `2 1,6`
- Very low quality with major issues: `1 2,4,7`

If there are no suspicious features, leave the second part empty (just the score). DO NOT include any additional text or explanations in your response.

### Explanation to evaluate:
"""

PROMPT_WINE = """
You are an expert in explainable AI and wine quality prediction. Your task is to evaluate the quality of a SHAP explanation that describes why a particular wine is predicted to have a **certain quality score**.

Each explanation is a list of features in the following format:
<featureID> <feature_name> : <feature_value> = <feature relevance score>

Your goal is to determine how **reasonable and high-quality** the explanation is, based on the SHAP scores and your domain knowledge of wine quality factors.

### Understanding SHAP scores:
- A **positive SHAP score (> 0)** means the feature increases the predicted wine quality.
- A **negative SHAP score (< 0)** means the feature decreases the predicted wine quality.
- A **SHAP score of 0** means the feature has no impact on the prediction.
- The **magnitude** of the SHAP score reflects the **strength of the feature’s influence** — larger absolute values imply greater impact.

### Your task:
Assign a **quality score from 1 to 5**:
- **5:** Excellent explanation — all important features have appropriate SHAP scores, and no suspicious or unjustified values.
- **4:** Good explanation — mostly reasonable, with at most minor issues in some features.
- **3:** Moderate quality — some questionable or poorly aligned SHAP scores, but overall still partially plausible.
- **2:** Poor quality — several features have inappropriate or suspicious SHAP scores.
- **1:** Very low quality — the explanation is clearly flawed, with major issues in multiple key features.

Also, list **AT MOST three feature IDs** whose relevance scores are **unjustified or suspicious**, based on the feature's value and known importance.

Do not consider the model's final prediction. Focus only on whether the explanation is plausible and grounded.

### Output format:
<score><space><comma-separated list of incorrect feature IDs>

Examples:
- Excellent explanation: `5`
- Good explanation with minor issues: `4 5`
- Low quality with clear issues: `2 1,6`
- Very low quality with major issues: `1 2,4,7`

If there are no suspicious features, leave the second part empty (just the score). **DO NOT include any additional text or explanations in your response.**

### Explanation to evaluate:
"""

PROMPT_PARKINSON = """
You are an expert in explainable AI and biomedical informatics with a focus on neurodegenerative disorders. Your task is to evaluate the quality of a SHAP explanation that describes why a patient with Parkinson’s disease was predicted to have a particular level of motor impairment (as measured by the UPDRS score).

Each explanation is a list of features in the following format:
<featureID> <feature_name> : <feature_value> = <feature relevance score>

Your goal is to determine how **reasonable and high-quality** the explanation is, based on the SHAP scores and your domain knowledge.

### Understanding SHAP scores:
- A **positive SHAP score (> 0)** means the feature contributes to a higher predicted UPDRS score (worse condition).
- A **negative SHAP score (< 0)** means the feature contributes to a lower predicted UPDRS score (better condition).
- A **SHAP score of 0** means the feature has no impact.
- The magnitude reflects the strength of the feature’s influence.

### Your task:
Assign a **quality score from 1 to 5**:
- **5:** Excellent explanation — all major biomarkers have plausible SHAP contributions, no obvious mistakes.
- **4:** Good explanation — mostly aligned with medical evidence, at most minor inconsistencies.
- **3:** Moderate quality — mix of justified and questionable attributions.
- **2:** Poor quality — several implausible or suspicious effects.
- **1:** Very low quality — major inconsistencies with medical knowledge or voice feature interpretation.

Also, list **AT MOST three feature IDs** whose relevance scores are **unjustified or suspicious**, based on the feature's value and known importance.

Do not consider the model's prediction. Focus only on whether the explanation is plausible and grounded.

### Output format:
<score><space><comma-separated list of incorrect feature IDs>

Examples:
- Excellent explanation: `5`
- Good explanation with minor issues: `4 5`
- Low quality with clear issues: `2 1,6`
- Very low quality with major issues: `1 2,4,7`

If there are no suspicious features, leave the second part empty (just the score). DO NOT include any additional text or explanations in your response.

### Explanation to evaluate:
"""

PROMPT_BIKE = """
You are an expert in explainable machine learning and time-series forecasting, with a focus on urban mobility analytics. Your task is to evaluate the quality of a SHAP explanation that describes why the model predicted a particular number of daily bike rentals.

Each explanation is a list of features in the following format:
<featureID> <feature_name> : <feature_value> = <feature relevance score>

Your goal is to determine how **reasonable and high-quality** the explanation is, based on the SHAP scores and your domain knowledge.

### Understanding SHAP scores:
- A **positive SHAP score (> 0)** means the feature contributes to a higher predicted bike rental count.
- A **negative SHAP score (< 0)** means the feature contributes to a lower predicted bike rental count.
- A **SHAP score of 0** means the feature has no impact.
- The magnitude reflects the strength of the feature’s influence.

### Your task:
Assign a **quality score from 1 to 5**:
- **5**: Excellent explanation — all major predictors (e.g., temperature, season, weather, workingday) have plausible SHAP contributions; no suspicious values.
- **4**: Good explanation — mostly reasonable, at most minor inconsistencies.
- **3**: Moderate quality — mix of justified and questionable attributions.
- **2**: Poor quality — several implausible or suspicious effects.
- **1**: Very low quality — major inconsistencies with domain knowledge (e.g., high temperature lowering demand, bad weather increasing it).

Also, list **AT MOST three feature IDs** whose relevance scores are **unjustified or suspicious**, based on the feature's value and known importance.

Do not consider the model's prediction. Focus only on whether the explanation is plausible and grounded.

### Output format:
<score><space><comma-separated list of incorrect feature IDs>

Examples:
- Excellent explanation: `5`
- Good explanation with minor issues: `4 5`
- Low quality with clear issues: `2 1,6`
- Very low quality with major issues: `1 2,4,7`

If there are no suspicious features, leave the second part empty (just the score). DO NOT include any additional text or explanations in your response.

### Explanation to evaluate:
"""

PROMPT_POWER = """
You are an expert in explainable AI and energy forecasting, with a focus on urban electricity consumption. Your task is to evaluate the quality of a SHAP explanation that describes why the model predicted a particular power consumption level.

Each explanation is a list of features in the following format:
<featureID> <feature_name> : <feature_value> = <feature relevance score>

Your goal is to determine how **reasonable and high-quality** the explanation is, based on the SHAP scores and your domain knowledge.

### Understanding SHAP scores:
- A **positive SHAP score (> 0)** means the feature contributes to a higher predicted power consumption.
- A **negative SHAP score (< 0)** means the feature contributes to a lower predicted power consumption.
- A **SHAP score of 0** means the feature has no impact.
- The magnitude reflects the strength of the feature’s influence.

### Your task:
Assign a **quality score from 1 to 5**:
- **5**: Excellent explanation — all major predictors (e.g., temperature, humidity, wind speed, visibility, hour) have plausible SHAP contributions; no suspicious values.
- **4**: Good explanation — mostly reasonable, with at most minor inconsistencies.
- **3**: Moderate quality — mix of well-supported and questionable attributions.
- **2**: Poor quality — several implausible or suspicious effects.
- **1**: Very low quality — major inconsistencies with domain knowledge (e.g., high temperature lowering consumption in summer, wind speed sharply increasing demand).

Also, list **AT MOST three feature IDs** whose relevance scores are **unjustified or suspicious**, based on the feature's value and known importance.

Do not consider the model's prediction. Focus only on whether the explanation is plausible and grounded.

### Output format:
<score><space><comma-separated list of incorrect feature IDs>

Examples:
- Excellent explanation: `5`
- Good explanation with minor issues: `4 5`
- Low quality with clear issues: `2 1,6`
- Very low quality with major issues: `1 2,4,7`

If there are no suspicious features, leave the second part empty (just the score). DO NOT include any additional text or explanations in your response.

### Explanation to evaluate:
"""

