Trustworthiness Evaluation Report

Comprehensive Analysis of Model's {{ aspect }}

Test Models Results

Model | Model Size | Open-Source | Creator | Version | Test Time | {{ attribute }}
{% for row in test_data %}
{{ row.model_name }} | {{ row.model_size }} | {{ 'Yes' if row.open_weight else 'No' }} | {{ row.creator }} | {{ row.version }} | {{ row.test_time }} | {{ row[attribute] }}
{% endfor %}

Model Performance Summary

Error Case Study

{% for model_name, case in case_data.items() %}

{{ model_name }}

{% if case.prompt %}

Prompt

{{ case.prompt | markdown }}
{% endif %} {% if case.model_answer %}

Model Answer

{{ case.model_answer | markdown }}
{% endif %} {% if case.ground_truth %}

Ground Truth

{{ case.ground_truth }}
{% endif %} {% if case.category %}

Category

{{ case.category }}
{% endif %} {% if case.entity %}

Entity

{{ case.entity }}
{% endif %} {% if case.question %}

Question

{{ case.question | markdown }}
{% endif %} {% if case.original_prompt %}

Original Prompt

{{ case.original_prompt | markdown }}
{% endif %} {% if case.modified_prompt %}

Modified Prompt

{{ case.modified_prompt | markdown }}
{% endif %} {% if case.original_score is defined and case.original_score is not none %}

Original Score

{{ case.original_score }}
{% endif %} {% if case.modified_score is defined and case.modified_score is not none %}

Modified Score

{{ case.modified_score }}
{% endif %} {% if case.judge_process %}

Judge Process

{{ case.judge_process | markdown }}
{% endif %} {% if case.judge_result %}

Judge Result

{{ case.judge_result }}
{% endif %} {% if case.image_path %}

Images

{% for img_url in case.image_path %}
Model Output: {{ img_url }}
{% endfor %}
{% endif %} {% for key, value in case.items() %} {% if key not in ['prompt', 'model_answer', 'ground_truth', 'category', 'entity', 'question', 'original_prompt', 'modified_prompt', 'original_score', 'modified_score', 'judge_process', 'judge_result', 'image_path'] %} {% if value is string %}

{{ key | capitalize }}

{{ value | markdown }}
{% elif value is mapping %}

{{ key | capitalize }}

{{ value | tojson(indent=4) }}
{% endif %} {% endif %} {% endfor %}
{% endfor %}

Leaderboard

Model Name | {{ aspect }}
{% for row in leaderboard %}
{{ row.Model }} | {{ row[aspect] }}
{% endfor %}
Your test models