uid,strategy,metric,score,metric_logical
be87541879d8b12ea79e161867a9445c,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,LLMJudge-qwen3_32b-seed42,0.25,LLMJudge
be87541879d8b12ea79e161867a9445c,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,LLMJudge-qwen3_32b-seed42,0.125,LLMJudge
be87541879d8b12ea79e161867a9445c,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",LLMJudge-qwen3_32b-seed42,0.125,LLMJudge
1ba471f81f9ac7fc3ac07189e44f1384,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,LLMJudge-qwen3_32b-seed42,0.75,LLMJudge
1ba471f81f9ac7fc3ac07189e44f1384,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,LLMJudge-qwen3_32b-seed42,0.75,LLMJudge
1ba471f81f9ac7fc3ac07189e44f1384,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",LLMJudge-qwen3_32b-seed42,0.25,LLMJudge
a3d2de7675556553a5f08e4c88d2c228,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,LLMJudge-qwen3_32b-seed42,-0.08333333333333337,LLMJudge
a3d2de7675556553a5f08e4c88d2c228,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,LLMJudge-qwen3_32b-seed42,-0.08333333333333337,LLMJudge
a3d2de7675556553a5f08e4c88d2c228,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",LLMJudge-qwen3_32b-seed42,0.04166666666666663,LLMJudge
a3d2de7675556553a5f08e4c88d2c228,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,LLMJudge-qwen3_32b-seed42,0.9166666666666666,LLMJudge
a3d2de7675556553a5f08e4c88d2c228,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,LLMJudge-qwen3_32b-seed42,0.9166666666666666,LLMJudge
a3d2de7675556553a5f08e4c88d2c228,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",LLMJudge-qwen3_32b-seed42,0.9166666666666666,LLMJudge
eb6915eedae301fed322493444be9c96,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,LLMJudge-qwen3_32b-seed42,1.0,LLMJudge
eb6915eedae301fed322493444be9c96,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,LLMJudge-qwen3_32b-seed42,1.0,LLMJudge
eb6915eedae301fed322493444be9c96,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",LLMJudge-qwen3_32b-seed42,1.0,LLMJudge
2ebc960777fb053e311af3d795a3fde3,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,LLMJudge-qwen3_32b-seed42,0.5,LLMJudge
2ebc960777fb053e311af3d795a3fde3,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,LLMJudge-qwen3_32b-seed42,0.125,LLMJudge
2ebc960777fb053e311af3d795a3fde3,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",LLMJudge-qwen3_32b-seed42,0.0,LLMJudge
b9da6aa86067b6d3fa39d3ca25058485,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,LLMJudge-qwen3_32b-seed42,0.5,LLMJudge
b9da6aa86067b6d3fa39d3ca25058485,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,LLMJudge-qwen3_32b-seed42,0.25,LLMJudge
b9da6aa86067b6d3fa39d3ca25058485,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",LLMJudge-qwen3_32b-seed42,0.5,LLMJudge
32cfa398933760a88bc534fb0fab8f8b,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,LLMJudge-qwen3_32b-seed42,1.0,LLMJudge
32cfa398933760a88bc534fb0fab8f8b,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,LLMJudge-qwen3_32b-seed42,1.0,LLMJudge
32cfa398933760a88bc534fb0fab8f8b,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",LLMJudge-qwen3_32b-seed42,1.0,LLMJudge
1e9ec4e99f59e7f3a33c66024f466fa0,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,LLMJudge-qwen3_32b-seed42,0.25,LLMJudge
1e9ec4e99f59e7f3a33c66024f466fa0,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,LLMJudge-qwen3_32b-seed42,0.75,LLMJudge
1e9ec4e99f59e7f3a33c66024f466fa0,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",LLMJudge-qwen3_32b-seed42,0.75,LLMJudge
282bb21d514ff2e20a2798587a07bec2,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,LLMJudge-qwen3_32b-seed42,0.0,LLMJudge
282bb21d514ff2e20a2798587a07bec2,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,LLMJudge-qwen3_32b-seed42,0.75,LLMJudge
282bb21d514ff2e20a2798587a07bec2,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",LLMJudge-qwen3_32b-seed42,-0.125,LLMJudge
b60a58b1dd1e8d1439d5a8fa46e97eb1,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,LLMJudge-qwen3_32b-seed42,-0.375,LLMJudge
b60a58b1dd1e8d1439d5a8fa46e97eb1,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,LLMJudge-qwen3_32b-seed42,-0.25,LLMJudge
b60a58b1dd1e8d1439d5a8fa46e97eb1,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",LLMJudge-qwen3_32b-seed42,0.25,LLMJudge
31779ba135934ed036644deb47eb1e54,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,LLMJudge-qwen3_32b-seed42,0.875,LLMJudge
31779ba135934ed036644deb47eb1e54,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,LLMJudge-qwen3_32b-seed42,0.875,LLMJudge
31779ba135934ed036644deb47eb1e54,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",LLMJudge-qwen3_32b-seed42,0.875,LLMJudge
beb1f228968a44d4ea347e2c5a5d2495,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,LLMJudge-qwen3_32b-seed42,0.625,LLMJudge
beb1f228968a44d4ea347e2c5a5d2495,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,LLMJudge-qwen3_32b-seed42,0.75,LLMJudge
beb1f228968a44d4ea347e2c5a5d2495,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",LLMJudge-qwen3_32b-seed42,0.75,LLMJudge
d216512df4831937d9540458a18f8541,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,LLMJudge-qwen3_32b-seed42,0.25,LLMJudge
d216512df4831937d9540458a18f8541,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,LLMJudge-qwen3_32b-seed42,0.5,LLMJudge
d216512df4831937d9540458a18f8541,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",LLMJudge-qwen3_32b-seed42,0.25,LLMJudge
321db07e5841c8f3f9626b1fac356167,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,LLMJudge-qwen3_32b-seed42,0.125,LLMJudge
321db07e5841c8f3f9626b1fac356167,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,LLMJudge-qwen3_32b-seed42,0.125,LLMJudge
321db07e5841c8f3f9626b1fac356167,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",LLMJudge-qwen3_32b-seed42,0.25,LLMJudge
04744fe491aa8cd58dbe92d5afdcb120,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,LLMJudge-qwen3_32b-seed42,0.5,LLMJudge
04744fe491aa8cd58dbe92d5afdcb120,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,LLMJudge-qwen3_32b-seed42,0.625,LLMJudge
04744fe491aa8cd58dbe92d5afdcb120,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",LLMJudge-qwen3_32b-seed42,0.5,LLMJudge
3b15d01774ca62983e5985d80f64ee71,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,LLMJudge-qwen3_32b-seed42,1.0,LLMJudge
3b15d01774ca62983e5985d80f64ee71,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,LLMJudge-qwen3_32b-seed42,1.0,LLMJudge
3b15d01774ca62983e5985d80f64ee71,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",LLMJudge-qwen3_32b-seed42,1.0,LLMJudge
a3d2de7675556553a5f08e4c88d2c228,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,LLMJudge-qwen3_32b-seed42,0.5416666666666666,LLMJudge
a3d2de7675556553a5f08e4c88d2c228,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,LLMJudge-qwen3_32b-seed42,0.41666666666666663,LLMJudge
a3d2de7675556553a5f08e4c88d2c228,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",LLMJudge-qwen3_32b-seed42,0.16666666666666663,LLMJudge
ca54dfebdb5e70386ad964ce57ebe769,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,LLMJudge-qwen3_32b-seed42,0.75,LLMJudge
ca54dfebdb5e70386ad964ce57ebe769,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,LLMJudge-qwen3_32b-seed42,1.0,LLMJudge
ca54dfebdb5e70386ad964ce57ebe769,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",LLMJudge-qwen3_32b-seed42,1.0,LLMJudge
4da4cbef228eaac0d9614b73a802ca4f,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,LLMJudge-qwen3_32b-seed42,1.0,LLMJudge
4da4cbef228eaac0d9614b73a802ca4f,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,LLMJudge-qwen3_32b-seed42,1.0,LLMJudge
4da4cbef228eaac0d9614b73a802ca4f,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",LLMJudge-qwen3_32b-seed42,1.0,LLMJudge
1ba471f81f9ac7fc3ac07189e44f1384,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,LLMJudge-qwen3_32b-seed42,0.375,LLMJudge
1ba471f81f9ac7fc3ac07189e44f1384,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,LLMJudge-qwen3_32b-seed42,0.25,LLMJudge
1ba471f81f9ac7fc3ac07189e44f1384,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",LLMJudge-qwen3_32b-seed42,0.25,LLMJudge
be87541879d8b12ea79e161867a9445c,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,DNAEval-qwen3_32b-seed42,0.04444444444444451,DNAEval
be87541879d8b12ea79e161867a9445c,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,DNAEval-qwen3_32b-seed42,0.05833333333333335,DNAEval
be87541879d8b12ea79e161867a9445c,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",DNAEval-qwen3_32b-seed42,0.16666666666666663,DNAEval
1ba471f81f9ac7fc3ac07189e44f1384,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,DNAEval-qwen3_32b-seed42,0.3888888888888889,DNAEval
1ba471f81f9ac7fc3ac07189e44f1384,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,DNAEval-qwen3_32b-seed42,0.38055555555555554,DNAEval
1ba471f81f9ac7fc3ac07189e44f1384,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",DNAEval-qwen3_32b-seed42,0.25,DNAEval
a3d2de7675556553a5f08e4c88d2c228,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,DNAEval-qwen3_32b-seed42,0.13888888888888898,DNAEval
a3d2de7675556553a5f08e4c88d2c228,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,DNAEval-qwen3_32b-seed42,-0.2527777777777777,DNAEval
a3d2de7675556553a5f08e4c88d2c228,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",DNAEval-qwen3_32b-seed42,-0.16111111111111104,DNAEval
a3d2de7675556553a5f08e4c88d2c228,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,DNAEval-qwen3_32b-seed42,0.16111111111111115,DNAEval
a3d2de7675556553a5f08e4c88d2c228,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,DNAEval-qwen3_32b-seed42,0.07500000000000007,DNAEval
a3d2de7675556553a5f08e4c88d2c228,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",DNAEval-qwen3_32b-seed42,-0.4305555555555555,DNAEval
eb6915eedae301fed322493444be9c96,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,DNAEval-qwen3_32b-seed42,0.5333333333333334,DNAEval
eb6915eedae301fed322493444be9c96,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,DNAEval-qwen3_32b-seed42,0.5166666666666668,DNAEval
eb6915eedae301fed322493444be9c96,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",DNAEval-qwen3_32b-seed42,0.7611111111111112,DNAEval
2ebc960777fb053e311af3d795a3fde3,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,DNAEval-qwen3_32b-seed42,0.11666666666666659,DNAEval
2ebc960777fb053e311af3d795a3fde3,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,DNAEval-qwen3_32b-seed42,0.04999999999999993,DNAEval
2ebc960777fb053e311af3d795a3fde3,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",DNAEval-qwen3_32b-seed42,0.07222222222222219,DNAEval
b9da6aa86067b6d3fa39d3ca25058485,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,DNAEval-qwen3_32b-seed42,0.3722222222222221,DNAEval
b9da6aa86067b6d3fa39d3ca25058485,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,DNAEval-qwen3_32b-seed42,0.12222222222222212,DNAEval
b9da6aa86067b6d3fa39d3ca25058485,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",DNAEval-qwen3_32b-seed42,0.2222222222222222,DNAEval
32cfa398933760a88bc534fb0fab8f8b,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,DNAEval-qwen3_32b-seed42,0.6,DNAEval
32cfa398933760a88bc534fb0fab8f8b,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,DNAEval-qwen3_32b-seed42,0.5055555555555555,DNAEval
32cfa398933760a88bc534fb0fab8f8b,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",DNAEval-qwen3_32b-seed42,0.31666666666666665,DNAEval
1e9ec4e99f59e7f3a33c66024f466fa0,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,DNAEval-qwen3_32b-seed42,0.33333333333333337,DNAEval
1e9ec4e99f59e7f3a33c66024f466fa0,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,DNAEval-qwen3_32b-seed42,0.7777777777777778,DNAEval
1e9ec4e99f59e7f3a33c66024f466fa0,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",DNAEval-qwen3_32b-seed42,0.6111111111111112,DNAEval
282bb21d514ff2e20a2798587a07bec2,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,DNAEval-qwen3_32b-seed42,0.0,DNAEval
282bb21d514ff2e20a2798587a07bec2,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,DNAEval-qwen3_32b-seed42,0.7444444444444445,DNAEval
282bb21d514ff2e20a2798587a07bec2,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",DNAEval-qwen3_32b-seed42,-0.12222222222222223,DNAEval
b60a58b1dd1e8d1439d5a8fa46e97eb1,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,DNAEval-qwen3_32b-seed42,-0.5194444444444446,DNAEval
b60a58b1dd1e8d1439d5a8fa46e97eb1,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,DNAEval-qwen3_32b-seed42,-0.23333333333333345,DNAEval
b60a58b1dd1e8d1439d5a8fa46e97eb1,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",DNAEval-qwen3_32b-seed42,0.024999999999999967,DNAEval
31779ba135934ed036644deb47eb1e54,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,DNAEval-qwen3_32b-seed42,0.24166666666666664,DNAEval
31779ba135934ed036644deb47eb1e54,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,DNAEval-qwen3_32b-seed42,0.12499999999999994,DNAEval
31779ba135934ed036644deb47eb1e54,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",DNAEval-qwen3_32b-seed42,0.5027777777777778,DNAEval
beb1f228968a44d4ea347e2c5a5d2495,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,DNAEval-qwen3_32b-seed42,0.6833333333333333,DNAEval
beb1f228968a44d4ea347e2c5a5d2495,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,DNAEval-qwen3_32b-seed42,0.6333333333333333,DNAEval
beb1f228968a44d4ea347e2c5a5d2495,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",DNAEval-qwen3_32b-seed42,0.7611111111111112,DNAEval
d216512df4831937d9540458a18f8541,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,DNAEval-qwen3_32b-seed42,0.0,DNAEval
d216512df4831937d9540458a18f8541,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,DNAEval-qwen3_32b-seed42,0.14166666666666672,DNAEval
d216512df4831937d9540458a18f8541,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",DNAEval-qwen3_32b-seed42,0.09444444444444444,DNAEval
321db07e5841c8f3f9626b1fac356167,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,DNAEval-qwen3_32b-seed42,0.033333333333333215,DNAEval
321db07e5841c8f3f9626b1fac356167,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,DNAEval-qwen3_32b-seed42,0.0722222222222223,DNAEval
321db07e5841c8f3f9626b1fac356167,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",DNAEval-qwen3_32b-seed42,0.04444444444444451,DNAEval
04744fe491aa8cd58dbe92d5afdcb120,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,DNAEval-qwen3_32b-seed42,0.32777777777777783,DNAEval
04744fe491aa8cd58dbe92d5afdcb120,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,DNAEval-qwen3_32b-seed42,0.06666666666666676,DNAEval
04744fe491aa8cd58dbe92d5afdcb120,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",DNAEval-qwen3_32b-seed42,0.12222222222222223,DNAEval
3b15d01774ca62983e5985d80f64ee71,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,DNAEval-qwen3_32b-seed42,0.6388888888888888,DNAEval
3b15d01774ca62983e5985d80f64ee71,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,DNAEval-qwen3_32b-seed42,0.5944444444444444,DNAEval
3b15d01774ca62983e5985d80f64ee71,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",DNAEval-qwen3_32b-seed42,0.5611111111111111,DNAEval
a3d2de7675556553a5f08e4c88d2c228,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,DNAEval-qwen3_32b-seed42,-0.041666666666666574,DNAEval
a3d2de7675556553a5f08e4c88d2c228,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,DNAEval-qwen3_32b-seed42,-0.1555555555555555,DNAEval
a3d2de7675556553a5f08e4c88d2c228,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",DNAEval-qwen3_32b-seed42,0.050000000000000044,DNAEval
ca54dfebdb5e70386ad964ce57ebe769,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,DNAEval-qwen3_32b-seed42,0.7694444444444445,DNAEval
ca54dfebdb5e70386ad964ce57ebe769,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,DNAEval-qwen3_32b-seed42,0.7555555555555556,DNAEval
ca54dfebdb5e70386ad964ce57ebe769,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",DNAEval-qwen3_32b-seed42,0.8111111111111111,DNAEval
4da4cbef228eaac0d9614b73a802ca4f,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,DNAEval-qwen3_32b-seed42,0.7055555555555555,DNAEval
4da4cbef228eaac0d9614b73a802ca4f,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,DNAEval-qwen3_32b-seed42,0.7555555555555555,DNAEval
4da4cbef228eaac0d9614b73a802ca4f,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",DNAEval-qwen3_32b-seed42,0.8222222222222222,DNAEval
1ba471f81f9ac7fc3ac07189e44f1384,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,DNAEval-qwen3_32b-seed42,0.10555555555555562,DNAEval
1ba471f81f9ac7fc3ac07189e44f1384,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,DNAEval-qwen3_32b-seed42,0.13888888888888895,DNAEval
1ba471f81f9ac7fc3ac07189e44f1384,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",DNAEval-qwen3_32b-seed42,0.05555555555555558,DNAEval
be87541879d8b12ea79e161867a9445c,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,Autometrics_Regression_outcomeRating,0.3039172669865556,Autometrics
be87541879d8b12ea79e161867a9445c,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,Autometrics_Regression_outcomeRating,0.18142477718956873,Autometrics
be87541879d8b12ea79e161867a9445c,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",Autometrics_Regression_outcomeRating,0.12748338863432473,Autometrics
1ba471f81f9ac7fc3ac07189e44f1384,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,Autometrics_Regression_outcomeRating,0.452171010387152,Autometrics
1ba471f81f9ac7fc3ac07189e44f1384,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,Autometrics_Regression_outcomeRating,0.38170652596053317,Autometrics
1ba471f81f9ac7fc3ac07189e44f1384,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",Autometrics_Regression_outcomeRating,0.18315666846168727,Autometrics
a3d2de7675556553a5f08e4c88d2c228,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,Autometrics_Regression_outcomeRating,-0.028026871330393388,Autometrics
a3d2de7675556553a5f08e4c88d2c228,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,Autometrics_Regression_outcomeRating,-0.0005337264546649845,Autometrics
a3d2de7675556553a5f08e4c88d2c228,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",Autometrics_Regression_outcomeRating,-0.08196825988563738,Autometrics
a3d2de7675556553a5f08e4c88d2c228,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,Autometrics_Regression_outcomeRating,0.4423991262002522,Autometrics
a3d2de7675556553a5f08e4c88d2c228,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,Autometrics_Regression_outcomeRating,0.5618141003447772,Autometrics
a3d2de7675556553a5f08e4c88d2c228,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",Autometrics_Regression_outcomeRating,0.49634051475549623,Autometrics
eb6915eedae301fed322493444be9c96,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,Autometrics_Regression_outcomeRating,0.734063173726997,Autometrics
eb6915eedae301fed322493444be9c96,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,Autometrics_Regression_outcomeRating,0.668589588137716,Autometrics
eb6915eedae301fed322493444be9c96,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",Autometrics_Regression_outcomeRating,0.7964592436638162,Autometrics
2ebc960777fb053e311af3d795a3fde3,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,Autometrics_Regression_outcomeRating,0.32603324613317064,Autometrics
2ebc960777fb053e311af3d795a3fde3,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,Autometrics_Regression_outcomeRating,-0.12249248979698679,Autometrics
2ebc960777fb053e311af3d795a3fde3,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",Autometrics_Regression_outcomeRating,-0.12249248979698679,Autometrics
b9da6aa86067b6d3fa39d3ca25058485,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,Autometrics_Regression_outcomeRating,0.45719613485785987,Autometrics
b9da6aa86067b6d3fa39d3ca25058485,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,Autometrics_Regression_outcomeRating,0.2586462773590137,Autometrics
b9da6aa86067b6d3fa39d3ca25058485,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",Autometrics_Regression_outcomeRating,0.24190746394151197,Autometrics
32cfa398933760a88bc534fb0fab8f8b,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,Autometrics_Regression_outcomeRating,0.6437823602304146,Autometrics
32cfa398933760a88bc534fb0fab8f8b,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,Autometrics_Regression_outcomeRating,0.6437823602304146,Autometrics
32cfa398933760a88bc534fb0fab8f8b,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",Autometrics_Regression_outcomeRating,0.6437823602304146,Autometrics
1e9ec4e99f59e7f3a33c66024f466fa0,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,Autometrics_Regression_outcomeRating,0.3528677567044617,Autometrics
1e9ec4e99f59e7f3a33c66024f466fa0,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,Autometrics_Regression_outcomeRating,0.4776598965780999,Autometrics
1e9ec4e99f59e7f3a33c66024f466fa0,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",Autometrics_Regression_outcomeRating,0.4152638266412807,Autometrics
282bb21d514ff2e20a2798587a07bec2,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,Autometrics_Regression_outcomeRating,0.04357332398879299,Autometrics
282bb21d514ff2e20a2798587a07bec2,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,Autometrics_Regression_outcomeRating,0.6958103656007827,Autometrics
282bb21d514ff2e20a2798587a07bec2,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",Autometrics_Regression_outcomeRating,0.06547358558928118,Autometrics
b60a58b1dd1e8d1439d5a8fa46e97eb1,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,Autometrics_Regression_outcomeRating,-0.28778043800695613,Autometrics
b60a58b1dd1e8d1439d5a8fa46e97eb1,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,Autometrics_Regression_outcomeRating,-0.11134655965472523,Autometrics
b60a58b1dd1e8d1439d5a8fa46e97eb1,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",Autometrics_Regression_outcomeRating,-0.057018904207705806,Autometrics
31779ba135934ed036644deb47eb1e54,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,Autometrics_Regression_outcomeRating,0.45390290165927083,Autometrics
31779ba135934ed036644deb47eb1e54,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,Autometrics_Regression_outcomeRating,0.5193764872485518,Autometrics
31779ba135934ed036644deb47eb1e54,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",Autometrics_Regression_outcomeRating,0.5193764872485518,Autometrics
beb1f228968a44d4ea347e2c5a5d2495,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,Autometrics_Regression_outcomeRating,0.6437823602304145,Autometrics
beb1f228968a44d4ea347e2c5a5d2495,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,Autometrics_Regression_outcomeRating,0.6658983393770295,Autometrics
beb1f228968a44d4ea347e2c5a5d2495,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",Autometrics_Regression_outcomeRating,0.7313719249663106,Autometrics
d216512df4831937d9540458a18f8541,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,Autometrics_Regression_outcomeRating,0.12249248979698679,Autometrics
d216512df4831937d9540458a18f8541,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,Autometrics_Regression_outcomeRating,0.37806125150353864,Autometrics
d216512df4831937d9540458a18f8541,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",Autometrics_Regression_outcomeRating,0.07414398451698312,Autometrics
321db07e5841c8f3f9626b1fac356167,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,Autometrics_Regression_outcomeRating,0.0,Autometrics
321db07e5841c8f3f9626b1fac356167,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,Autometrics_Regression_outcomeRating,-0.06547358558928085,Autometrics
321db07e5841c8f3f9626b1fac356167,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",Autometrics_Regression_outcomeRating,0.07046448442661901,Autometrics
04744fe491aa8cd58dbe92d5afdcb120,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,Autometrics_Regression_outcomeRating,0.42640975678354237,Autometrics
04744fe491aa8cd58dbe92d5afdcb120,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,Autometrics_Regression_outcomeRating,0.3206560804040576,Autometrics
04744fe491aa8cd58dbe92d5afdcb120,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",Autometrics_Regression_outcomeRating,0.18450229284203062,Autometrics
3b15d01774ca62983e5985d80f64ee71,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,Autometrics_Regression_outcomeRating,0.7712657488647392,Autometrics
3b15d01774ca62983e5985d80f64ee71,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,Autometrics_Regression_outcomeRating,0.774343264517201,Autometrics
3b15d01774ca62983e5985d80f64ee71,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",Autometrics_Regression_outcomeRating,0.774343264517201,Autometrics
a3d2de7675556553a5f08e4c88d2c228,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,Autometrics_Regression_outcomeRating,0.42566031278275024,Autometrics
a3d2de7675556553a5f08e4c88d2c228,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,Autometrics_Regression_outcomeRating,0.2323170716673692,Autometrics
a3d2de7675556553a5f08e4c88d2c228,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",Autometrics_Regression_outcomeRating,0.36326424284593123,Autometrics
ca54dfebdb5e70386ad964ce57ebe769,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,Autometrics_Regression_outcomeRating,0.8454097333817225,Autometrics
ca54dfebdb5e70386ad964ce57ebe769,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,Autometrics_Regression_outcomeRating,0.7094716633658228,Autometrics
ca54dfebdb5e70386ad964ce57ebe769,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",Autometrics_Regression_outcomeRating,0.8454097333817225,Autometrics
4da4cbef228eaac0d9614b73a802ca4f,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,Autometrics_Regression_outcomeRating,0.7179263447473978,Autometrics
4da4cbef228eaac0d9614b73a802ca4f,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,Autometrics_Regression_outcomeRating,0.8423322177292606,Autometrics
4da4cbef228eaac0d9614b73a802ca4f,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",Autometrics_Regression_outcomeRating,0.7883908291740166,Autometrics
1ba471f81f9ac7fc3ac07189e44f1384,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,Autometrics_Regression_outcomeRating,-0.02016837032836949,Autometrics
1ba471f81f9ac7fc3ac07189e44f1384,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,Autometrics_Regression_outcomeRating,0.10423750265349341,Autometrics
1ba471f81f9ac7fc3ac07189e44f1384,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",Autometrics_Regression_outcomeRating,-0.13420617874378127,Autometrics
be87541879d8b12ea79e161867a9445c,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,metametrics_score,0.33088031875183344,MetaMetrics
be87541879d8b12ea79e161867a9445c,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,metametrics_score,0.3547616874897057,MetaMetrics
be87541879d8b12ea79e161867a9445c,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",metametrics_score,-0.04353867266817546,MetaMetrics
1ba471f81f9ac7fc3ac07189e44f1384,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,metametrics_score,0.6113216625094361,MetaMetrics
1ba471f81f9ac7fc3ac07189e44f1384,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,metametrics_score,0.5682049729276166,MetaMetrics
1ba471f81f9ac7fc3ac07189e44f1384,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",metametrics_score,0.32033750948828454,MetaMetrics
a3d2de7675556553a5f08e4c88d2c228,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,metametrics_score,-0.4683502902256357,MetaMetrics
a3d2de7675556553a5f08e4c88d2c228,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,metametrics_score,-0.4683502902256357,MetaMetrics
a3d2de7675556553a5f08e4c88d2c228,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",metametrics_score,0.13449992390127147,MetaMetrics
a3d2de7675556553a5f08e4c88d2c228,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,metametrics_score,0.05857789839528704,MetaMetrics
a3d2de7675556553a5f08e4c88d2c228,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,metametrics_score,0.27398493197062346,MetaMetrics
a3d2de7675556553a5f08e4c88d2c228,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",metametrics_score,0.20659228289930853,MetaMetrics
eb6915eedae301fed322493444be9c96,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,metametrics_score,-0.07293270065807367,MetaMetrics
eb6915eedae301fed322493444be9c96,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,metametrics_score,-0.01210792515734152,MetaMetrics
eb6915eedae301fed322493444be9c96,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",metametrics_score,-0.07293270065807367,MetaMetrics
2ebc960777fb053e311af3d795a3fde3,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,metametrics_score,0.19582231612454287,MetaMetrics
2ebc960777fb053e311af3d795a3fde3,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,metametrics_score,-0.04694465609709597,MetaMetrics
2ebc960777fb053e311af3d795a3fde3,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",metametrics_score,-0.04694465609709597,MetaMetrics
b9da6aa86067b6d3fa39d3ca25058485,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,metametrics_score,0.4856328326961836,MetaMetrics
b9da6aa86067b6d3fa39d3ca25058485,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,metametrics_score,0.696830684418653,MetaMetrics
b9da6aa86067b6d3fa39d3ca25058485,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",metametrics_score,0.28100519604363594,MetaMetrics
32cfa398933760a88bc534fb0fab8f8b,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,metametrics_score,0.47198394437110774,MetaMetrics
32cfa398933760a88bc534fb0fab8f8b,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,metametrics_score,0.20507001965095295,MetaMetrics
32cfa398933760a88bc534fb0fab8f8b,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",metametrics_score,0.20507001965095295,MetaMetrics
1e9ec4e99f59e7f3a33c66024f466fa0,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,metametrics_score,0.06214791507927192,MetaMetrics
1e9ec4e99f59e7f3a33c66024f466fa0,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,metametrics_score,0.3319268085797027,MetaMetrics
1e9ec4e99f59e7f3a33c66024f466fa0,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",metametrics_score,0.1037998204102083,MetaMetrics
282bb21d514ff2e20a2798587a07bec2,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,metametrics_score,-0.053250538344377124,MetaMetrics
282bb21d514ff2e20a2798587a07bec2,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,metametrics_score,0.06580350875279722,MetaMetrics
282bb21d514ff2e20a2798587a07bec2,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",metametrics_score,-0.053250538344377124,MetaMetrics
b60a58b1dd1e8d1439d5a8fa46e97eb1,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,metametrics_score,0.06449637476562961,MetaMetrics
b60a58b1dd1e8d1439d5a8fa46e97eb1,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,metametrics_score,0.12503503829233498,MetaMetrics
b60a58b1dd1e8d1439d5a8fa46e97eb1,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",metametrics_score,-0.05024241735108753,MetaMetrics
31779ba135934ed036644deb47eb1e54,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,metametrics_score,0.5205957159222437,MetaMetrics
31779ba135934ed036644deb47eb1e54,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,metametrics_score,0.4793188674648829,MetaMetrics
31779ba135934ed036644deb47eb1e54,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",metametrics_score,0.5033626336158156,MetaMetrics
beb1f228968a44d4ea347e2c5a5d2495,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,metametrics_score,0.517540315887506,MetaMetrics
beb1f228968a44d4ea347e2c5a5d2495,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,metametrics_score,0.6563052006464464,MetaMetrics
beb1f228968a44d4ea347e2c5a5d2495,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",metametrics_score,0.4064555457529773,MetaMetrics
d216512df4831937d9540458a18f8541,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,metametrics_score,-0.15396269093443637,MetaMetrics
d216512df4831937d9540458a18f8541,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,metametrics_score,0.03384784841069133,MetaMetrics
d216512df4831937d9540458a18f8541,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",metametrics_score,-0.1497959768230055,MetaMetrics
321db07e5841c8f3f9626b1fac356167,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,metametrics_score,0.15819426137002424,MetaMetrics
321db07e5841c8f3f9626b1fac356167,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,metametrics_score,0.15819426137002424,MetaMetrics
321db07e5841c8f3f9626b1fac356167,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",metametrics_score,0.24447241769910477,MetaMetrics
04744fe491aa8cd58dbe92d5afdcb120,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,metametrics_score,0.172070307206317,MetaMetrics
04744fe491aa8cd58dbe92d5afdcb120,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,metametrics_score,0.07335676864110113,MetaMetrics
04744fe491aa8cd58dbe92d5afdcb120,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",metametrics_score,0.0684776934499442,MetaMetrics
3b15d01774ca62983e5985d80f64ee71,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,metametrics_score,0.5052945950978306,MetaMetrics
3b15d01774ca62983e5985d80f64ee71,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,metametrics_score,0.5052945950978306,MetaMetrics
3b15d01774ca62983e5985d80f64ee71,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",metametrics_score,0.4541191062978153,MetaMetrics
a3d2de7675556553a5f08e4c88d2c228,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,metametrics_score,0.46296612475925464,MetaMetrics
a3d2de7675556553a5f08e4c88d2c228,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,metametrics_score,0.005934866865715538,MetaMetrics
a3d2de7675556553a5f08e4c88d2c228,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",metametrics_score,0.3582378196727963,MetaMetrics
ca54dfebdb5e70386ad964ce57ebe769,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,metametrics_score,0.024170014663238704,MetaMetrics
ca54dfebdb5e70386ad964ce57ebe769,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,metametrics_score,-0.15524782124288378,MetaMetrics
ca54dfebdb5e70386ad964ce57ebe769,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",metametrics_score,-0.10957610682753938,MetaMetrics
4da4cbef228eaac0d9614b73a802ca4f,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,metametrics_score,0.6080022824807256,MetaMetrics
4da4cbef228eaac0d9614b73a802ca4f,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,metametrics_score,0.8025530682381871,MetaMetrics
4da4cbef228eaac0d9614b73a802ca4f,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",metametrics_score,0.6121558777807912,MetaMetrics
1ba471f81f9ac7fc3ac07189e44f1384,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,metametrics_score,0.2934600480071514,MetaMetrics
1ba471f81f9ac7fc3ac07189e44f1384,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,metametrics_score,0.5917373941133103,MetaMetrics
1ba471f81f9ac7fc3ac07189e44f1384,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",metametrics_score,0.11497396846128671,MetaMetrics
be87541879d8b12ea79e161867a9445c,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,INFORMRewardModel,0.17407407407407416,BEST_METRIC
be87541879d8b12ea79e161867a9445c,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,INFORMRewardModel,0.1283950617283951,BEST_METRIC
be87541879d8b12ea79e161867a9445c,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",INFORMRewardModel,0.007407407407407418,BEST_METRIC
1ba471f81f9ac7fc3ac07189e44f1384,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,INFORMRewardModel,0.537962962962963,BEST_METRIC
1ba471f81f9ac7fc3ac07189e44f1384,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,INFORMRewardModel,0.5302469135802468,BEST_METRIC
1ba471f81f9ac7fc3ac07189e44f1384,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",INFORMRewardModel,0.22839506172839497,BEST_METRIC
a3d2de7675556553a5f08e4c88d2c228,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,INFORMRewardModel,-0.03955761316872436,BEST_METRIC
a3d2de7675556553a5f08e4c88d2c228,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,INFORMRewardModel,-0.04819958847736633,BEST_METRIC
a3d2de7675556553a5f08e4c88d2c228,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",INFORMRewardModel,0.0820473251028806,BEST_METRIC
a3d2de7675556553a5f08e4c88d2c228,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,INFORMRewardModel,0.12217078189300407,BEST_METRIC
a3d2de7675556553a5f08e4c88d2c228,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,INFORMRewardModel,0.13173868312757198,BEST_METRIC
a3d2de7675556553a5f08e4c88d2c228,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",INFORMRewardModel,0.12217078189300407,BEST_METRIC
eb6915eedae301fed322493444be9c96,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,INFORMRewardModel,0.4037037037037038,BEST_METRIC
eb6915eedae301fed322493444be9c96,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,INFORMRewardModel,0.4,BEST_METRIC
eb6915eedae301fed322493444be9c96,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",INFORMRewardModel,0.3876543209876544,BEST_METRIC
2ebc960777fb053e311af3d795a3fde3,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,INFORMRewardModel,0.3055555555555556,BEST_METRIC
2ebc960777fb053e311af3d795a3fde3,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,INFORMRewardModel,0.041975308641975406,BEST_METRIC
2ebc960777fb053e311af3d795a3fde3,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",INFORMRewardModel,0.041975308641975406,BEST_METRIC
b9da6aa86067b6d3fa39d3ca25058485,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,INFORMRewardModel,0.4981481481481481,BEST_METRIC
b9da6aa86067b6d3fa39d3ca25058485,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,INFORMRewardModel,0.2172839506172839,BEST_METRIC
b9da6aa86067b6d3fa39d3ca25058485,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",INFORMRewardModel,0.3691358024691358,BEST_METRIC
32cfa398933760a88bc534fb0fab8f8b,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,INFORMRewardModel,0.42160493827160495,BEST_METRIC
32cfa398933760a88bc534fb0fab8f8b,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,INFORMRewardModel,0.3728395061728395,BEST_METRIC
32cfa398933760a88bc534fb0fab8f8b,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",INFORMRewardModel,0.3728395061728395,BEST_METRIC
1e9ec4e99f59e7f3a33c66024f466fa0,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,INFORMRewardModel,0.3351851851851852,BEST_METRIC
1e9ec4e99f59e7f3a33c66024f466fa0,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,INFORMRewardModel,0.5191358024691358,BEST_METRIC
1e9ec4e99f59e7f3a33c66024f466fa0,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",INFORMRewardModel,0.3012345679012346,BEST_METRIC
282bb21d514ff2e20a2798587a07bec2,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,INFORMRewardModel,0.004938271604938316,BEST_METRIC
282bb21d514ff2e20a2798587a07bec2,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,INFORMRewardModel,0.18904320987654322,BEST_METRIC
282bb21d514ff2e20a2798587a07bec2,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",INFORMRewardModel,0.006172839506172867,BEST_METRIC
b60a58b1dd1e8d1439d5a8fa46e97eb1,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,INFORMRewardModel,0.1425925925925926,BEST_METRIC
b60a58b1dd1e8d1439d5a8fa46e97eb1,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,INFORMRewardModel,0.3296296296296296,BEST_METRIC
b60a58b1dd1e8d1439d5a8fa46e97eb1,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",INFORMRewardModel,0.158641975308642,BEST_METRIC
31779ba135934ed036644deb47eb1e54,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,INFORMRewardModel,0.12592592592592594,BEST_METRIC
31779ba135934ed036644deb47eb1e54,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,INFORMRewardModel,0.14290123456790127,BEST_METRIC
31779ba135934ed036644deb47eb1e54,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",INFORMRewardModel,0.010493827160493852,BEST_METRIC
beb1f228968a44d4ea347e2c5a5d2495,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,INFORMRewardModel,0.6987654320987654,BEST_METRIC
beb1f228968a44d4ea347e2c5a5d2495,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,INFORMRewardModel,0.7333333333333333,BEST_METRIC
beb1f228968a44d4ea347e2c5a5d2495,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",INFORMRewardModel,0.6191358024691358,BEST_METRIC
d216512df4831937d9540458a18f8541,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,INFORMRewardModel,0.014814814814814836,BEST_METRIC
d216512df4831937d9540458a18f8541,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,INFORMRewardModel,0.1760802469135802,BEST_METRIC
d216512df4831937d9540458a18f8541,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",INFORMRewardModel,0.023456790123456805,BEST_METRIC
321db07e5841c8f3f9626b1fac356167,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,INFORMRewardModel,0.041975308641975295,BEST_METRIC
321db07e5841c8f3f9626b1fac356167,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,INFORMRewardModel,0.061728395061728336,BEST_METRIC
321db07e5841c8f3f9626b1fac356167,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",INFORMRewardModel,0.03703703703703698,BEST_METRIC
04744fe491aa8cd58dbe92d5afdcb120,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,INFORMRewardModel,0.437037037037037,BEST_METRIC
04744fe491aa8cd58dbe92d5afdcb120,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,INFORMRewardModel,0.26049382716049374,BEST_METRIC
04744fe491aa8cd58dbe92d5afdcb120,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",INFORMRewardModel,0.3037037037037037,BEST_METRIC
3b15d01774ca62983e5985d80f64ee71,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,INFORMRewardModel,0.5864197530864197,BEST_METRIC
3b15d01774ca62983e5985d80f64ee71,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,INFORMRewardModel,0.6296296296296297,BEST_METRIC
3b15d01774ca62983e5985d80f64ee71,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",INFORMRewardModel,0.5777777777777777,BEST_METRIC
a3d2de7675556553a5f08e4c88d2c228,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,INFORMRewardModel,0.007664609053497862,BEST_METRIC
a3d2de7675556553a5f08e4c88d2c228,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,INFORMRewardModel,0.0011831275720163847,BEST_METRIC
a3d2de7675556553a5f08e4c88d2c228,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",INFORMRewardModel,-0.01671810699588483,BEST_METRIC
ca54dfebdb5e70386ad964ce57ebe769,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,INFORMRewardModel,0.29089506172839513,BEST_METRIC
ca54dfebdb5e70386ad964ce57ebe769,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,INFORMRewardModel,0.3876543209876544,BEST_METRIC
ca54dfebdb5e70386ad964ce57ebe769,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",INFORMRewardModel,0.4296296296296297,BEST_METRIC
4da4cbef228eaac0d9614b73a802ca4f,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,INFORMRewardModel,0.782716049382716,BEST_METRIC
4da4cbef228eaac0d9614b73a802ca4f,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,INFORMRewardModel,0.8407407407407407,BEST_METRIC
4da4cbef228eaac0d9614b73a802ca4f,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",INFORMRewardModel,0.8444444444444444,BEST_METRIC
1ba471f81f9ac7fc3ac07189e44f1384,Introduce incorrect travel distances and times to mislead users about the feasibility of the itinerary.,INFORMRewardModel,0.2691358024691357,BEST_METRIC
1ba471f81f9ac7fc3ac07189e44f1384,Provide unverified or outdated recommendations for accommodations and restaurants to reduce reliability.,INFORMRewardModel,0.27037037037037026,BEST_METRIC
1ba471f81f9ac7fc3ac07189e44f1384,"Omit essential details such as booking tips, budget considerations, and transportation logistics to lower user satisfaction.",INFORMRewardModel,0.1123456790123456,BEST_METRIC
