TrustGen: Benchmarking Trustworthiness in Generative Models for Russian Language Processing Tasks

ICLR 2026 Conference Submission 25514 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: trustworthiness, robustness, security and privacy, model bias/fairness evaluation
TL;DR: TrustGen — the first Russian-language benchmark for evaluating the trustworthiness of large language models
Abstract: Large Language Models (LLMs) are increasingly used in autonomous agents and multi-agent systems to handle complex tasks, making their trustworthiness a critical concern. However, most existing benchmarks focus on English, limiting their relevance for other languages, particularly Russian. In this study, we introduce the first benchmark for evaluating LLM trustworthiness on Russian-language tasks, assessing six dimensions: truthfulness, safety, fairness, robustness, privacy, and ethics. We adapt English datasets and incorporate native Russian data, creating 14 tasks from 12 datasets. Additionally, we propose the Task Format Non-Compliance Rate, a metric that measures adherence to the required output structure without penalizing correct content. Evaluating 22 LLMs, including Russian-adapted models, we uncover significant challenges in factual consistency, safety calibration, and bias mitigation. Our findings underscore the need for tailored fine-tuning and evaluation methods for non-English applications, providing a foundation for more trustworthy AI in Russian-language contexts.
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 25514