MetaGAI: A Large-Scale and High-Quality Benchmark for Generative AI Model and Data Card Generation

ACL ARR 2026 January Submission 1867 Authors

31 Dec 2025 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · License: CC BY 4.0
Keywords: Model Card, Data Card, Knowledge Extraction, Generative AI, LLMs, AI Agent
Abstract: The rapid proliferation of Generative AI necessitates rigorous documentation standards for transparency and governance. However, manual creation of Model and Data Cards is not scalable, while automated approaches lack large-scale, high-fidelity benchmarks for systematic evaluation. We introduce MetaGAI, a comprehensive benchmark comprising 2,541 verified document triplets constructed through semantic triangulation of academic papers, GitHub repositories, and Hugging Face artifacts. Unlike prior single-source datasets, MetaGAI employs a multi-agent framework with specialized Retriever, Generator, and Editor agents, validated through four-dimensional human-in-the-loop assessment. We establish a robust evaluation protocol combining automated metrics with validated LLM-as-a-Judge frameworks. Extensive analysis reveals that sparse Mixture-of-Experts architectures achieve superior cost-quality efficiency, while a fundamental trade-off exists between faithfulness and completeness. MetaGAI provides a foundational testbed for benchmarking, training, and analyzing automated Model and Data Card generation methods at scale. Our data and code are available at https://anonymous.4open.science/r/MetaGAI-DBB4.
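The abstract describes a multi-agent framework in which Retriever, Generator, and Editor agents turn a paper/repository/artifact triplet into a card. The paper's actual agent prompts and models are not given here, so the following is a minimal hypothetical sketch of that three-stage pipeline; all function names, the `Triplet` type, and the stubbed agent behaviors are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Triplet:
    # One document triplet: academic paper, GitHub repo, Hugging Face artifact.
    paper: str
    repo: str
    artifact: str

def retriever(triplet: Triplet) -> dict:
    # Hypothetical Retriever agent: gather evidence from each source.
    return {"paper": triplet.paper, "repo": triplet.repo, "artifact": triplet.artifact}

def generator(evidence: dict) -> str:
    # Hypothetical Generator agent: draft a card from the retrieved evidence.
    return "## Model Card\n" + "\n".join(f"- {k}: {v}" for k, v in sorted(evidence.items()))

def editor(draft: str) -> str:
    # Hypothetical Editor agent: clean up the draft (here, just trim whitespace).
    return "\n".join(line.strip() for line in draft.splitlines())

def generate_card(triplet: Triplet) -> str:
    # Chain the three agents, as the abstract's pipeline suggests.
    return editor(generator(retriever(triplet)))
```

In the real system each stage would presumably be an LLM call with retrieval over the three sources, followed by the four-dimensional human-in-the-loop assessment the abstract mentions; the stubs above only convey the staged structure.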
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, automatic creation and evaluation of language resources, NLP datasets, automatic evaluation of datasets, reproducibility
Contribution Types: Data resources
Languages Studied: English
Submission Number: 1867