Keywords: Hyperparameters, Metadata, Model Training, AutoML, Fine-tuning, Machine Learning
Abstract: Hyperparameter selection in machine learning remains a critical challenge, often involving tedious trial-and-error or costly optimization processes. This paper presents “MetaBench-7”, a comprehensive metadata dataset that spans seven modalities (Image, Text, Tabular, Graph, Time-Series, Audio, and Video) together with their optimal hyperparameter configurations. The collection includes 573 distinct models, each with standardized metadata such as dataset size and class count, enabling detailed quantitative investigation of design trends. Exploratory data analysis reveals distinct patterns for each modality: Text datasets often use Transformer-based models with large batch sizes and relatively few epochs, while Graph and Tabular datasets use larger batch sizes and more epochs as dataset size increases. Model-modality specialization statistics indicate that certain architectures, such as ResNet50 and XGBoost, are specialized for specific modalities, whereas Transformer variants work across multiple modalities. The dataset provides “safe default” hyperparameter configurations tailored to each modality, offering reliable baselines for new datasets. It serves as a valuable, reusable resource for meta-learning, AutoML, and research into hyperparameter performance dynamics.
Paper Type: Short
Research Area: Information Extraction and Retrieval
Research Area Keywords: AutoML, Machine Learning, Information Extraction and Retrieval, AI/LLM Agents
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 10941