Benchmarking Band Gap Prediction for Semiconductor Materials using Multimodal and Multi-Fidelity Data

Published: 03 Mar 2025, Last Modified: 09 Apr 2025
Venue: AI4MAT-ICLR-2025 Poster
License: CC BY 4.0
Submission Track: Multi-Modal Data for Materials Design - Full Paper
Submission Category: AI-Guided Design
Keywords: Band gap, graph neural network, multimodal data, multi-fidelity learning
Abstract: The band gap is critical for understanding the electronic properties of materials in semiconductor applications. While density functional theory is commonly used to estimate band gaps, it often underestimates band gap values and remains computationally expensive, limiting its practical usefulness. Machine learning (ML) has become a promising alternative for accurate and efficient band gap prediction. However, existing datasets are limited in data modality, fidelity and sample size, and performance evaluation studies often lack direct comparisons between traditional and advanced ML models. Therefore, a more comprehensive evaluation is needed to make progress towards real-world impact. In this paper, we developed a benchmarking framework for ML-based band gap prediction to address this gap. We compiled a new multimodal, multi-fidelity dataset from the Materials Project and BandgapDatabase1, consisting of 60,218 low-fidelity computational band gaps and 1,183 high-fidelity experimental band gaps across 10 material categories. We evaluated seven ML models, from traditional methods to graph neural networks, assessing their ability to learn from atomic properties and structural information. To promote real-world applicability, we employed three metrics: mean absolute error, mean relative absolute error, and coefficient of determination $R^2$. Moreover, we introduced a leave-one-material-out evaluation strategy to better reflect real-world scenarios where new materials have little to no prior training data. Our findings offer valuable insights into model selection and evaluation for band gap prediction across material categories, providing guidance for real-world applications in materials discovery and semiconductor design. The data and code used in this work are available at: https://github.com/Shef-AIRE/bandgap-benchmark.
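The three metrics and the leave-one-material-out strategy named in the abstract can be illustrated with a minimal NumPy sketch. This is not taken from the authors' repository: the function names, the `eps` guard against zero band gaps, and the choice to normalise the relative error by the true value are all assumptions made for illustration, and the paper's exact definitions may differ.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the band gap (e.g. eV)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def mrae(y_true, y_pred, eps=1e-8):
    """Mean relative absolute error.

    Assumption: the error is divided by |y_true|; eps (hypothetical) avoids
    division by zero for materials with a zero band gap.
    """
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred) / np.maximum(np.abs(y_true), eps)))

def r2(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def leave_one_material_out(categories):
    """Yield (held_out, train_idx, test_idx), holding out one category at a time."""
    categories = np.asarray(categories)
    for held_out in np.unique(categories):
        test_mask = categories == held_out
        yield held_out, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)
```

In such a setup, a model would be fitted on `train_idx` and scored on `test_idx` for each held-out category, so that every reported score comes from a material category the model never saw during training.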
Submission Number: 51