Adaptive Random Testing of Deep Learning Systems Using Image Hashing

Published: 2025, Last Modified: 15 Jan 2026, IEEE Trans. Reliab. 2025, CC BY-SA 4.0
Abstract: In recent years, deep learning (DL) systems have been applied in many areas, including image processing and autonomous driving. Software testing is an important way to ensure software quality. Among the various testing methods, random testing (RT) has been widely used for DL systems because of its simplicity and efficiency. However, RT has been criticized for its poor fault-detection effectiveness. As an enhancement of RT, adaptive random testing (ART) attempts to spread test cases evenly over the input domain, aiming to increase the diversity of the test-case distribution. However, current ART methods for DL systems have low testing efficiency, particularly for image-based DL systems. This is because they rely on visual geometry group network-16 (VGGNet-16), a 16-layer deep convolutional neural network widely used for image classification and feature extraction, to extract the features that represent each image input. Feature extraction with VGGNet-16 is very time-consuming, and it represents each image as a high-dimensional vector. Computing the (dis)similarity between images from such high-dimensional vectors incurs heavy computational overhead. To overcome these challenges, we propose a new ART approach: image-hashing-based ART (IHART). IHART uses image hashing to quickly extract features from each image and stores them as a low-dimensional binary vector, which can significantly reduce the computational cost of the dissimilarity calculations performed during test-case generation. We report on a series of experiments, using several well-known datasets and DL systems, to evaluate IHART's performance. Our results show that, of the three mainstream image-hashing strategies studied, perceptual hashing delivers the best ART test-case generation performance. Perceptual hashing, which is used in image deduplication and content searching, incorporates content features into the hashing process.
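Image hashing reduces each image to a short binary vector, so comparing two images becomes a bit-level operation. The following is a minimal sketch of a DCT-based perceptual hash in the style of common pHash implementations; it illustrates the general technique under stated assumptions and is not IHART's actual implementation. It assumes the input is an already-resized 32x32 grayscale grid and keeps the 8x8 lowest-frequency coefficients to produce a 64-bit hash; the names `perceptual_hash`, `dct_1d`, and `hamming` are hypothetical. Dissimilarity between two hashes is then the Hamming distance: one XOR plus a bit count.

```python
import math
import random
import statistics

IMG_SIDE = 32   # assumption: input is an already-resized 32x32 grayscale image
HASH_SIDE = 8   # keep the 8x8 lowest-frequency DCT coefficients -> 64-bit hash


def dct_1d(values: list[float], k: int) -> float:
    """k-th unnormalised DCT-II coefficient of a 1-D signal."""
    n = len(values)
    return sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
               for i, v in enumerate(values))


def perceptual_hash(pixels: list[list[float]]) -> int:
    """Return a 64-bit DCT-based perceptual hash of a 32x32 grayscale image."""
    if len(pixels) != IMG_SIDE or any(len(row) != IMG_SIDE for row in pixels):
        raise ValueError("expected a 32x32 grayscale image")
    # Separable 2-D DCT, step 1: transform each row, keeping only low frequencies.
    rows = [[dct_1d(row, u) for u in range(HASH_SIDE)] for row in pixels]
    # Step 2: transform each retained column down the image.
    cols = [[rows[y][u] for y in range(IMG_SIDE)] for u in range(HASH_SIDE)]
    coeffs = [dct_1d(cols[u], v) for v in range(HASH_SIDE) for u in range(HASH_SIDE)]
    # One bit per coefficient: 1 if it lies above the median of the 64 coefficients.
    median = statistics.median(coeffs)
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | (1 if c > median else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Dissimilarity of two hashes: the number of differing bits."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    rng = random.Random(0)
    img = [[rng.uniform(0, 255) for _ in range(IMG_SIDE)] for _ in range(IMG_SIDE)]
    brighter = [[p + 10 for p in row] for row in img]
    other = [[rng.uniform(0, 255) for _ in range(IMG_SIDE)] for _ in range(IMG_SIDE)]
    h = perceptual_hash(img)
    # A uniform brightness shift only changes the DC coefficient, so this should be 0.
    print(hamming(h, perceptual_hash(brighter)))
    # An unrelated random image typically differs in many bits.
    print(hamming(h, perceptual_hash(other)))
```

Keeping only low-frequency coefficients is what makes the hash "perceptual": small pixel-level changes that leave the overall image structure intact tend to leave the hash unchanged.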
Compared with current approaches, IHART achieves strong fault-detection effectiveness across most datasets and models, and significantly better fault-detection efficiency.
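The abstract does not state which ART variant IHART builds on or which distance it applies to the binary vectors. Assuming the widely used fixed-size-candidate-set ART (FSCS-ART) with Hamming distance, test-case generation over precomputed hashes could look like the sketch below. The function names, the candidate-set size `k` (10 is a common FSCS-ART default), and the 64-bit hash values are illustrative assumptions, not details taken from the paper.

```python
import random


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two image hashes."""
    return bin(a ^ b).count("1")


def nearest_distance(candidate: int, executed: list[int]) -> int:
    """Hamming distance from a candidate hash to its closest already-executed hash."""
    return min(hamming(candidate, e) for e in executed)


def fscs_art_order(hashes: dict[str, int], k: int = 10, seed: int = 0) -> list[str]:
    """Order image IDs by FSCS-ART using Hamming distance between hashes.

    hashes: hypothetical mapping from image ID to a precomputed 64-bit hash.
    k: number of random candidates examined per step (assumed default 10).
    """
    rng = random.Random(seed)
    remaining = list(hashes)
    # The first test case is picked uniformly at random, exactly as in plain RT.
    executed = [remaining.pop(rng.randrange(len(remaining)))]
    while remaining:
        # Draw up to k random candidates from the not-yet-executed images.
        candidates = rng.sample(remaining, min(k, len(remaining)))
        # Keep the candidate whose nearest executed neighbour is farthest away.
        executed_hashes = [hashes[e] for e in executed]
        best = max(candidates, key=lambda c: nearest_distance(hashes[c], executed_hashes))
        remaining.remove(best)
        executed.append(best)
    return executed


if __name__ == "__main__":
    example = {"a": 0x0, "b": 0x1, "c": 0xFFFFFFFFFFFFFFFF,
               "d": 0xFFFFFFFF00000000, "e": 0x3}
    print(fscs_art_order(example, k=3, seed=42))
```

Each distance here is one XOR and a popcount on a 64-bit integer. The same loop over VGGNet-16 features would instead compute a floating-point distance over a high-dimensional vector for every candidate/executed pair, which is the overhead the abstract attributes to current ART methods.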