CEDAR: Continuous Testing of Deep Learning Libraries

Danning Xie, Jiannan Wang, Hung Viet Pham, Lin Tan, Yu Guo, Adnan Aziz, Erik Meijer

Published: 2024, Last Modified: 04 Oct 2025, SANER 2024, CC BY-SA 4.0

Abstract: Since Deep Learning (DL) libraries undergo rapid development with thousands of lines of code changes daily, they require continuous testing to detect software bugs and ensure code quality. In this paper, we explore DL testing approaches in a continuous testing setting. To make this feasible, we present CEDAR, the first continuous testing framework for DL libraries, which efficiently integrates two state-of-the-art DL testing approaches (DocTer and EAGLE) to test two popular DL libraries, PyTorch and TensorFlow. Applied to 20 versions of PyTorch and TensorFlow, CEDAR detects 83 bugs in 140 APIs. Of these 83 bugs, 23 were previously unknown, and 21 of those have been confirmed or fixed by the developers. The results also show that CEDAR shortens bug detection latency by almost a year (338.6 days) on average. In addition, CEDAR is effective at detecting new regression bugs and masked bugs. With three optimization strategies, CEDAR reduces the time and space overhead by factors of 15.4 and 9.7, respectively. We share insights and lessons learned from our research, aiming to advance the development of more effective and efficient continuous testing for DL libraries, benefiting both developers and researchers.