Prioritizing Large-scale Natural Language Test Cases at OPPO

Haoran Xu, Chen Zhi, Tianyu Xiang, Zixuan Wu, Gaorong Zhang, Xinkui Zhao, Jianwei Yin, Shuiguang Deng

Published: 01 May 2025, Last Modified: 26 Jul 2025. OpenReview Archive Direct Upload. License: CC BY 4.0

Abstract: Regression testing is a crucial process for ensuring system stability following software updates. As a global leader in smart device manufacturing, OPPO releases a new version of its customized Android firmware, ColorOS, on a weekly basis. For each release, testers must select regression test cases from a vast repository of manual test cases, and the tight schedule makes it difficult to pick the right ones from such a large pool. Because these test cases are described in natural language, testers must execute them by hand, following the written operational steps, which makes the process labor-intensive and error-prone. An effective recommendation system is therefore needed to suggest appropriate test cases and reduce unnecessary human effort during weekly regression tests. To address these challenges, we propose a two-phase manual test case recommendation system. The system first uses a BERT model to classify each commit message and determine the most relevant test labels. It then uses the BGE embedding model to compute the semantic similarity between the commit message and the test cases, and recommends the most suitable test cases. The approach has been deployed at OPPO, and feedback from several months of use shows that its test case recommendation accuracy reaches 91%. The time testers spend selecting test cases has decreased by 61%, the number of test cases executed per code change has dropped by 87%, and the defect detection rate of the recommended test cases has increased by 182.35%. Our method thus achieves high accuracy, low human effort, and a high defect detection rate. This paper introduces the integration of a BERT classification model and the BGE semantic similarity model for manual test case recommendation, significantly improving the accuracy and efficiency of recommendations and providing valuable insights for regression testing in complex software systems.
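
The two-phase flow described in the abstract (classify the commit message into a test label, then rank the test cases under that label by similarity to the commit message) can be illustrated with a minimal Python sketch. This is not the deployed OPPO system: the keyword-based `classify` function is a stand-in for the BERT classifier, the bag-of-words `embed` function is a stand-in for the BGE embedding model, and every name, label, and test case below is hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class ManualCase:
    case_id: str
    label: str        # test label, e.g. "camera" (hypothetical label set)
    description: str  # natural-language operational steps


def embed(text: str) -> Counter:
    """Stand-in for an embedding model such as BGE: a bag-of-words vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(commit_message: str, label_keywords: dict[str, set[str]]) -> str:
    """Stand-in for the BERT classifier: pick the label with the most keyword hits."""
    words = set(commit_message.lower().split())
    return max(label_keywords, key=lambda label: len(words & label_keywords[label]))


def recommend(commit_message, cases, label_keywords, top_k=2):
    # Phase 1: map the commit message to a test label.
    label = classify(commit_message, label_keywords)
    # Phase 2: rank only the test cases carrying that label by similarity.
    candidates = [c for c in cases if c.label == label]
    query = embed(commit_message)
    ranked = sorted(candidates,
                    key=lambda c: cosine(query, embed(c.description)),
                    reverse=True)
    return ranked[:top_k]


# Hypothetical data, for illustration only.
LABEL_KEYWORDS = {
    "camera": {"camera", "photo", "shutter"},
    "bluetooth": {"bluetooth", "pairing", "headset"},
}
CASES = [
    ManualCase("C1", "camera", "open camera and take a photo in night mode"),
    ManualCase("C2", "camera", "switch camera to video and record ten seconds"),
    ManualCase("B1", "bluetooth", "enable bluetooth and complete pairing with a headset"),
]
COMMIT = "fix camera crash when taking photo in night mode"

recommended_ids = [c.case_id for c in recommend(COMMIT, CASES, LABEL_KEYWORDS)]
print(recommended_ids)
```

Filtering by label before ranking keeps the similarity search confined to a small candidate set, which is one plausible reason for a two-phase design; in a real setting the stand-ins would be replaced by a fine-tuned classifier and dense embeddings.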