Abstract: Although Differentiable Architecture Search (DARTS) has achieved promising performance in many machine learning tasks, it still suffers from a problem during the search: because different operations in the candidate set may require different levels of optimization, training them all under the same scheme biases DARTS toward networks with fast convergence, leading to a corresponding performance drop. This problem becomes more serious at later search stages. In this paper, we propose an adaptive dropout method for DARTS (AD-DARTS), which zeros the output of each operation with a probability derived from the structure parameters; these parameters can be viewed as indicating how difficult each candidate operation is to train, so the dropout serves to balance the training procedures across operations. Operations with more parameters can thus be trained more adequately, strengthening the representational ability of the network. Our analysis further shows that the proposed AD-DARTS also exhibits high search stability. The proposed method effectively solves the aforementioned problem and achieves better performance than other DARTS-based baselines on CIFAR-10, CIFAR-100, and ImageNet.
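To make the mechanism concrete, below is a minimal PyTorch-style sketch of a DARTS mixed operation with adaptive dropout. The abstract does not specify the exact mapping from structure parameters to drop probabilities, so the schedule used here (`max_drop * softmax(alpha)`, dropping high-weight, fast-converging operations more often so slower-converging operations receive more gradient) is one plausible, illustrative reading, not the paper's exact formulation; `AdaptiveDropoutMixedOp` and `max_drop` are hypothetical names.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveDropoutMixedOp(nn.Module):
    """Illustrative sketch of a DARTS mixed operation whose candidate outputs
    are zeroed with probabilities tied to the architecture (structure)
    parameters. The alpha-to-drop-rate mapping here is an assumption, not
    necessarily the paper's exact schedule."""

    def __init__(self, ops, max_drop=0.3):
        super().__init__()
        self.ops = nn.ModuleList(ops)  # candidate operations on this edge
        # Structure parameters (alphas), one per candidate operation.
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(ops)))
        self.max_drop = max_drop  # hypothetical cap on the drop probability

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        # Assumed mapping: a higher architecture weight (typical of
        # fast-converging ops) yields a higher drop probability, giving
        # under-trained, parameter-heavy ops more chances to learn.
        drop_probs = self.max_drop * weights.detach()
        outputs = []
        for w, p, op in zip(weights, drop_probs, self.ops):
            y = op(x)
            if self.training and torch.rand(()) < p:
                y = torch.zeros_like(y)  # zero this operation's output
            outputs.append(w * y)
        return sum(outputs)
```

At evaluation time (`self.training == False`) no outputs are dropped, so the module reduces to the standard DARTS weighted sum over candidate operations.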