Abstract: Deep neural networks (DNNs) have been widely adopted in daily life, with applications ranging from face recognition to recommender systems. However, the specialized hardware used to run these systems is vulnerable to computation errors that degrade accuracy. Conventional error tolerance methods are hard to apply here because of their substantial overhead and because they require modifying the training algorithm to accommodate error resilience. To address this issue, this paper presents a novel approach that uses the statistics of neurons' gradients with respect to their neighbors to identify and suppress erroneous neuron values. The approach is modular and is combined with an accurate, low-overhead error detection mechanism so that correction runs only when needed, further reducing its effective cost. Deep learning models can be trained with conventional algorithms, and our error correction module is then fitted to the trained DNN, achieving comparable or superior performance relative to baseline error correction methods. Results are presented with emphasis on scalability with respect to dataset and network size, as well as across different network architectures.
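To make the idea of neighbor-based suppression concrete, the following is a minimal sketch and not the paper's implementation. It assumes that "gradient with respect to neighbors" can be approximated by the absolute difference between adjacent values in a 1-D activation vector, that the threshold is fitted as mean plus `k` standard deviations of those differences on error-free activations, and that suppression means zeroing the flagged value. The function names `fit_threshold` and `suppress_errors` and the parameter `k` are hypothetical.

```python
import numpy as np

def fit_threshold(clean_activations, k=6.0):
    """Fit a bound on neighbor differences from error-free activations.

    Assumption: mean + k * std of absolute adjacent differences is a
    usable bound; the paper's actual statistic may differ.
    """
    diffs = np.abs(np.diff(np.asarray(clean_activations, dtype=float), axis=-1))
    return diffs.mean() + k * diffs.std()

def suppress_errors(activations, threshold):
    """Zero out 1-D activation values that differ from all neighbors by more than threshold.

    Boundary elements have a single neighbor and are judged on it alone.
    """
    x = np.asarray(activations, dtype=float)
    d = np.abs(np.diff(x))                  # d[i] = |x[i+1] - x[i]|
    left = np.concatenate(([np.inf], d))    # difference to the left neighbor
    right = np.concatenate((d, [np.inf]))   # difference to the right neighbor
    suspicious = (left > threshold) & (right > threshold)
    corrected = np.where(suspicious, 0.0, x)
    return corrected, suspicious

# Hypothetical usage: fit on clean activations, then correct a faulty vector.
clean = np.array([[0.1, 0.2, 0.15, 0.3], [0.25, 0.2, 0.1, 0.05]])
threshold = fit_threshold(clean)
corrected, flags = suppress_errors([0.1, 0.2, 1e4, 0.3, 0.25], threshold)
```

In this sketch the module is fitted after training, in line with the abstract's claim that training itself is unchanged. One limitation of the simplification: a boundary value whose only neighbor is faulty would also be flagged.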