AntiFault: A Fault-Tolerant and Self-Recoverable Floating-Point Format for Deep Neural Networks

ICLR 2026 Conference Submission 19981 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Soft errors, Reliability, Fault tolerance, Artificial Intelligence, Approximate computing, Data Format
TL;DR: AntiFault is a 16-bit floating-point format that embeds fault tolerance without extra memory, enabling up to 50% model size reduction and protection against soft errors, while preserving model accuracy.
Abstract: Artificial Intelligence (AI) is increasingly deployed in safety-critical applications, where reliability is crucial. However, these AI-based systems are vulnerable to soft errors: even a single bit flip in a critical model parameter can lead to complete system failure. Existing fault-tolerant solutions typically rely on hardware redundancy or additional memory, which is impractical for resource-constrained edge devices. To address this, we introduce AntiFault, a novel 16-bit floating-point representation that approximates 32-bit floating-point numbers while embedding fault protection within the same bit width, incurring no additional memory overhead. AntiFault causes minimal to no accuracy degradation and enables detection, localization, and correction of multiple errors without extra memory. Our approach reduces model size by up to 50\% while guaranteeing protection against, and recovery from, soft errors. We evaluate AntiFault on image and text classification tasks using ResNet18, MobileNetV2, and MobileViT on CIFAR-100 and MNIST, and DistilBERT and RoBERTa on the Emotion and AG's News datasets. Experimental results show that models using AntiFault maintain their accuracy under extensive fault injection, while standard 32-bit models suffer severe degradation from even a single bit flip in critical bits such as the sign or exponent bits.
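To make the general idea concrete, the following is a minimal sketch of how error correction can be embedded inside a 16-bit word that approximates a 32-bit float. This is not the authors' actual AntiFault layout, which the abstract does not specify: the sketch assumes a hypothetical 11+5 split, truncating the float32 to its top 11 bits (sign, 8 exponent bits, 2 mantissa bits) and protecting them with a Hamming(15,11) code plus an overall parity bit, giving single-error correction and double-error detection within exactly 16 bits. All function names here are illustrative.

```python
import struct

def f32_to_bits(x: float) -> int:
    """Reinterpret a Python float as the 32 raw bits of an IEEE-754 float32."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_to_f32(b: int) -> float:
    """Reinterpret 32 raw bits as an IEEE-754 float32."""
    return struct.unpack(">f", struct.pack(">I", b))[0]

PARITY_POS = (1, 2, 4, 8)  # parity-bit positions in a Hamming(15,11) codeword
DATA_POS = [p for p in range(1, 16) if p not in PARITY_POS]

def hamming_encode(data11: int) -> int:
    """Spread 11 data bits over positions 1..15 and fill the 4 parity bits."""
    word = [0] * 16  # index 0 unused; positions 1..15
    for i, p in enumerate(DATA_POS):
        word[p] = (data11 >> (10 - i)) & 1
    for par in PARITY_POS:
        for p in range(1, 16):
            if (p & par) and p != par:
                word[par] ^= word[p]
    return sum(word[p] << (15 - p) for p in range(1, 16))

def hamming_decode(code15: int) -> int:
    """Correct up to one flipped bit, then extract the 11 data bits."""
    word = [0] + [(code15 >> (15 - p)) & 1 for p in range(1, 16)]
    syndrome = 0
    for p in range(1, 16):
        if word[p]:
            syndrome ^= p
    if syndrome:  # a nonzero syndrome names the flipped position
        word[syndrome] ^= 1
    data11 = 0
    for p in DATA_POS:
        data11 = (data11 << 1) | word[p]
    return data11

def encode16(x: float) -> int:
    """fp32 -> protected 16-bit word: top 11 float bits + Hamming + parity."""
    top11 = f32_to_bits(x) >> 21          # sign | 8 exponent | 2 mantissa bits
    code15 = hamming_encode(top11)
    overall = bin(code15).count("1") & 1  # overall parity extends SEC to SECDED
    return (code15 << 1) | overall

def decode16(w: int) -> float:
    """Protected 16-bit word -> approximate fp32 (lost mantissa bits zeroed)."""
    top11 = hamming_decode(w >> 1)
    return bits_to_f32(top11 << 21)

if __name__ == "__main__":
    x = 1.5
    w = encode16(x)
    corrupted = w ^ (1 << 9)            # simulate a soft error: flip one bit
    print(decode16(w), decode16(corrupted))  # both recover the same value
```

A practical format would presumably keep more mantissa bits and use a lighter protection scheme to preserve model accuracy; the sketch only demonstrates that detection and correction can live inside the same 16-bit budget without any side-channel storage.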
Supplementary Material: pdf
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 19981