Abstract: Recent studies suggest that the soft-error rate in microprocessor logic is likely to become a serious reliability concern by 2010. Detecting soft errors in the processor's core logic presents a challenge that error-detecting and error-correcting codes cannot address. Commercial microprocessor systems that require an assurance of reliability employ an error-detection scheme based on dual modular redundancy (DMR) in some form, ranging from replicated pipelines within the same die to mirroring of complete processors. To detect errors across a distributed DMR pair, we develop fingerprinting, a technique that summarizes a processor's execution history into a cryptographic signature, or "fingerprint". More specifically, a fingerprint is a hash value computed on the changes to a processor's architectural state resulting from a program's execution. The processors in a dual modular redundant pair periodically exchange and compare fingerprints to corroborate each other's correctness. Relative to other techniques, fingerprinting offers superior error coverage and significantly reduces error-detection latency and bandwidth.
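The exchange-and-compare idea can be illustrated with a minimal software sketch. This is only a toy model under stated assumptions, not the hardware mechanism the paper proposes: the `Fingerprint` class, the `(kind, location, value)` update tuples, the choice of SHA-256, and the 64-bit truncation are all hypothetical choices made for illustration.

```python
import hashlib


class Fingerprint:
    """Accumulates a hash over architectural state updates (hypothetical model)."""

    def __init__(self):
        # Assumption: SHA-256 stands in for whatever hash the hardware would use.
        self._hash = hashlib.sha256()

    def record(self, kind, location, value):
        # kind is "reg" or "mem"; location is a register index or an address;
        # value is the newly written value.
        self._hash.update(f"{kind}:{location}:{value};".encode())

    def digest(self, hex_chars=16):
        # Truncate so that only a compact value has to be exchanged.
        return self._hash.hexdigest()[:hex_chars]


def interval_fingerprint(updates):
    """Fingerprint of all state updates in one comparison interval."""
    fp = Fingerprint()
    for kind, location, value in updates:
        fp.record(kind, location, value)
    return fp.digest()


def first_mismatch(intervals_a, intervals_b):
    """Index of the first interval whose fingerprints differ, or None."""
    for i, (a, b) in enumerate(zip(intervals_a, intervals_b)):
        if interval_fingerprint(a) != interval_fingerprint(b):
            return i
    return None
```

In this sketch each processor of the pair would send only the short digest per interval rather than every state update, which is where the bandwidth saving comes from; a shorter interval bounds how long an error can go undetected.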