A comparison of beam testing and fault injection to predict soft errors
29 Jan 2024
- Orla Fernie

Data obtained at ChipIr on the soft error rate of Arm microprocessor hardware were compared with earlier estimates from microarchitectural fault injection to examine the accuracy of this method.

Reliability in computing devices is of the utmost importance. When devices are integrated into cyber-physical systems such as cars, airplanes and Unmanned Aerial Vehicles (UAVs), human lives are at risk, so the devices must work reliably. CPUs (Central Processing Units) from leading technology provider Arm are widely used in portable user devices. They are also found in large-scale systems, from Fugaku in Japan, one of the world's largest supercomputers, to the laboratories of the US Department of Energy. More recently, Arm has been working on a line of products for autonomous vehicles.


The soft error reliability of microprocessors can be estimated pre-silicon using early design models and post-silicon by accelerated beam testing on manufactured chips. In this study, the team compared the FIT (Failures-In-Time) rates from beam experiments on actual chips with those from microarchitectural fault injection in early-stage CPU models. They used two Arm CPU cores, the Cortex-A5 and the Cortex-A9. The Cortex-A5 CPU is standalone, while the Cortex-A9 is embedded in a System on Chip (SoC).

Pre-silicon results were obtained using a framework configured to inject single-event transient faults during system simulation into multiple components, which together account for more than 90% of the SRAM cells inside the CPU core. At least 1,000 single-bit transient faults were injected into each target component, for a total of 176,000 injections.
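To illustrate how an injection campaign of this kind can be turned into a FIT estimate, here is a minimal Python sketch. It is not the framework used in the study: the component names, bit counts, outcome counts and the raw per-bit FIT value are all hypothetical placeholders. It assumes the common approach of scaling each component's raw error rate by the fraction of injected faults that end in data corruption, and uses a standard binomial approximation for the statistical margin of error. One FIT is one failure per billion device hours.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    """One fault-injection target (all values below are hypothetical)."""
    name: str
    bits: int        # number of SRAM bits in the structure
    injections: int  # single-bit transient faults injected
    sdc_count: int   # injections that ended in silent data corruption


def vulnerability(c: Component) -> float:
    """Fraction of injected faults that corrupted the program output."""
    return c.sdc_count / c.injections


def margin_of_error(injections: int, p: float = 0.5, z: float = 1.96) -> float:
    """Binomial margin of error at about 95% confidence (z = 1.96).

    p = 0.5 is the worst case, so the result is an upper bound.
    """
    return z * math.sqrt(p * (1.0 - p) / injections)


def core_fit(components: list[Component], raw_fit_per_bit: float) -> float:
    """Sum over components of raw FIT per bit x bits x vulnerability."""
    return sum(raw_fit_per_bit * c.bits * vulnerability(c) for c in components)


if __name__ == "__main__":
    # Made-up campaign results, for illustration only.
    campaign = [
        Component("l1_data_cache", bits=262_144, injections=1000, sdc_count=50),
        Component("register_file", bits=2_048, injections=1000, sdc_count=100),
    ]
    raw_fit_per_bit = 1e-5  # hypothetical technology-dependent value
    print(f"estimated FIT: {core_fit(campaign, raw_fit_per_bit):.4f}")
    print(f"margin of error per component: {margin_of_error(1000):.3f}")
```

Under this approximation, 1,000 injections per component give a worst-case margin of error of roughly 3% at 95% confidence, which is one reason sample sizes of this order are common in statistical fault injection.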

For post-silicon measurements, the ChipIr instrument at ISIS was used. ChipIr delivers a neutron beam that mimics the effect of atmospheric neutrons on electronic devices, enabling measurement of device FIT. The available neutron flux is ~8 orders of magnitude higher than the terrestrial flux.
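The arithmetic behind an accelerated beam measurement can be sketched as follows. This is an illustrative calculation, not the analysis code from the study. It assumes the usual conversion: the error cross section is the number of observed errors divided by the neutron fluence, and the FIT rate is that cross section scaled by a reference terrestrial flux. The reference value of 13 neutrons per cm² per hour (sea level, energies above 10 MeV, as commonly taken from JEDEC JESD89A) is an assumption that does not come from this article, and the error count and fluence in the demo are made up.

```python
HOURS_PER_YEAR = 24 * 365.25
FIT_HOURS = 1e9  # one FIT = one failure per 1e9 device hours

# Assumption: commonly used sea-level reference flux, in n/cm^2/h.
REFERENCE_FLUX = 13.0


def cross_section_cm2(errors: int, fluence_n_per_cm2: float) -> float:
    """Error cross section: observed errors per unit neutron fluence."""
    if fluence_n_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return errors / fluence_n_per_cm2


def fit_from_beam(errors: int, fluence_n_per_cm2: float,
                  natural_flux: float = REFERENCE_FLUX) -> float:
    """Scale the beam cross section to a FIT rate in the natural environment."""
    return cross_section_cm2(errors, fluence_n_per_cm2) * natural_flux * FIT_HOURS


def equivalent_natural_years(beam_hours: float, acceleration: float = 1e8) -> float:
    """Years of natural exposure represented by a given time in the beam."""
    return beam_hours * acceleration / HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical run: 100 errors observed over a fluence of 1e11 n/cm^2.
    print(f"FIT: {fit_from_beam(100, 1e11):.1f}")
    print(f"1 beam hour ~ {equivalent_natural_years(1.0):,.0f} years of natural exposure")
```

With an acceleration factor of about 8 orders of magnitude, a single hour in the beam stands in for more than ten thousand years of natural exposure, which is what makes it practical to observe rare soft errors on real hardware.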

This study demonstrated that, under several different system setups, microarchitectural fault injection provides an accurate estimate of the data corruption FIT rate before the device is implemented in silicon. The comparison between beam testing and fault injection is a significant step towards early error rate estimation. This is of particular interest for flexible architectures like Arm's, which customers can tune by adding hardened solutions before implementation in silicon.

This paper was awarded the 2022 Best Paper Award from IEEE Transactions on Computers by the IEEE Computer Society Publications Board. Read it here: https://ieeexplore.ieee.org/document/9616430

Contact: Fernie, Orla (STFC,RAL,ISIS)