Doubt and Redundancy Kill Soft Errors---Towards Detection and Correction of Silent Data Corruption in Task-based Numerical Software
(2021)
Presentation / Conference Contribution
Samfass, P., Weinzierl, T., Reinarz, A., & Bader, M. (2021, November). Doubt and Redundancy Kill Soft Errors---Towards Detection and Correction of Silent Data Corruption in Task-based Numerical Software. Presented at Supercomputing 21 - FTXS Workshop - 2021 IEEE/ACM 11th Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS), St Louis, MO
Resilient algorithms in high-performance computing are subject to rigorous non-functional constraints. Resiliency must not increase the runtime, memory footprint or I/O demands too significantly. We propose a task-based soft error detection scheme th... Read More about Doubt and Redundancy Kill Soft Errors---Towards Detection and Correction of Silent Data Corruption in Task-based Numerical Software.