
Matrix Multiplication in Multiword Arithmetic: Error Analysis and Application to GPU Tensor Cores

Fasi, Massimiliano; Higham, Nicholas J.; Lopez, Florent; Mary, Theo; Mikaitis, Mantas




Abstract

In multiword arithmetic, a matrix is represented as the unevaluated sum of two or more lower precision matrices, and a matrix product is formed by multiplying the constituents in low precision. We investigate the use of multiword arithmetic for improving the performance-accuracy tradeoff of matrix multiplication with mixed precision block fused multiply–add (FMA) hardware, focusing especially on the tensor cores available on NVIDIA GPUs. Building on a general block FMA framework, we develop a comprehensive error analysis of multiword matrix multiplication. After confirming the theoretical error bounds experimentally by simulating low precision in software, we use the cuBLAS and CUTLASS libraries to implement a number of matrix multiplication algorithms using double-fp16 (double-binary16) arithmetic. When running the algorithms on NVIDIA V100 and A100 GPUs, we find that double-fp16 is not as accurate as fp32 (binary32) arithmetic despite satisfying the same worst-case error bound. Using probabilistic error analysis, we explain why this issue is likely to be caused by the rounding mode used by the NVIDIA tensor cores, and we propose a parameterized blocked summation algorithm that alleviates the problem and significantly improves the performance-accuracy tradeoff.
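The multiword idea described in the abstract can be illustrated with a small software simulation. This is a minimal sketch under stated assumptions, not the authors' cuBLAS/CUTLASS implementation. Each matrix is split into a double-fp16 pair `hi + lo` by round-to-nearest, using the half-precision `'e'` format of Python's `struct` module. The block FMA is modelled as exact fp16 products with fp32 accumulation rounded to nearest, which is a simplifying assumption. Only three of the four constituent products are formed, with the small `lo*lo` term dropped. All inputs are assumed to lie within the fp16 range. The helper names (`split_double_fp16`, `matmul_fp16_fp32acc`, `double_fp16_matmul`) are hypothetical.

```python
import struct


def fp16(x: float) -> float:
    """Round a binary64 value to the nearest binary16 (fp16) value."""
    # Raises OverflowError if x is outside the fp16 range (assumed not to happen).
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp32(x: float) -> float:
    """Round a binary64 value to binary32 (fp32), round to nearest."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def split_double_fp16(A):
    """Represent A as the unevaluated sum hi + lo of two fp16 matrices."""
    hi = [[fp16(a) for a in row] for row in A]
    # a - h is exact in binary64; its fp16 rounding captures the next 11 bits.
    lo = [[fp16(a - h) for a, h in zip(ra, rh)] for ra, rh in zip(A, hi)]
    return hi, lo


def matmul_fp16_fp32acc(A, B):
    """Simulated mixed-precision product: fp16 inputs, fp32 accumulation.

    Assumption: the product of two fp16 numbers is exact (it fits in fp32),
    and each partial sum is rounded to fp32 with round to nearest.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            c = 0.0
            for k in range(m):
                c = fp32(c + A[i][k] * B[k][j])
            C[i][j] = c
    return C


def add_fp32(X, Y):
    """Elementwise sum of two matrices, rounded to fp32."""
    return [[fp32(x + y) for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def double_fp16_matmul(A, B):
    """Hypothetical double-fp16 product: hi*hi + hi*lo + lo*hi (lo*lo dropped)."""
    Ah, Al = split_double_fp16(A)
    Bh, Bl = split_double_fp16(B)
    # Add the two small cross terms first, then the dominant hi*hi term.
    cross = add_fp32(matmul_fp16_fp32acc(Al, Bh), matmul_fp16_fp32acc(Ah, Bl))
    return add_fp32(cross, matmul_fp16_fp32acc(Ah, Bh))


def max_abs_err(C, R):
    """Largest absolute entrywise difference between two matrices."""
    return max(abs(c - r) for rc, rr in zip(C, R) for c, r in zip(rc, rr))


if __name__ == "__main__":
    A = [[1 / 3, 2 / 7], [5 / 9, 1 / 11]]
    B = [[3 / 7, 1 / 5], [1 / 6, 4 / 9]]
    ref = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    Ah, _ = split_double_fp16(A)
    Bh, _ = split_double_fp16(B)
    print("fp16 inputs only:", max_abs_err(matmul_fp16_fp32acc(Ah, Bh), ref))
    print("double-fp16:     ", max_abs_err(double_fp16_matmul(A, B), ref))
```

Because this simulated accumulation rounds to nearest, the sketch does not reproduce the tensor-core rounding-mode effect that the abstract identifies. Swapping `fp32` for a rounding function that does not round to nearest (for example, truncation) would be one way to explore that effect in software.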

Citation

Fasi, M., Higham, N. J., Lopez, F., Mary, T., & Mikaitis, M. (2023). Matrix Multiplication in Multiword Arithmetic: Error Analysis and Application to GPU Tensor Cores. SIAM Journal on Scientific Computing, 45(1), https://doi.org/10.1137/21M1465032

Journal Article Type: Article
Acceptance Date: Aug 24, 2022
Online Publication Date: Feb 2, 2023
Publication Date: Feb 2, 2023
Deposit Date: Oct 14, 2022
Journal: SIAM Journal on Scientific Computing
Print ISSN: 1064-8275
Publisher: Society for Industrial and Applied Mathematics
Volume: 45
Issue: 1
DOI: https://doi.org/10.1137/21M1465032
Public URL: https://durham-repository.worktribe.com/output/1188984
Related Public URLs: http://eprints.maths.manchester.ac.uk/2862/