Matrix Multiplication in Multiword Arithmetic: Error Analysis and Application to GPU Tensor Cores

Fasi, Massimiliano; Higham, Nicholas J.; Lopez, Florent; Mary, Theo; Mikaitis, Mantas

doi:10.1137/21M1465032

Authors

Dr Massimiliano Fasi massimiliano.fasi@durham.ac.uk
Assistant Professor

Nicholas J. Higham

Florent Lopez

Theo Mary

Mantas Mikaitis

Abstract

In multiword arithmetic, a matrix is represented as the unevaluated sum of two or more lower precision matrices, and a matrix product is formed by multiplying the constituents in low precision. We investigate the use of multiword arithmetic for improving the performance-accuracy tradeoff of matrix multiplication with mixed precision block fused multiply–add (FMA) hardware, focusing especially on the tensor cores available on NVIDIA GPUs. Building on a general block FMA framework, we develop a comprehensive error analysis of multiword matrix multiplication. After confirming the theoretical error bounds experimentally by simulating low precision in software, we use the cuBLAS and CUTLASS libraries to implement a number of matrix multiplication algorithms using double-fp16 (double-binary16) arithmetic. When running the algorithms on NVIDIA V100 and A100 GPUs, we find that double-fp16 is not as accurate as fp32 (binary32) arithmetic despite satisfying the same worst-case error bound. Using probabilistic error analysis, we explain why this issue is likely to be caused by the rounding mode used by the NVIDIA tensor cores, and we propose a parameterized blocked summation algorithm that alleviates the problem and significantly improves the performance-accuracy tradeoff.
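The representation described in the abstract can be illustrated with a small software simulation. The sketch below is not the authors' cuBLAS or CUTLASS implementation. It is a minimal pure-Python illustration under stated assumptions: binary16 rounding is simulated with the standard library's struct module, Python's double precision stands in for the higher-precision accumulator, and the product of the two low-order words is dropped as negligible. All function names are hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest binary16 value (returned as a float)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def split_two_words(x):
    """Represent x as an unevaluated sum hi + lo of two binary16 numbers."""
    hi = to_fp16(x)
    lo = to_fp16(x - hi)  # the residual, itself rounded to binary16
    return hi, lo


def split_matrix(A):
    """Split every entry of A (a list of rows) into its two binary16 words."""
    pairs = [[split_two_words(a) for a in row] for row in A]
    hi = [[p[0] for p in row] for row in pairs]
    lo = [[p[1] for p in row] for row in pairs]
    return hi, lo


def matmul(A, B):
    """Plain matrix product; double precision stands in for the accumulator."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]


def fp16_matmul(A, B):
    """Single-word baseline: round both inputs to binary16, then multiply."""
    A16 = [[to_fp16(a) for a in row] for row in A]
    B16 = [[to_fp16(b) for b in row] for row in B]
    return matmul(A16, B16)


def double_fp16_matmul(A, B):
    """Two-word product A1*B1 + A1*B2 + A2*B1 (A2*B2 dropped, an assumption)."""
    A1, A2 = split_matrix(A)
    B1, B2 = split_matrix(B)
    P, Q, R = matmul(A1, B1), matmul(A1, B2), matmul(A2, B1)
    return [[p + q + r for p, q, r in zip(rp, rq, rr)]
            for rp, rq, rr in zip(P, Q, R)]


def max_abs_err(C, D):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(c - d) for rc, rd in zip(C, D) for c, d in zip(rc, rd))


if __name__ == "__main__":
    A = [[1 / 3, 0.1], [0.7, 1.2]]
    B = [[0.3, 2 / 3], [1.1, 0.9]]
    ref = matmul(A, B)
    print("fp16 error:       ", max_abs_err(fp16_matmul(A, B), ref))
    print("double-fp16 error:", max_abs_err(double_fp16_matmul(A, B), ref))
```

On small inputs such as these, the two-word product should be noticeably closer to the double precision reference than the single-word binary16 product. The sketch says nothing about the rounding mode of real tensor core accumulation, which the abstract identifies as the likely cause of the accuracy gap observed on GPUs.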

Citation

Fasi, M., Higham, N. J., Lopez, F., Mary, T., & Mikaitis, M. (2023). Matrix Multiplication in Multiword Arithmetic: Error Analysis and Application to GPU Tensor Cores. SIAM Journal on Scientific Computing, 45(1). https://doi.org/10.1137/21M1465032

Journal Article Type	Article
Acceptance Date	Aug 24, 2022
Online Publication Date	Feb 2, 2023
Publication Date	Feb 2, 2023
Deposit Date	Oct 14, 2022
Journal	SIAM Journal on Scientific Computing
Print ISSN	1064-8275
Publisher	Society for Industrial and Applied Mathematics
Volume	45
Issue	1
DOI	https://doi.org/10.1137/21M1465032
Public URL	https://durham-repository.worktribe.com/output/1188984
Related Public URLs	http://eprints.maths.manchester.ac.uk/2862/