GPU Accelerated Matrix Multiplication

Accelerating matrix multiplication with CUDA C/C++. To make things interesting, we try to match the performance of NVIDIA cuBLAS.

Programming Tensor Cores

The most straightforward matrix multiplication, written from scratch in CUDA C/C++, that runs on NVIDIA Tensor Cores.