Programming Tensor Cores

A straightforward matrix multiplication, written from scratch in CUDA C/C++, that runs on NVIDIA Tensor Cores.

Tensor Cores

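Tensor Cores are dedicated accelerator units (somewhat like CUDA cores) on NVIDIA GPUs, available since the Volta microarchitecture, that do just one thing: matrix multiplication! More precisely, each operation is a fused matrix multiply-accumulate, D = A x B + C, on small fixed-size tiles. Let's see how we can run our own code on Tensor Cores.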
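As a first taste, here is a minimal sketch of a single warp multiplying one 16x16 tile through the WMMA API in `mma.h`. It is an illustration under stated assumptions, not necessarily the code this project builds up to: the names `wmma_tile_kernel`, `wmma_tile_max_error`, and `TILE` are hypothetical, the inputs are half precision with float accumulation, and a GPU of compute capability 7.0 or newer is assumed.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <mma.h>
#include <cmath>
#include <vector>

using namespace nvcuda;

constexpr int TILE = 16;  // hypothetical name: M = N = K = 16 for this sketch

// One warp (32 threads) computes D = A * B for a single 16x16 tile.
// A is row-major, B is column-major, D is written row-major.
__global__ void wmma_tile_kernel(const half *a, const half *b, float *d) {
    wmma::fragment<wmma::matrix_a, TILE, TILE, TILE, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, TILE, TILE, TILE, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, TILE, TILE, TILE, float> acc_frag;

    wmma::fill_fragment(acc_frag, 0.0f);                 // C = 0
    wmma::load_matrix_sync(a_frag, a, TILE);             // leading dimension 16
    wmma::load_matrix_sync(b_frag, b, TILE);
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);  // acc = A * B + acc
    wmma::store_matrix_sync(d, acc_frag, TILE, wmma::mem_row_major);
}

// Hypothetical host helper: runs the kernel on small integer inputs and returns
// the largest absolute difference from a CPU reference, or -1 on a CUDA error.
float wmma_tile_max_error() {
    std::vector<half> h_a(TILE * TILE), h_b(TILE * TILE);
    std::vector<float> ref(TILE * TILE, 0.0f), h_d(TILE * TILE, 0.0f);

    for (int i = 0; i < TILE; ++i) {
        for (int j = 0; j < TILE; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < TILE; ++k)
                sum += float((i + k) % 3) * float((k * j) % 2);  // A[i][k] * B[k][j]
            ref[i * TILE + j] = sum;
            h_a[i * TILE + j] = __float2half(float((i + j) % 3));  // A[i][j], row-major
            h_b[i + j * TILE] = __float2half(float((i * j) % 2));  // B[i][j], column-major
        }
    }

    const size_t bytes_h = TILE * TILE * sizeof(half);
    const size_t bytes_f = TILE * TILE * sizeof(float);
    half *d_a = nullptr, *d_b = nullptr;
    float *d_d = nullptr;
    bool ok = cudaMalloc(&d_a, bytes_h) == cudaSuccess &&
              cudaMalloc(&d_b, bytes_h) == cudaSuccess &&
              cudaMalloc(&d_d, bytes_f) == cudaSuccess &&
              cudaMemcpy(d_a, h_a.data(), bytes_h, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(d_b, h_b.data(), bytes_h, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        wmma_tile_kernel<<<1, 32>>>(d_a, d_b, d_d);  // exactly one warp
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_d.data(), d_d, bytes_f, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_d);
    if (!ok) return -1.0f;

    float max_err = 0.0f;
    for (int i = 0; i < TILE * TILE; ++i)
        max_err = std::fmax(max_err, std::fabs(h_d[i] - ref[i]));
    return max_err;
}
```

Calling `wmma_tile_max_error()` from `main` and building with something like `nvcc -arch=sm_70 file.cu` should be enough to try it; the WMMA types are only available when compiling for sm_70 or newer. All threads of the warp must execute the `*_sync` calls together, which is why the launch uses a single block of 32 threads.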