Programming Tensor Cores
Goal: Write matrix multiplication code from scratch that runs on NVIDIA Tensor Cores.
Content Index
- Matrix multiplication on Tensor Cores (as simple as possible)
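For orientation, here is a minimal sketch of what a Tensor Core matrix multiplication can look like with CUDA's WMMA API. It is not taken from the blog posts. It assumes row-major matrices whose dimensions are multiples of 16, half-precision inputs with float accumulation, and one warp per 16x16 output tile. The names `wmma_gemm` and `wmma_max_error` are hypothetical.

```cuda
#include <cmath>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <mma.h>
using namespace nvcuda;

// Assumptions: M, N, K are multiples of 16; A (MxK), B (KxN), C (MxN) are row-major.
// Launch with one warp (32 threads) per block and a grid of (N/16, M/16) blocks.
__global__ void wmma_gemm(const half* A, const half* B, float* C, int M, int N, int K) {
    int tile_row = blockIdx.y * 16;  // first row of this warp's output tile
    int tile_col = blockIdx.x * 16;  // first column of this warp's output tile

    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;
    wmma::fill_fragment(c_frag, 0.0f);

    // Walk along K in steps of 16, accumulating one 16x16x16 product per step.
    for (int k = 0; k < K; k += 16) {
        wmma::load_matrix_sync(a_frag, A + tile_row * K + k, K);
        wmma::load_matrix_sync(b_frag, B + k * N + tile_col, N);
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    }
    wmma::store_matrix_sync(C + tile_row * N + tile_col, c_frag, N, wmma::mem_row_major);
}

// Hypothetical host-side check: with A and B filled with ones, every entry of C should equal K.
float wmma_max_error(int M, int N, int K) {
    half *A, *B;
    float *C;
    cudaMallocManaged(&A, M * K * sizeof(half));
    cudaMallocManaged(&B, K * N * sizeof(half));
    cudaMallocManaged(&C, M * N * sizeof(float));
    for (int i = 0; i < M * K; ++i) A[i] = __float2half(1.0f);
    for (int i = 0; i < K * N; ++i) B[i] = __float2half(1.0f);

    wmma_gemm<<<dim3(N / 16, M / 16), 32>>>(A, B, C, M, N, K);
    cudaDeviceSynchronize();

    float err = 0.0f;
    for (int i = 0; i < M * N; ++i) err = fmaxf(err, fabsf(C[i] - (float)K));
    cudaFree(A); cudaFree(B); cudaFree(C);
    return err;
}
```

WMMA needs a GPU with Tensor Cores (compute capability 7.0 or newer), so compile with something like `nvcc -arch=sm_70`. Calling `wmma_max_error(32, 32, 64)` from `main` runs the kernel on a small problem and returns the largest deviation from the expected value.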
Access the blog posts for this mini-project here.
GPU Accelerated Matrix Multiplication
Goal: Code matrix multiplication from scratch and (try to) match the performance of cuBLAS SGeMM.
Content Index
- Why care about matrix multiplication?
- What is SGeMM?
- Naive matrix multiplication on a GPU.
- Matrix multiplication on a GPU with coalesced memory accesses.
- Matrix multiplication on a GPU using shared memory.
- Matrix multiplication on a GPU using registers.
- Matrix multiplication on a GPU using even more registers.
- Matrix multiplication on a GPU with vectorized memory accesses.
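As a reference point for the list above, SGEMM is the single-precision general matrix multiply C = alpha * A * B + beta * C. The sketch below shows one possible starting kernel, with one thread per output element. It is not the code from the blog posts. It assumes row-major storage, and the names `sgemm_naive` and `sgemm_max_error` are hypothetical.

```cuda
#include <cmath>
#include <cuda_runtime.h>

// Assumption: A (MxK), B (KxN), C (MxN) are row-major.
// threadIdx.x is mapped to columns, so threads with consecutive threadIdx.x
// read adjacent elements of B and write adjacent elements of C (coalesced).
__global__ void sgemm_naive(int M, int N, int K, float alpha, const float* A,
                            const float* B, float beta, float* C) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < M && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k) acc += A[row * K + k] * B[k * N + col];
        C[row * N + col] = alpha * acc + beta * C[row * N + col];
    }
}

// Hypothetical host-side check: with A = 1, B = 2, C = 3 and alpha = beta = 1,
// every entry of the result should equal 2 * K + 3.
float sgemm_max_error(int M, int N, int K) {
    float *A, *B, *C;
    cudaMallocManaged(&A, M * K * sizeof(float));
    cudaMallocManaged(&B, K * N * sizeof(float));
    cudaMallocManaged(&C, M * N * sizeof(float));
    for (int i = 0; i < M * K; ++i) A[i] = 1.0f;
    for (int i = 0; i < K * N; ++i) B[i] = 2.0f;
    for (int i = 0; i < M * N; ++i) C[i] = 3.0f;

    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
    sgemm_naive<<<grid, block>>>(M, N, K, 1.0f, A, B, 1.0f, C);
    cudaDeviceSynchronize();

    float expected = 2.0f * K + 3.0f;
    float err = 0.0f;
    for (int i = 0; i < M * N; ++i) err = fmaxf(err, fabsf(C[i] - expected));
    cudaFree(A); cudaFree(B); cudaFree(C);
    return err;
}
```

The later items in the list (shared memory, register tiling, vectorized loads) can be read as successive refinements of a kernel like this one, each reducing how often a thread has to go to global memory per multiply-add.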
Access the blog posts for this mini-project here.