Mini Projects

Programming Tensor Cores

GitHub - tgautam03/tGeMM: General Matrix Multiplication using NVIDIA Tensor Cores

Goal: Code matrix multiplication from scratch so that it runs on NVIDIA Tensor Cores.

Content Index

  1. Matrix multiplication on Tensor Cores (as simple as possible)

Access the blog posts for this mini-project here.

GPU Accelerated Matrix Multiplication

GitHub - tgautam03/xGeMM: Accelerated General (FP32) Matrix Multiplication

Goal: Code matrix multiplication from scratch and (try to) match the performance of cuBLAS SGeMM.

Content Index

  1. Why care about matrix multiplication?
  2. What is SGeMM?
  3. Naive matrix multiplication on a GPU.
  4. Matrix multiplication on a GPU with coalesced memory accesses.
  5. Matrix multiplication on a GPU using shared memory.
  6. Matrix multiplication on a GPU using registers.
  7. Matrix multiplication on a GPU using even more registers.
  8. Matrix multiplication on a GPU with vectorized memory accesses.
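
As a point of reference for the list above, SGeMM conventionally computes C = alpha * A * B + beta * C on single-precision (FP32) matrices. The snippet below is a minimal CPU sketch of that operation, of the kind one might use to check the output of a GPU kernel. It is not taken from the xGeMM repository: the function name `sgemm_reference`, the row-major layout, and the use of `std::vector` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for SGeMM: C = alpha * A * B + beta * C.
// Assumed layout: A is M x K, B is K x N, C is M x N, all row-major FP32.
void sgemm_reference(std::size_t M, std::size_t N, std::size_t K,
                     float alpha, const std::vector<float>& A,
                     const std::vector<float>& B,
                     float beta, std::vector<float>& C) {
    // Guard against mismatched buffer sizes.
    assert(A.size() == M * K);
    assert(B.size() == K * N);
    assert(C.size() == M * N);

    for (std::size_t i = 0; i < M; ++i) {          // row of C
        for (std::size_t j = 0; j < N; ++j) {      // column of C
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {  // dot product along K
                acc += A[i * K + k] * B[k * N + j];
            }
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
}
```

Calling it with `alpha = 1.0f` and `beta = 0.0f` reduces to a plain matrix product. A GPU kernel's result can then be compared element by element against this output, with a small tolerance because floating-point summation order differs between implementations.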

Access the blog posts for this mini-project here.