28 Nov 2024 · 8 min read · What is SGeMM? SGeMM stands for Single-Precision General Matrix Multiplication. Let's analyze matrix multiplication on a CPU and a GPU.
28 Nov 2024 · 12 min read · Step 1: Getting Started with CUDA Programming. Parallel matrix multiplication using CUDA C++.
28 Nov 2024 · 11 min read · Step 2: GPU Global Memory Coalescing. Memory coalescing is among the most crucial concepts in GPU programming. For matrix multiplication, coalescing global memory accesses yields upwards of a 7x improvement.
28 Nov 2024 · 11 min read · Step 3: GPU Shared Memory. Tiled matrix multiplication using GPU shared memory.
28 Nov 2024 · 8 min read · Step 4: 1D Thread Coarsening using GPU Registers. Using per-thread registers increases the performance of matrix multiplication by another 4x.
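For orientation, the operation that the series optimizes can be sketched on the CPU. This is only a minimal reference under stated assumptions, not the code from any of the posts: the function name `sgemm_reference`, its signature, and the row-major layout are all hypothetical choices made for illustration. It follows the standard BLAS definition of SGEMM, C = alpha * A * B + beta * C, in single precision.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference routine (name, signature, and row-major layout are
// illustrative assumptions, not taken from the series).
// Computes C = alpha * A * B + beta * C in single precision,
// where A is M x K, B is K x N, and C is M x N.
void sgemm_reference(std::size_t M, std::size_t N, std::size_t K,
                     float alpha,
                     const std::vector<float>& A,
                     const std::vector<float>& B,
                     float beta,
                     std::vector<float>& C) {
    for (std::size_t row = 0; row < M; ++row) {
        for (std::size_t col = 0; col < N; ++col) {
            // Dot product of one row of A with one column of B.
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[row * K + k] * B[k * N + col];
            }
            // Scale the product and blend with the existing value of C.
            C[row * N + col] = alpha * acc + beta * C[row * N + col];
        }
    }
}
```

Each output element is independent of every other, which is what makes the problem a natural fit for one-thread-per-element GPU kernels. A serial baseline like this is also handy for checking a GPU kernel's output on small inputs.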