Sign in Subscribe

Topic

GPU Programming

GPGPU programming using CUDA

Tensor Cores

Tensor cores are dedicated accelerator units (somewhat like CUDA cores) on the NVIDIA GPUs (since Volta micro-architecture) that do just one thing: Matrix Multiplication! Let's see how we can run custom functions on Tensor Cores.

Memory Coalescing and Tiled Matrix Multiplication

In this blog post, I first discuss how to transfer data from global memory efficiently and then show how shared memory can reduce global memory accesses and increase performance from 234 GFLOPS to 7490 GFLOPS.

GPU Compute and Memory Architecture

In this blog post, I start with a brief discussion of the modern GPU architecture, which includes the memory hierarchy. I then spend considerable time on how the CUDA software constructs interact with the actual hardware.

2678x Faster Matrix Multiplication with a GPU

In the previous blog post, I teased how GPUs can speed up matrix multiplication. However, I introduced the basics of GPU programming using a simple vector addition example, which is perfect for introducing parallel programming. In this blog post, let's perform a parallel matrix multiplication on a GPU

What is GPGPU Programming?

In this post, I explain the main difference between a CPU and a GPU. I also discuss why applications run faster on a GPU and how we can code a simple program that performs computations on a GPU.