## Tensor Cores

Tensor cores are dedicated accelerator units (somewhat like CUDA cores) on the NVIDIA GPUs (since Volta micro-architecture) that do just one thing: Matrix Multiplication! Let's see how we can run custom functions on Tensor Cores.

Tensor cores are dedicated accelerator units (somewhat like CUDA cores) on the NVIDIA GPUs (since Volta micro-architecture) that do just one thing: Matrix Multiplication! Let's see how we can run custom functions on Tensor Cores.

In this blog post, I first discuss how to transfer data from global memory efficiently and then show how shared memory can reduce global memory accesses and increase performance from 234 GFLOPS to 7490 GFLOPS.

In this blog post, I start with a brief discussion of the modern GPU architecture, which includes the memory hierarchy. I then spend considerable time on how the CUDA software constructs interact with the actual hardware.

In the previous blog post, I teased how GPUs can speed up matrix multiplication. However, I introduced the basics of GPU programming using a simple vector addition example, which is perfect for introducing parallel programming. In this blog post, let's perform a parallel matrix multiplication on a GPU

In this post, I explain the main difference between a CPU and a GPU. I also discuss why applications run faster on a GPU and how we can code a simple program that performs computations on a GPU.

Don’t miss out on the latest issues. Sign up now to get access to the library of members-only issues.

jamie@example.com

Subscribe