Tushar Gautam

📍 United States

Mini Project: 3D Earthquake Simulation

This blog post explains how I developed a 3D earthquake wave simulation in C++. The simulation models seismic waves emanating from an earthquake, propagating through the Earth's crust, and reaching the surface.
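
The post's own wave model isn't reproduced here. A common starting point for this kind of simulation is the scalar wave equation u_tt = c² ∇²u, stepped forward in time with finite differences. The C++ sketch below is one such step under stated assumptions: constant wave speed `c`, uniform grid spacing `h`, and fixed (zero) boundaries. `Grid3D` and `wave_step` are hypothetical names, and a real crust model would let `c` vary with depth and rock type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical grid type: an nx*ny*nz scalar field stored in one flat vector.
struct Grid3D {
    std::size_t nx, ny, nz;
    std::vector<double> v;
    Grid3D(std::size_t x, std::size_t y, std::size_t z)
        : nx(x), ny(y), nz(z), v(x * y * z, 0.0) {}
    double& at(std::size_t i, std::size_t j, std::size_t k) { return v[(k * ny + j) * nx + i]; }
    double at(std::size_t i, std::size_t j, std::size_t k) const { return v[(k * ny + j) * nx + i]; }
};

// One explicit leapfrog step of u_tt = c^2 * laplacian(u) using a 7-point stencil.
// Boundary cells of u_next are never written, so they keep their initial value
// (zero), which acts as a fixed boundary; this is a simplifying assumption.
void wave_step(const Grid3D& u_prev, const Grid3D& u_curr, Grid3D& u_next,
               double c, double dt, double h) {
    assert(u_prev.v.size() == u_curr.v.size() && u_curr.v.size() == u_next.v.size());
    const double r2 = (c * dt / h) * (c * dt / h);
    for (std::size_t k = 1; k + 1 < u_curr.nz; ++k)
        for (std::size_t j = 1; j + 1 < u_curr.ny; ++j)
            for (std::size_t i = 1; i + 1 < u_curr.nx; ++i) {
                double lap = u_curr.at(i + 1, j, k) + u_curr.at(i - 1, j, k)
                           + u_curr.at(i, j + 1, k) + u_curr.at(i, j - 1, k)
                           + u_curr.at(i, j, k + 1) + u_curr.at(i, j, k - 1)
                           - 6.0 * u_curr.at(i, j, k);
                // Second-order central difference in time.
                u_next.at(i, j, k) = 2.0 * u_curr.at(i, j, k) - u_prev.at(i, j, k) + r2 * lap;
            }
}
```

Explicit schemes like this are only stable when c·dt/h ≤ 1/√3 in 3D (the CFL condition), so the time step has to shrink along with the grid spacing.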

Mini Project: GPU Accelerated Image Filters using Convolution

Apply filters to high-resolution images using 2D convolution on a GPU. Along the way, learn about caches and how to use constant, shared, and pinned memory.
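
For reference, here is a minimal CPU version of the filtering step in C++, the kind of baseline a GPU kernel can be checked against. It assumes a single-channel, row-major `float` image, a square filter of radius `R`, and zero padding at the borders; `conv2d` is a hypothetical name, not the post's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Applies a (2R+1)x(2R+1) filter to a single-channel, row-major image.
// Pixels outside the image are treated as zero ("ghost cells").
// Like most image-filtering code, this does not flip the filter, so it is strictly
// cross-correlation; for symmetric filters (box blur, Gaussian) the result is the same.
std::vector<float> conv2d(const std::vector<float>& in, int rows, int cols,
                          const std::vector<float>& filter, int R) {
    const int F = 2 * R + 1;
    assert(in.size() == static_cast<std::size_t>(rows) * cols);
    assert(filter.size() == static_cast<std::size_t>(F) * F);
    std::vector<float> out(in.size(), 0.0f);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            float acc = 0.0f;
            for (int fr = 0; fr < F; ++fr)
                for (int fc = 0; fc < F; ++fc) {
                    int ir = r - R + fr, ic = c - R + fc;
                    if (ir >= 0 && ir < rows && ic >= 0 && ic < cols)
                        acc += filter[fr * F + fc] * in[ir * cols + ic];
                }
            out[r * cols + c] = acc;
        }
    return out;
}
```

On a GPU, a natural mapping is one thread per output pixel (the two outer loops). Every thread reads the same small filter, which is why it is a good fit for constant memory.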

Mini Project: GPU Accelerated Matrix Multiplication (almost) like cuBLAS

Learn CUDA C/C++ basics by working on a single application: matrix multiplication. To make things interesting, let us try to match the performance of NVIDIA cuBLAS.
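
As a baseline, the computation being accelerated fits in a plain C++ triple loop. This is a sketch assuming row-major `float` matrices; `matmul_naive` and `gflops` are hypothetical helper names. The second function shows the usual way to convert a measured runtime into a GFLOPS figure for comparison against cuBLAS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C (M x N) = A (M x K) * B (K x N), all row-major.
// The body of the two outer loops is what one CUDA thread would compute
// for its (row, col) output element.
void matmul_naive(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& C, int M, int N, int K) {
    assert(A.size() == std::size_t(M) * K && B.size() == std::size_t(K) * N);
    C.assign(std::size_t(M) * N, 0.0f);
    for (int row = 0; row < M; ++row)
        for (int col = 0; col < N; ++col) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k)
                acc += A[row * K + k] * B[k * N + col];
            C[row * N + col] = acc;
        }
}

// Each output element needs K multiplies and K adds, so the total work is 2*M*N*K FLOPs.
// Starting from 2.0 keeps the product in double and avoids int overflow.
double gflops(int M, int N, int K, double seconds) {
    return 2.0 * M * N * K / (seconds * 1e9);
}
```

For example, a 1000 x 1000 x 1000 multiplication is 2 × 10⁹ FLOPs, so finishing it in one second corresponds to 2 GFLOPS.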

Introduction to Tensor Cores Programming

Tensor Cores are dedicated accelerator units (somewhat like CUDA cores) on NVIDIA GPUs (since the Volta microarchitecture) that do just one thing: matrix multiply-accumulate! Let's see how we can use Tensor Cores from our own CUDA code.

Memory Coalescing and Tiled Matrix Multiplication

In this blog post, I first discuss how to transfer data from global memory efficiently and then show how shared memory can reduce global memory accesses and increase performance from 234 GFLOPS to 7490 GFLOPS.
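
To make the tiling idea concrete without a GPU, the C++ sketch below mirrors the loop structure of a shared-memory tiled kernel. It is an illustration under stated assumptions, not the post's kernel: square `N x N` row-major matrices, a hypothetical `TILE` width of 4, and small local arrays standing in for `__shared__` memory. With tiles of width TILE, each element of A and B is fetched from "global" memory N/TILE times instead of N times.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int TILE = 4;  // hypothetical tile width; CUDA kernels commonly use 16 or 32

// Tiled C = A * B for row-major N x N matrices. tileA/tileB play the role of
// __shared__ memory: each loaded element is reused TILE times from the local copy.
void matmul_tiled(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& C, int N) {
    assert(A.size() == std::size_t(N) * N && B.size() == std::size_t(N) * N);
    C.assign(std::size_t(N) * N, 0.0f);
    for (int br = 0; br < N; br += TILE)            // block row    (blockIdx.y)
        for (int bc = 0; bc < N; bc += TILE) {      // block column (blockIdx.x)
            float acc[TILE][TILE] = {};             // one accumulator per "thread"
            for (int t = 0; t < N; t += TILE) {     // phases along the K dimension
                float tileA[TILE][TILE], tileB[TILE][TILE];
                // Cooperative load: "thread" (ty, tx) loads one element of each tile,
                // padding with zeros past the matrix edge. Neighbouring tx values read
                // neighbouring addresses, which is what makes these loads coalesced on a GPU.
                for (int ty = 0; ty < TILE; ++ty)
                    for (int tx = 0; tx < TILE; ++tx) {
                        int r = br + ty, c = bc + tx;
                        tileA[ty][tx] = (r < N && t + tx < N) ? A[r * N + t + tx] : 0.0f;
                        tileB[ty][tx] = (t + ty < N && c < N) ? B[(t + ty) * N + c] : 0.0f;
                    }
                // On the GPU, __syncthreads() goes here so the tiles are complete.
                for (int ty = 0; ty < TILE; ++ty)
                    for (int tx = 0; tx < TILE; ++tx)
                        for (int k = 0; k < TILE; ++k)
                            acc[ty][tx] += tileA[ty][k] * tileB[k][tx];
                // And a second __syncthreads() before the next phase overwrites the tiles.
            }
            for (int ty = 0; ty < TILE; ++ty)
                for (int tx = 0; tx < TILE; ++tx)
                    if (br + ty < N && bc + tx < N)
                        C[(br + ty) * N + bc + tx] = acc[ty][tx];
        }
}
```

The zero padding lets N be any size, not just a multiple of TILE, at the cost of a few wasted multiply-adds in the edge tiles.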

GPU Compute and Memory Architecture

In this blog post, I start with a brief discussion of modern GPU architecture, including its memory hierarchy. I then spend considerable time on how CUDA's software constructs interact with the underlying hardware.

2678x Faster Matrix Multiplication with a GPU

In the previous blog post, I teased how GPUs can speed up matrix multiplication but introduced the basics of GPU programming with a simple vector addition example, which is ideal for a first look at parallel programming. In this blog post, let's perform matrix multiplication in parallel on a GPU.
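
The vector addition example boils down to one idea: each thread computes a global index from its block and thread IDs and handles one element. The C++ sketch below emulates that on a CPU; `vec_add_thread` and `vec_add_launch` are hypothetical names, and the `blockIdx_x`, `blockDim_x`, and `threadIdx_x` parameters stand in for CUDA's built-in `blockIdx.x`, `blockDim.x`, and `threadIdx.x`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Body of a vector-addition "kernel" for one thread. In CUDA the indices would come
// from the built-in variables; here they are passed in explicitly.
void vec_add_thread(const float* a, const float* b, float* c, int n,
                    int blockIdx_x, int blockDim_x, int threadIdx_x) {
    int i = blockIdx_x * blockDim_x + threadIdx_x;  // global thread index
    if (i < n)                                      // the last block may overhang n
        c[i] = a[i] + b[i];
}

// Emulates a launch of ceil(n / blockDim) blocks, as in kernel<<<gridDim, blockDim>>>.
void vec_add_launch(const std::vector<float>& a, const std::vector<float>& b,
                    std::vector<float>& c, int blockDim) {
    assert(a.size() == b.size() && blockDim > 0);
    const int n = static_cast<int>(a.size());
    c.assign(a.size(), 0.0f);
    const int gridDim = (n + blockDim - 1) / blockDim;  // integer ceiling division
    for (int blk = 0; blk < gridDim; ++blk)
        for (int t = 0; t < blockDim; ++t)  // on a GPU these iterations run in parallel
            vec_add_thread(a.data(), b.data(), c.data(), n, blk, blockDim, t);
}
```

Rounding the grid size up and guarding with `i < n` is what lets the same kernel handle any n: with n = 10 and blocks of 4, the launch creates 12 threads and the last two do nothing.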

What is GPGPU Programming?

In this post, I explain the main difference between a CPU and a GPU. I also discuss why some applications run faster on a GPU and how to write a simple program that performs computations on a GPU.