GPU Accelerated Image Filters using Convolution

Apply filters to high-resolution images using 2D convolution on a GPU. Along the way, learn about caches and how to use constant, shared, and pinned memory.

GPU Accelerated Matrix Multiplication

Learn CUDA C/C++ basics by working on a single application: matrix multiplication. To make things interesting, let us try to match the performance of NVIDIA cuBLAS.

Programming Tensor Cores

The most straightforward matrix multiplication that runs on NVIDIA Tensor Cores, written from scratch in CUDA C/C++.