Apply filters to high-resolution images using 2D convolution on a GPU. Along the way, learn about caches and how to use constant, shared, and pinned memory.
GPU Accelerated Matrix Multiplication
Learn CUDA C/C++ basics by working on a single application: matrix multiplication. To make things interesting, let us try to match the performance of NVIDIA cuBLAS.
Programming Tensor Cores
Write the most straightforward matrix multiplication that runs on NVIDIA Tensor Cores, from scratch in CUDA C/C++.