vhxs/nvidia-gpus

GPU architectures are complex. This is an attempt to demystify them.

Links

Terminology

  • CUDA is Nvidia's proprietary platform and API for executing code on their GPUs.
  • Kernels are functions, written in CUDA, that run on Nvidia GPUs.
  • The CPU is the host; the GPU is the device.
  • Blocks contain threads. Threads within a block can have 1d, 2d, or 3d indexes.
  • Grids contain blocks. Blocks within a grid can have 1d, 2d, or 3d indexes.
  • Streaming multiprocessors (SMs) are made up of cores.
  • Streaming multiprocessors execute blocks. An entire block must run on a single SM.
  • Warps always have 32 threads. A block is executed as several warps.
  • A warp consists of lanes, one per thread (though the term doesn't see much use).
  • The GPU is responsible for allocating thread blocks to SMs.
  • Nvidia has changed the definition of "core" over time for marketing purposes.
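The indexing terms above map directly onto CUDA's built-in variables. A minimal sketch (the kernel name and launch sizes here are made up for illustration):

```cuda
#include <cstdio>

// Each thread derives its own global index from built-in variables:
//   blockIdx  - this block's index within the grid
//   blockDim  - number of threads per block
//   threadIdx - this thread's index within its block
__global__ void whoAmI() {
    int globalIdx = blockIdx.x * blockDim.x + threadIdx.x;
    printf("block %d, thread %d -> global index %d\n",
           blockIdx.x, threadIdx.x, globalIdx);
}

int main() {
    // Launch a 1d grid of 2 blocks, each with 4 threads (8 threads total).
    whoAmI<<<2, 4>>>();
    cudaDeviceSynchronize();  // wait for the kernel (and its printf) to finish
    return 0;
}
```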

Concepts

  • The programmer writes a CUDA kernel. As part of launching it, they specify grid and block dimensions:
    • How many blocks per grid? How many threads per block?
  • The GPU takes these blocks and distributes them across streaming multiprocessors:
    • One block maps to a single SM; threads in a block are never spread across SMs.
    • To execute a block, an SM further divides the block's threads into warps. Warps are currently 32 threads on all Nvidia GPUs.
    • One thread is scheduled onto a single core, so a warp needs 32 cores to run. The number of cores in an SM is typically a multiple of 32.
    • Depending on the number of blocks, several blocks may be assigned to a single SM; blocks beyond what the SM's resources can hold at once wait until earlier blocks finish.
    • An SM can context switch between active warps: if one warp is waiting on a memory access to complete, the scheduler can issue another warp that is ready to run.
    • Warp scheduling: https://www.cc.gatech.edu/fac/hyesoon/gputhread.pdf
  • How to choose grid and block dimensions: https://stackoverflow.com/a/9986748
    • People write PhD theses on the quantitative analysis of aspects of this problem.
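A common starting point for the grid/block question above is to fix a block size and derive the grid size with a ceiling division, so every element gets a thread. A sketch under those assumptions (256 is a conventional block size, not a rule; the kernel name is invented):

```cuda
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Guard: the last block usually contains threads past the end of the array.
    if (i < n) data[i] *= factor;
}

void launchScale(float *d_data, int n, float factor) {
    int threadsPerBlock = 256;  // a multiple of the warp size (32)
    // Ceiling division: enough blocks to cover all n elements.
    int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
    scale<<<blocksPerGrid, threadsPerBlock>>>(d_data, n, factor);
}
```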

  • float computation is much faster on GPUs than double computation: most consumer GPUs ship far fewer FP64 units than FP32 units (ratios like 1:32 or 1:64 are common), so double throughput is a small fraction of float throughput.

GPU example

Code examples
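The usual "hello world" for the concepts above is vector addition. A self-contained sketch of the standard host/device workflow (allocate on the device, copy in, launch, copy out); error handling is omitted for brevity:

```cuda
#include <cstdio>
#include <cstdlib>

// Device code: one thread per output element.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host (CPU) buffers.
    float *a = (float *)malloc(bytes);
    float *b = (float *)malloc(bytes);
    float *c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    // Device (GPU) buffers.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b, bytes, cudaMemcpyHostToDevice);

    // Launch: fixed block size, grid size from ceiling division.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(dA, dB, dC, n);

    cudaMemcpy(c, dC, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", c[0]);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(a); free(b); free(c);
    return 0;
}
```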
