CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers
CUTLASS Documentation