Cutlass
CUDA Templates for Linear Algebra Subroutines and Solvers
Implements the matrix multiply-accumulate operation on 8-bit integer data using the DP4A instruction.
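As an illustration of what the DP4A instruction computes, the following is a minimal host-side sketch in plain C++. It assumes the signed variant: each 32-bit operand is treated as four packed `int8_t` lanes, the lanes are multiplied pairwise, and the four products are added to a 32-bit accumulator. The names `dp4a_reference` and `pack_int8x4` are hypothetical helpers introduced here and are not part of CUTLASS. The sketch does not model overflow of the 32-bit accumulator.

```cpp
#include <cstdint>

// Hypothetical helper (not a CUTLASS function): packs four int8 values
// into one 32-bit word, with x0 in the lowest byte.
inline uint32_t pack_int8x4(int8_t x0, int8_t x1, int8_t x2, int8_t x3) {
  return static_cast<uint32_t>(static_cast<uint8_t>(x0)) |
         (static_cast<uint32_t>(static_cast<uint8_t>(x1)) << 8) |
         (static_cast<uint32_t>(static_cast<uint8_t>(x2)) << 16) |
         (static_cast<uint32_t>(static_cast<uint8_t>(x3)) << 24);
}

// Hypothetical host-side reference for a signed 4-way dot product with
// accumulate: returns c + sum over the four lanes of a_lane * b_lane.
inline int32_t dp4a_reference(uint32_t a, uint32_t b, int32_t c) {
  int32_t sum = c;
  for (int lane = 0; lane < 4; ++lane) {
    // Extract one byte from each operand and reinterpret it as signed.
    int8_t a_lane = static_cast<int8_t>((a >> (8 * lane)) & 0xFFu);
    int8_t b_lane = static_cast<int8_t>((b >> (8 * lane)) & 0xFFu);
    sum += static_cast<int32_t>(a_lane) * static_cast<int32_t>(b_lane);
  }
  return sum;
}
```

On the device, the same arithmetic is issued as a single instruction per accumulator, which is why the 8-bit operands are consumed four at a time along the K dimension.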
Classes
struct cutlass::gemm::ThreadMultiplyAdd< ThreadGemmShape_, ThreadsPerWarp_, int8_t, int8_t, int >
    Template performing a matrix multiply-add operation within a thread.
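To make the thread-level operation concrete, here is a hedged host-side sketch in plain C++ of a multiply-add over 8-bit operands with 32-bit accumulators. It is not the CUTLASS implementation: the struct name `ThreadMultiplyAddSketch`, the template parameters `kM` and `kN`, the fragment types, and the accumulator layout (`j * kM + i`) are all assumptions chosen for illustration. Each fragment entry holds four consecutive `int8_t` values along K, so each accumulator update corresponds to one 4-way dot product.

```cpp
#include <array>
#include <cstdint>

// Illustrative sketch only; names and layout are assumptions,
// not the CUTLASS ThreadMultiplyAdd specialization.
template <int kM, int kN>
struct ThreadMultiplyAddSketch {
  // Each entry packs 4 consecutive int8 values along the K dimension.
  using FragmentA = std::array<std::array<int8_t, 4>, kM>;
  using FragmentB = std::array<std::array<int8_t, 4>, kN>;
  // One 32-bit accumulator per (i, j) pair, stored at index j * kM + i.
  using Accumulators = std::array<int32_t, kM * kN>;

  // Computes d = a * b + c as an outer product of 4-way dot products.
  static void multiply_add(const FragmentA& a, const FragmentB& b,
                           const Accumulators& c, Accumulators& d) {
    for (int j = 0; j < kN; ++j) {
      for (int i = 0; i < kM; ++i) {
        int32_t sum = c[j * kM + i];
        // This inner loop is the work a single DP4A would perform.
        for (int k = 0; k < 4; ++k) {
          sum += static_cast<int32_t>(a[i][k]) * static_cast<int32_t>(b[j][k]);
        }
        d[j * kM + i] = sum;
      }
    }
  }
};
```

The `int8_t, int8_t, int` arguments in the specialization above match this pattern: 8-bit inputs for both operands and 32-bit integer accumulators, which leaves headroom for the sums of products.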
Namespaces
cutlass
cutlass::gemm