Cutlass
CUDA Templates for Linear Algebra Subroutines and Solvers
Files | |
| file | batched_reduction.h [code] |
| Implements an efficient, software-pipelined batched reduction: D = alpha * Reduction(A) + beta * C. | |
| file | batched_reduction_traits.h [code] |
| Defines structural properties of a complete batched reduction: D = alpha * Reduction(A) + beta * C. | |
| file | reduction/threadblock_swizzle.h [code] |
| Defines functors for mapping blockIdx to partitions of the batched reduction computation. | |
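
The expression D = alpha * Reduction(A) + beta * C used in the file descriptions above can be illustrated with a small host-side reference. This is only a sketch under stated assumptions, not the CUTLASS implementation: the function name `batched_reduction_reference`, its parameters, and the contiguous batch layout are hypothetical, and Reduction(A) is assumed to be an element-wise sum over the batch slices of A.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference; not part of CUTLASS.
// Assumptions:
//   - A holds batch_count slices of n elements each, stored back to back.
//   - Reduction(A) is an element-wise sum over those slices.
// Computes D[i] = alpha * sum_b A[b * n + i] + beta * C[i].
std::vector<float> batched_reduction_reference(const std::vector<float>& A,
                                               const std::vector<float>& C,
                                               std::size_t batch_count,
                                               std::size_t n,
                                               float alpha,
                                               float beta) {
  assert(A.size() == batch_count * n);
  assert(C.size() == n);

  std::vector<float> D(n);
  for (std::size_t i = 0; i < n; ++i) {
    // Accumulate element i across all batch slices.
    float acc = 0.0f;
    for (std::size_t b = 0; b < batch_count; ++b) {
      acc += A[b * n + i];
    }
    // Apply the linear scaling with the source operand C.
    D[i] = alpha * acc + beta * C[i];
  }
  return D;
}
```

A serial loop like this is mainly useful as a correctness check against a GPU kernel's output; it says nothing about how the headers listed above pipeline the work or map thread blocks to partitions.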