typename ThreadGemmShape_,
int kScalarsPerLdgA_ = 1,
int kScalarsPerLdgB_ = 1>

ThreadMultiplyAdd<ThreadGemmShape_, Shape<1, 4, 8>, ScalarA_, ScalarB_, float>,

typename ScalarA_ = half,
typename ScalarB_ = half,
typename ScalarC_ = half,
typename ScalarD_ = half,
typename Scalar_ = half,
int kScalarsPerLdgA_ = 1,
int kScalarsPerLdgB_ = 1,
typename Index_ = int,
typename GemmConfig_ =
typename GemmEpilogueTraits_ =

GemmEpilogue<GemmEpilogueTraits_>,
Defines iterators for efficiently loading and storing to global memory.
Defines structural properties of complete GEMM computation.
Kind
Enumeration defining fundamental contiguous layouts.
Definition: matrix_traits.h:159
Implements the epilogue phase of the GEMM kernel that efficiently updates global memory with the comp...
Defines iterators for efficiently loading and storing tiles to and from shared memory.
Definition: gemm_config.h:76
A Shape implementing Layout Concept describing the dimensions of a cube.
Definition: shape.h:64
Definition: gemm_epilogue_traits.h:340
Definition: fp16_sgemm_traits.h:61
Template implementing matrix multiply-add operations on fragments.
Functor to compute linear combination of fragments.
Definition: linear_scaling.h:51
Implements a software-pipelined efficient GEMM.
Defines structural properties of the GEMM epilogue.
Definition: fp16_sgemm_traits.h:137
Definition: gemm_traits.h:773
Definition: fragment_multiply_add.h:41
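
The "functor to compute linear combination of fragments" referenced above (linear_scaling.h) can be pictured with a small host-side sketch. This is not the library's implementation. It assumes the conventional GEMM epilogue form d = alpha * accum + beta * c, and the name LinearScalingSketch, its members, and the use of std::array as a stand-in for a fragment are all hypothetical.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a linear-scaling epilogue functor.
// Assumption: the epilogue computes d = alpha * accum + beta * c element-wise.
template <typename Scalar>
struct LinearScalingSketch {
  Scalar alpha;  // scales the accumulated product
  Scalar beta;   // scales the source operand c

  // Applies the linear combination to each element of a fixed-size "fragment".
  template <std::size_t N>
  std::array<Scalar, N> operator()(const std::array<Scalar, N>& accum,
                                   const std::array<Scalar, N>& c) const {
    std::array<Scalar, N> d{};
    for (std::size_t i = 0; i < N; ++i) {
      d[i] = alpha * accum[i] + beta * c[i];
    }
    return d;
  }
};
```

In the listing above the accumulator type passed to ThreadMultiplyAdd is float while the operand defaults are half, so a sketch like this would presumably be instantiated with a Scalar wide enough to hold the float accumulators before any conversion to the output type.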