typename MultiplyAdd_,
int kScalarsPerLdgCAndStgD_,
bool kResidueSeparate_ = false,
bool kResidueInProlog_ = false,
bool kLaunchBounds_ = true>
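The following is a minimal sketch of how trailing template parameters with defaults can be forwarded to static constants, mirroring the parameters listed above. `MiniGemmConfig` is a hypothetical name, not the CUTLASS `GemmConfig`. The sketch also assumes, based only on the parameter's name, that `kScalarsPerLdgCAndStgD_` sets both the C load width and the D store width.

```cpp
// Hypothetical sketch: MiniGemmConfig is an illustrative stand-in, not CUTLASS's GemmConfig.
template <int kScalarsPerLdgCAndStgD_,
          bool kResidueSeparate_ = false,
          bool kResidueInProlog_ = false,
          bool kLaunchBounds_ = true>
struct MiniGemmConfig {
  // Assumption: one template parameter drives both the C load width and the D store width.
  static int const kScalarsPerLdgC = kScalarsPerLdgCAndStgD_;
  static int const kScalarsPerStgD = kScalarsPerLdgCAndStgD_;
  // The boolean switches are exposed unchanged as static constants.
  static bool const kResidueSeparate = kResidueSeparate_;
  static bool const kResidueInProlog = kResidueInProlog_;
  static bool const kLaunchBounds = kLaunchBounds_;
};

// Only the first parameter is required; the flags fall back to their defaults.
using DefaultConfig = MiniGemmConfig<4>;
static_assert(DefaultConfig::kScalarsPerLdgC == 4, "C load width");
static_assert(!DefaultConfig::kResidueSeparate && DefaultConfig::kLaunchBounds,
              "defaults: no separate residue loop, launch bounds on");
```

Defaulting the flags keeps the common configuration short to spell while still allowing each behavior to be switched per instantiation.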
static int const kThreads
The number of threads.
Definition: gemm_config.h:103
ShapeDiv< OutputTile, AccumulatorsPerWarp >::Shape Warps
The number of warps.
Definition: gemm_config.h:99
MultiplyAdd::InstructionShape InstructionShape
The shape of the instruction.
Definition: gemm_config.h:92
static int const kWarpSize
The default warp size (32 threads per warp).
Definition: gemm_config.h:101
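The sketch below relates the number of warps to the thread count. It assumes that `kThreads` equals the number of warps in the `Warps` shape (the product of its extents) times the warp size of 32. `Shape4`, `ThreadCount`, and the example warp arrangement are hypothetical.

```cpp
// Hypothetical 4D shape: extents D x H x W x C.
template <int kD_, int kH_, int kW_, int kC_ = 1>
struct Shape4 {
  static int const kD = kD_, kH = kH_, kW = kW_, kC = kC_;
};

// Assumption: threads per block = (number of warps) * (threads per warp).
template <typename Warps, int kWarpSize = 32>
struct ThreadCount {
  static int const kWarps = Warps::kD * Warps::kH * Warps::kW * Warps::kC;
  static int const kThreads = kWarps * kWarpSize;
};

// e.g. a 1x2x4 arrangement of warps gives 8 warps, i.e. 256 threads.
using Warps = Shape4<1, 2, 4>;
static_assert(ThreadCount<Warps>::kWarps == 8, "8 warps");
static_assert(ThreadCount<Warps>::kThreads == 256, "256 threads");
```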
static int const kScalarsPerLdsD
Definition: gemm_config.h:121
static int const kScalarsPerStgD
The number of scalars per STS/LDS/STG for D.
Definition: gemm_config.h:119
A template defining the Fragment Concept.
Definition: fragment.h:99
static int const kScalarsPerLdgB
The number of scalars per LDG/STS/LDS for B.
Definition: gemm_config.h:111
ScalarC_ ScalarC
The scalar for C.
Definition: gemm_config.h:83
MultiplyAdd::Accumulators Accumulators
The accumulators.
Definition: gemm_config.h:96
static int const kStages
The number of stages in shared memory, used to implement double, triple, or deeper buffering.
Definition: gemm_config.h:128
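To illustrate what `kStages`-deep buffering means, here is a sequential host-side sketch. Tile `k` lives in slot `k % kStages`. While tile `k` is consumed, tile `k + kStages - 1` is loaded into the slot freed by tile `k - 1`. `pipelined_sum` is a hypothetical name. The sketch does not model shared memory, asynchronous copies, or synchronization.

```cpp
#include <array>
#include <vector>

// Hypothetical sketch of kStages-deep buffering, run sequentially.
template <int kStages>
long long pipelined_sum(const std::vector<int>& tiles) {
  static_assert(kStages >= 2, "double buffering needs at least two stages");
  std::array<int, kStages> slots{};
  int const n = static_cast<int>(tiles.size());
  // Prologue: prefetch the first kStages - 1 tiles.
  for (int k = 0; k < kStages - 1 && k < n; ++k) slots[k % kStages] = tiles[k];
  long long acc = 0;
  for (int k = 0; k < n; ++k) {
    int const next = k + kStages - 1;
    // "Load" ahead into the slot that held tile k - 1, which is already consumed.
    if (next < n) slots[next % kStages] = tiles[next];
    // "Compute" on tile k, whose slot is distinct from the one just written.
    acc += slots[k % kStages];
  }
  return acc;
}
```

With more stages, more loads can be in flight while the current tile is being consumed. The cost is proportionally more buffer space.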
ShapeMul< ThreadGemmShape, ThreadsPerWarp >::Shape AccumulatorsPerWarp
The number of accumulators per warp.
Definition: thread_multiply_add.h:54
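`AccumulatorsPerWarp` above is the elementwise product of `ThreadGemmShape` and `ThreadsPerWarp`. The sketch below shows that operation. `Shape4`, `ShapeMul4`, and the 8x8-per-thread, 4x8-threads-per-warp sizes are hypothetical examples, not values taken from CUTLASS.

```cpp
// Hypothetical 4D shape: extents D x H x W x C.
template <int kD_, int kH_, int kW_, int kC_ = 1>
struct Shape4 { static int const kD = kD_, kH = kH_, kW = kW_, kC = kC_; };

// Elementwise product of two shapes, the operation named by ShapeMul.
template <typename A, typename B>
struct ShapeMul4 {
  typedef Shape4<A::kD * B::kD, A::kH * B::kH, A::kW * B::kW, A::kC * B::kC> Shape;
};

// e.g. each thread owns an 8x8 block of accumulators and a warp arranges its
// 32 threads as 4x8, so the warp covers 32x64 accumulators.
typedef Shape4<1, 8, 8> ThreadGemmShape;
typedef Shape4<1, 4, 8> ThreadsPerWarp;
typedef ShapeMul4<ThreadGemmShape, ThreadsPerWarp>::Shape AccumulatorsPerWarp;
static_assert(AccumulatorsPerWarp::kH == 32 && AccumulatorsPerWarp::kW == 64, "32x64");
```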
static bool const kResidueInProlog
If true, the residue is computed in the prologue.
Definition: gemm_config.h:136
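One way to read the residue flag is shown below. With `kTileK`-wide tiles along K, `K % kTileK` elements are left over. Placing that partial tile first, in the prologue, leaves only full tiles for the rest of the loop. The other choice is to leave the partial tile at the end. This is a hedged illustration of where a residue tile can go, not the CUTLASS implementation, and `split_k` is a hypothetical helper.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns the sequence of K-tile widths.
std::vector<int> split_k(int K, int kTileK, bool residue_in_prolog) {
  assert(K >= 0 && kTileK > 0);
  std::vector<int> tiles(K / kTileK, kTileK);  // all full tiles
  int const residue = K % kTileK;
  if (residue != 0) {
    if (residue_in_prolog)
      tiles.insert(tiles.begin(), residue);  // partial tile handled first
    else
      tiles.push_back(residue);  // partial tile handled last
  }
  return tiles;
}
```

For example, `split_k(20, 8, true)` yields tiles of width 4, 8, 8, so every iteration after the first works on a full tile.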
static bool const kLaunchBounds
If true, the kernel is launched with launch bounds specified.
Definition: gemm_config.h:139
MultiplyAdd_ MultiplyAdd
The functor that computes D = A*B + C.
Definition: gemm_config.h:90
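The following is a minimal scalar sketch of a thread-level multiply-add that computes D = A*B + C for a small M x N accumulator block over a K-deep slice. The name `ThreadMultiplyAdd`, the `multiply_add` method, and the row-major `std::array` fragments are assumptions for illustration. They are not the CUTLASS functor's interface.

```cpp
#include <array>

// Hypothetical thread-level functor: D (M x N) = A (M x K) * B (K x N) + C (M x N).
template <int M, int N, int K>
struct ThreadMultiplyAdd {
  using FragA = std::array<float, M * K>;         // row-major A
  using FragB = std::array<float, K * N>;         // row-major B
  using Accumulators = std::array<float, M * N>;  // row-major C and D

  void multiply_add(FragA const& a, FragB const& b, Accumulators const& c,
                    Accumulators& d) const {
    for (int i = 0; i < M; ++i)
      for (int j = 0; j < N; ++j) {
        float acc = c[i * N + j];  // start from C
        for (int k = 0; k < K; ++k) acc += a[i * K + k] * b[k * N + j];
        d[i * N + j] = acc;
      }
  }
};
```

For 2x2 inputs A = {1, 2, 3, 4}, B = {5, 6, 7, 8} and C filled with ones, D is {20, 23, 44, 51}.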
static int const kAccumulatorsPerLdsB
Definition: gemm_config.h:125
Shape< A_::kD/B_::kD, A_::kH/B_::kH, A_::kW/B_::kW, A_::kC/B_::kC > Shape
Definition: shape.h:126
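The `ShapeDiv` definition above divides two shapes extent by extent. `Warps` is `OutputTile / AccumulatorsPerWarp`. The sketch below reproduces that division. `Shape4`, `ShapeDiv4`, and the tile sizes are hypothetical, and the divisibility `static_assert` is an added check that the original definition does not show.

```cpp
// Hypothetical 4D shape: extents D x H x W x C.
template <int kD_, int kH_, int kW_, int kC_ = 1>
struct Shape4 { static int const kD = kD_, kH = kH_, kW = kW_, kC = kC_; };

// Elementwise division: Shape<A::kD/B::kD, A::kH/B::kH, A::kW/B::kW, A::kC/B::kC>.
template <typename A, typename B>
struct ShapeDiv4 {
  static_assert(A::kD % B::kD == 0 && A::kH % B::kH == 0 &&
                A::kW % B::kW == 0 && A::kC % B::kC == 0,
                "each extent must divide evenly");
  typedef Shape4<A::kD / B::kD, A::kH / B::kH, A::kW / B::kW, A::kC / B::kC> Shape;
};

// e.g. a 128x128 output tile covered by 32x64 accumulators per warp: 4x2 warps.
typedef Shape4<1, 128, 128> OutputTile;
typedef Shape4<1, 32, 64> AccumulatorsPerWarp;
typedef ShapeDiv4<OutputTile, AccumulatorsPerWarp>::Shape Warps;
static_assert(Warps::kH == 4 && Warps::kW == 2, "8 warps");
```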
ScalarA_ ScalarA
The scalar for A.
Definition: gemm_config.h:79
static bool const kResidueSeparate
If true, the mainloop is instantiated twice; the first instantiation contains no predicates.
Definition: gemm_config.h:133
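The idea of instantiating one mainloop twice can be sketched with a `bool` template parameter. An unpredicated copy runs over the full tiles. A predicated, bounds-checked copy handles only the residue. `mainloop`, `sum_with_separate_residue`, and `kTile` are hypothetical names, and summation stands in for the real GEMM work.

```cpp
#include <vector>

// Hypothetical mainloop: the guard is compiled in only when kPredicated is true.
template <bool kPredicated, int kTile>
long long mainloop(const std::vector<int>& data, int begin, int end) {
  long long acc = 0;
  for (int t = begin; t < end; t += kTile)
    for (int i = 0; i < kTile; ++i) {
      int const idx = t + i;
      if (kPredicated && idx >= static_cast<int>(data.size())) continue;
      acc += data[idx];
    }
  return acc;
}

// Full tiles go through the unpredicated instantiation, the tail through the predicated one.
template <int kTile>
long long sum_with_separate_residue(const std::vector<int>& data) {
  int const n = static_cast<int>(data.size());
  int const full_end = n - n % kTile;  // end of the last full tile
  return mainloop<false, kTile>(data, 0, full_end) +
         mainloop<true, kTile>(data, full_end, n);
}
```

The trade-off is code size: two copies of the loop body are generated, so that the common full-tile path carries no per-element bounds checks.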
Definition: gemm_config.h:76
MultiplyAdd::AccumulatorsPerWarp AccumulatorsPerWarp
The shape of warp-level GEMM.
Definition: gemm_config.h:94
static int const kScalarsPerLdsB
Definition: gemm_config.h:113
static int const kScalarsPerLdgC
The number of scalars per LDG for C.
Definition: gemm_config.h:116
A Shape implementing the Layout Concept, describing the dimensions of a cube.
Definition: shape.h:64
static int const kScalarsPerLdsA
Definition: gemm_config.h:108
static int const kAccumulatorsPerLdsA
The number of accumulators fed by a single LDS of A/B.
Definition: gemm_config.h:124
static int const kScalarsPerStsA
Definition: gemm_config.h:107
static int const kScalarsPerStsB
Definition: gemm_config.h:112
static int const kScalarsPerLdgA
The number of scalars per LDG/STS/LDS for A.
Definition: gemm_config.h:106
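The sketch below assumes that `kScalarsPerLdgA` is the vector width of one global load. Under that assumption, one access moves `kScalarsPerLdgA * sizeof(ScalarA)` bytes, and CUDA's widest single global access is 16 bytes (128 bits). `LdgAccess`, `LdgCount`, and the tile and thread counts are hypothetical helpers and values, not members of `gemm_config.h`.

```cpp
#include <cstddef>

// Hypothetical check: bytes moved by one vectorized global load.
template <typename Scalar, int kScalarsPerLdg>
struct LdgAccess {
  static std::size_t const kBytes = kScalarsPerLdg * sizeof(Scalar);
  static_assert(kBytes == 1 || kBytes == 2 || kBytes == 4 || kBytes == 8 || kBytes == 16,
                "a single access must be 1, 2, 4, 8, or 16 bytes");
};

// Hypothetical count: loads each thread issues to bring in a whole tile.
template <int kScalarsPerLdg, int kTileElements, int kThreads>
struct LdgCount {
  static_assert(kTileElements % (kScalarsPerLdg * kThreads) == 0,
                "tile must split evenly across threads and vector width");
  static int const kLoadsPerThread = kTileElements / (kScalarsPerLdg * kThreads);
};

// e.g. 4 floats per LDG is a 16-byte access; a 128x8 tile over 256 threads is 1 load each.
static_assert(LdgAccess<float, 4>::kBytes == 16, "128-bit access");
static_assert(LdgCount<4, 128 * 8, 256>::kLoadsPerThread == 1, "one load per thread");
```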
static int const kScalarsPerStsD
Definition: gemm_config.h:120
ScalarD_ ScalarD
The scalar for D.
Definition: gemm_config.h:85
Defines Shape implementing the Layout concept for representing a 4D hypercube of objects.
Computes derived counts of a Layout Concept-based class.
Definition: shape.h:79
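The "derived counts" of a D x H x W x C shape are products of its trailing extents. The sketch below computes three of them. `Shape4`, `ShapeCount4`, and the member names `kWc`, `kHwc`, and `kCount` are assumptions chosen for illustration and may not match the members of the real class.

```cpp
// Hypothetical 4D shape: extents D x H x W x C.
template <int kD_, int kH_, int kW_, int kC_ = 1>
struct Shape4 { static int const kD = kD_, kH = kH_, kW = kW_, kC = kC_; };

// Hypothetical derived counts: products of trailing extents.
template <typename S>
struct ShapeCount4 {
  static int const kWc = S::kW * S::kC;    // elements per W x C row
  static int const kHwc = S::kH * kWc;     // elements per H x W x C slice
  static int const kCount = S::kD * kHwc;  // elements in the whole shape
};

static_assert(ShapeCount4<Shape4<2, 3, 4, 5>>::kCount == 120, "2*3*4*5");
```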
ScalarB_ ScalarB
The scalar for B.
Definition: gemm_config.h:81
OutputTile_ OutputTile
The output tile.
Definition: gemm_config.h:88