Cutlass
CUDA Templates for Linear Algebra Subroutines and Solvers
Collects the shared-memory load streams for the A and B multiplicands.
#include <gemm_stream_pair.h>
Classes | |
| struct | Params |
| Parameters object passed to load iterators. | |
Public Types | |
| typedef StreamA_ | StreamA |
| Stream for A multiplicand. | |
| typedef StreamB_ | StreamB |
| Stream for B multiplicand. | |
| typedef ZipTensorRef< typename StreamA::TensorRef, typename StreamB::TensorRef > | ThreadblockTileRef |
| Shared memory allocation for threadblock-scoped GEMM tile. | |
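The `ThreadblockTileRef` typedef zips the two streams' tensor references into one object, so both shared-memory tiles can be passed as a single argument. The host-side sketch below illustrates that idea only. `MockTensorRef`, `MockZipRef`, `MockStreamA`, `MockStreamB`, and `make_tile_ref` are hypothetical names invented for this example, and the `first`/`second` member layout is an assumption, not a statement about the actual `ZipTensorRef` definition.

```cpp
#include <cassert>

// Hypothetical stand-in for a tensor reference: a pointer plus a leading dimension.
struct MockTensorRef {
  float *data;
  int stride;
};

// Hypothetical zip type: bundles two references so they travel as one argument.
// The member names are an assumption made for this sketch.
template <typename First, typename Second>
struct MockZipRef {
  First first;    // reference for the A tile
  Second second;  // reference for the B tile
};

// Stand-ins for the stream types; each exposes its own TensorRef typedef.
struct MockStreamA { typedef MockTensorRef TensorRef; };
struct MockStreamB { typedef MockTensorRef TensorRef; };

// Mirrors the shape of the ThreadblockTileRef typedef listed above.
typedef MockZipRef<MockStreamA::TensorRef, MockStreamB::TensorRef>
    MockThreadblockTileRef;

// Builds a zipped reference over two separate buffers.
inline MockThreadblockTileRef make_tile_ref(float *a, int lda, float *b, int ldb) {
  MockThreadblockTileRef ref = {{a, lda}, {b, ldb}};
  return ref;
}
```

Each stream can then pick out its own half of the zipped reference without the caller having to pass two separate arguments.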
Public Member Functions | |
| CUTLASS_DEVICE | SharedStreamPair (Params const &params, ThreadblockTileRef const &threadblock_tile_ref) |
| Construct with the composable structure. | |
| CUTLASS_DEVICE void | copy (int step) |
| Trigger the copies from shared memory to registers. | |
| CUTLASS_DEVICE void | commit (int step) |
| Commit the data. | |
| CUTLASS_DEVICE StreamA::TransformedFragment const & | fragment_a (int step) const |
| The fragment A. | |
| CUTLASS_DEVICE StreamB::TransformedFragment const & | fragment_b (int step) const |
| The fragment B. | |
| CUTLASS_DEVICE void | inc_stage () |
| Increment the stage. | |
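The member functions above suggest a pairing pattern in which each call on the pair is applied to both the A and B streams. The sketch below is a host-side mock of that pattern, not the CUTLASS implementation: `MockStream`, `MockStreamPair`, and `run_one_stage` are hypothetical, fragments are plain integers, and the assumption that every call simply forwards to `stream_a` and `stream_b` is inferred from the member list rather than confirmed by it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stream: holds one integer "fragment" per step and counts calls.
struct MockStream {
  typedef int TransformedFragment;
  std::vector<int> fragments;  // fragments[step] is filled by copy(step)
  int base;                    // value offset so A and B are distinguishable
  int stage;                   // current pipeline stage
  int commits;                 // number of commit() calls seen

  MockStream(int base_, int steps)
      : fragments(steps, 0), base(base_), stage(0), commits(0) {}

  void copy(int step) { fragments[step] = base + 10 * stage + step; }
  void commit(int step) { (void)step; ++commits; }
  TransformedFragment const &fragment(int step) const { return fragments[step]; }
  void inc_stage() { ++stage; }
};

// Hypothetical pair: every call is forwarded to both streams (an assumption).
template <typename StreamA_, typename StreamB_>
struct MockStreamPair {
  typedef StreamA_ StreamA;
  typedef StreamB_ StreamB;

  StreamA stream_a;
  StreamB stream_b;

  MockStreamPair(StreamA const &a, StreamB const &b) : stream_a(a), stream_b(b) {}

  void copy(int step) { stream_a.copy(step); stream_b.copy(step); }
  void commit(int step) { stream_a.commit(step); stream_b.commit(step); }
  typename StreamA::TransformedFragment const &fragment_a(int step) const {
    return stream_a.fragment(step);
  }
  typename StreamB::TransformedFragment const &fragment_b(int step) const {
    return stream_b.fragment(step);
  }
  void inc_stage() { stream_a.inc_stage(); stream_b.inc_stage(); }
};

// Runs the inner steps of one stage, accumulates a*b per step, then advances the stage.
inline int run_one_stage(MockStreamPair<MockStream, MockStream> &pair, int steps) {
  int acc = 0;
  for (int step = 0; step < steps; ++step) {
    pair.copy(step);    // load both fragments for this step
    pair.commit(step);  // make the loaded data visible to the consumer
    acc += pair.fragment_a(step) * pair.fragment_b(step);
  }
  pair.inc_stage();
  return acc;
}
```

Keeping the two streams behind one object lets the calling loop stay symmetric in A and B, at the cost of always advancing both streams in lockstep.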
Public Attributes | |
| StreamA | stream_a |
| The stream for A. | |
| StreamB | stream_b |
| The stream for B. | |
| typedef StreamA_ cutlass::gemm::SharedStreamPair< StreamA_, StreamB_ >::StreamA |
| typedef StreamB_ cutlass::gemm::SharedStreamPair< StreamA_, StreamB_ >::StreamB |
| typedef ZipTensorRef<typename StreamA::TensorRef, typename StreamB::TensorRef > cutlass::gemm::SharedStreamPair< StreamA_, StreamB_ >::ThreadblockTileRef |
| StreamA cutlass::gemm::SharedStreamPair< StreamA_, StreamB_ >::stream_a |
| StreamB cutlass::gemm::SharedStreamPair< StreamA_, StreamB_ >::stream_b |