Cutlass
CUDA Templates for Linear Algebra Subroutines and Solvers
cutlass Directory Reference

Directories

directory  gemm
 
directory  reduction
 
directory  util
 

Files

file  convert.h [code]
 Defines conversion operations among Fragments of different base type.
 
file  coord.h [code]
 A Coord is a coordinate of arbitrary rank into a tensor or matrix.
 
file  core_io.h [code]
 Helpers for printing cutlass/core objects.
 
file  cutlass.h [code]
 Basic include for CUTLASS macros.
 
file  fragment.h [code]
 Defines Fragment, a statically-sized array for storing parts of matrices within a thread's registers.
 
file  fragment_multiply_add.h [code]
 Defines multiply-add operations on fragments within a thread.
 
file  iterator_access.h [code]
 Free functions for loading and storing to implementations of tile iterator concepts.
 
file  kernel_launch.h [code]
 Defines structures and helpers to launch CUDA kernels within CUTLASS.
 
file  load_store.h [code]
 Defines abstractions for efficiently loading and storing vectors to memory.
 
file  matrix_traits.h [code]
 Defines properties of matrices used to denote layout and operands to GEMM kernels.
 
file  predicate_vector.h [code]
 Defines container classes and iterators for managing a statically sized vector of boolean predicates.
 
file  reshape_tile.h [code]
 Defines a type for restructuring a tile.
 
file  shape.h [code]
 Defines Shape implementing the Layout concept for representing a 4D hypercube of objects.
 
file  tensor_ref.h [code]
 Defines a structure containing strides and a pointer to tensor data.
 
file  tensor_ref_collection.h [code]
 Introduces the TensorRefCollection concept and defines TensorRefBatch and TensorRefArray.
 
file  tensor_view.h [code]
 Defines a structure containing strides, bounds, and a pointer to tensor data.
 
file  tile_allocation.h [code]
 Defines a fragment based on a Shape<> template.
 
file  tile_coord.h [code]
 Defines a coordinate used for the CUTLASS 4-D tile structure.
 
file  tile_iterator.h [code]
 Defines the Tile Traits concept and iterators for loading and storing to tiles efficiently.
 
file  tile_stream.h [code]
 Implements the tile stream concept, composing an iterator with a transformation. Offers split-phase semantics, separating the initiation of an asynchronous memory operation from the fence that forces it to complete.
 
file  tile_traits_standard.h [code]
 Defines tile traits for several tile partitioning arrangements of threads expected to achieve efficient streaming performance.
 
file  vector.h [code]
 Defines a 1D vector of elements held in the registers of each thread.
 
file  wmma_matrix.h [code]
 Abstractions for loading and storing matrices using the CUDA WMMA API.
 
file  zip_fragment.h [code]
 Models a pair of fragments.
 
file  zip_tensor_ref.h [code]
 Defines a structure containing a pair of TensorRef-like objects.
 
file  zip_tile_iterator.h [code]
 Constructs an iterator that owns two tile iterator instances.
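
Example

 The entries for fragment.h, convert.h, and fragment_multiply_add.h describe a statically-sized per-thread array, element-type conversion between such arrays, and an element-wise multiply-add. The following is a minimal host-side sketch of those three ideas in plain C++. It does not use the CUTLASS API: SketchFragment, sketch_convert, and sketch_multiply_add are hypothetical names invented for illustration, and the actual CUTLASS classes may differ in interface and behavior.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a register-resident fragment: a fixed-size
// array whose element count is a compile-time constant.
// This is NOT the CUTLASS Fragment class.
template <typename T, std::size_t N>
struct SketchFragment {
  T storage[N];

  T& operator[](std::size_t i) {
    assert(i < N);
    return storage[i];
  }
  const T& operator[](std::size_t i) const {
    assert(i < N);
    return storage[i];
  }
};

// Hypothetical conversion between fragments of different base type:
// each element is converted with static_cast.
template <typename Dst, typename Src, std::size_t N>
SketchFragment<Dst, N> sketch_convert(const SketchFragment<Src, N>& src) {
  SketchFragment<Dst, N> dst{};
  for (std::size_t i = 0; i < N; ++i) {
    dst[i] = static_cast<Dst>(src[i]);
  }
  return dst;
}

// Hypothetical element-wise multiply-add: d[i] = a[i] * b[i] + c[i].
template <typename T, std::size_t N>
SketchFragment<T, N> sketch_multiply_add(const SketchFragment<T, N>& a,
                                         const SketchFragment<T, N>& b,
                                         const SketchFragment<T, N>& c) {
  SketchFragment<T, N> d{};
  for (std::size_t i = 0; i < N; ++i) {
    d[i] = a[i] * b[i] + c[i];
  }
  return d;
}
```

 Because the element count is a template parameter, a compiler can fully unroll these loops, which is one reason a fixed-size array is a natural fit for data meant to live in a thread's registers.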