tilelang.contrib.cutedsl.cpasync¶
Attributes¶
Functions¶
|
|
|
|
|
extract the tensormap pointer from a TMA Copy Atom. |
|
Load data from global memory to shared memory using TMA (Tensor Memory Access). |
|
Store data from shared memory to global memory using TMA (Tensor Memory Access). |
|
Indicate arrival of warp issuing TMA_STORE. |
|
Wait for TMA_STORE operations to complete. |
|
Asynchronously copy data from global memory to shared memory. |
|
Prefetch a TMA descriptor. |
Module Contents¶
- tilelang.contrib.cutedsl.cpasync.BYTES_PER_TENSORMAP = 128¶
- tilelang.contrib.cutedsl.cpasync.BYTES_PER_POINTER = 8¶
- tilelang.contrib.cutedsl.cpasync.cp_async_gs(size, dst, dst_offset, src, src_offset)¶
- tilelang.contrib.cutedsl.cpasync.cp_async_gs_conditional(size, dst, dst_offset, src, src_offset, cond)¶
- tilelang.contrib.cutedsl.cpasync.extract_tensormap_ptr(tma_atom, *, loc=None, ip=None)¶
extract the tensormap pointer from a TMA Copy Atom. :param tma_atom: The TMA Copy Atom :type tma_atom: CopyAtom
- Parameters:
tma_atom (cutlass.cute.CopyAtom)
- Return type:
cutlass.cute.Pointer
- tilelang.contrib.cutedsl.cpasync.tma_load(tma_desc, mbar, smem_ptr, crd, *, loc=None, ip=None)¶
Load data from global memory to shared memory using TMA (Tensor Memory Access).
- Parameters:
tma_desc (CopyAtom or tensormap_ptr or Tensor of tensormap_ptr) – TMA descriptor for the tensor
mbar (Pointer) – Mbarrier pointer in shared memory
smem_ptr (Pointer) – Destination pointer in shared memory
crd (tuple[Int, ...]) – Coordinates tuple for the tensor access
- Return type:
None
- tilelang.contrib.cutedsl.cpasync.tma_store(tma_desc, smem_ptr, crd, *, loc=None, ip=None)¶
Store data from shared memory to global memory using TMA (Tensor Memory Access).
- Parameters:
tma_desc (TMA descriptor) – TMA descriptor for the tensor
smem_ptr (Pointer) – Source pointer in shared memory
crd (tuple[Int, ...]) – Coordinates tuple for the tensor access
- Return type:
None
- tilelang.contrib.cutedsl.cpasync.tma_store_arrive(*, loc=None, ip=None)¶
Indicate arrival of warp issuing TMA_STORE. Corresponds to PTX instruction: cp.async.bulk.commit_group;
- Return type:
None
- tilelang.contrib.cutedsl.cpasync.tma_store_wait(count, *, read=None, loc=None, ip=None)¶
Wait for TMA_STORE operations to complete. Corresponds to PTX instruction: cp.async.bulk.wait_group.read <count>;
- Parameters:
count (Int) – The number of outstanding bulk async groups to wait for
- Return type:
None
Asynchronously copy data from global memory to shared memory.
- Parameters:
dst (Pointer) – Destination pointer in shared memory
src (Pointer) – Source pointer in global memory
size (Int) – Size of the copy in bytes
modifier (Int) – Cache modifier
cp_size (Int) – Optional copy size override
src_size (cutlass.cute.typing.Int)
- Return type:
None
- tilelang.contrib.cutedsl.cpasync.prefetch_tma_descriptor(tma_desc, *, loc=None, ip=None)¶
Prefetch a TMA descriptor. Corresponds to PTX instruction: prefetch.tensormap;
- Return type:
None