tilelang.contrib.cutedsl.ldsm
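LDMATRIX and STMATRIX operations for the CuTeDSL backend, based on tl_templates/cuda/ldsm.h.
These functions wrap the PTX ldmatrix and stmatrix instructions, which move 8x8 matrices of 16-bit elements between shared memory and registers. Both are warp-wide collective operations: all 32 threads of a warp must execute them, lanes 8*i through 8*i+7 supply the row addresses of matrix i (each row is 16 bytes and must be 16-byte aligned), and each thread holds one 32-bit register, containing two 16-bit elements, per matrix. ldmatrix requires sm_75 or newer and stmatrix requires sm_90 or newer. The keyword-only loc and ip arguments are the MLIR source location and insertion point that CuTeDSL operations accept; they can normally be left as None.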
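Both instructions use a fixed per-lane fragment layout. As a minimal sketch, the pure-Python model below (not tilelang code; `fragment_coords` is a hypothetical helper) follows the PTX ISA rule for one 8x8 `.b16` matrix: lane `l` holds row `l // 4`, columns `2 * (l % 4)` and `2 * (l % 4) + 1`, with the first element in the low 16 bits of its 32-bit register.

```python
from __future__ import annotations


def fragment_coords(lane: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return the (row, col) positions that `lane` holds in one 8x8 b16 matrix.

    The first position goes in the low 16 bits of the lane's 32-bit register,
    the second in the high 16 bits (model of the PTX .m8n8.b16 fragment).
    """
    if not 0 <= lane < 32:
        raise ValueError("lane must be in [0, 32)")
    row = lane // 4        # "groupID" in the PTX ISA
    col = 2 * (lane % 4)   # 2 * "threadID_in_group"
    return (row, col), (row, col + 1)


# The 32 lanes x 2 halves cover all 64 elements exactly once.
owned = [pos for lane in range(32) for pos in fragment_coords(lane)]
assert sorted(owned) == [(r, c) for r in range(8) for c in range(8)]

print(fragment_coords(13))  # ((3, 2), (3, 3))
```

Every ldmatrix/stmatrix variant below reuses this per-matrix layout; `.x2` and `.x4` simply give each lane one register per matrix.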
Functions
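| Function | Description |
|---|---|
| ptx_ldmatrix_x1(smem_ptr, local_ptr) | Load 1 matrix (8x8) from shared memory |
| ptx_ldmatrix_x2(smem_ptr, local_ptr) | Load 2 matrices (8x8 each) from shared memory |
| ptx_ldmatrix_x4(smem_ptr, local_ptr) | Load 4 matrices (8x8 each) from shared memory |
| ptx_ldmatrix_x1_trans(smem_ptr, local_ptr) | Load 1 matrix (8x8) with transpose from shared memory |
| ptx_ldmatrix_x2_trans(smem_ptr, local_ptr) | Load 2 matrices (8x8 each) with transpose from shared memory |
| ptx_ldmatrix_x4_trans(smem_ptr, local_ptr) | Load 4 matrices (8x8 each) with transpose from shared memory |
| ptx_stmatrix_x1(smem_ptr, value0) | Store 1 matrix (8x8) to shared memory |
| ptx_stmatrix_x2(smem_ptr, value0, value1) | Store 2 matrices (8x8 each) to shared memory |
| ptx_stmatrix_x4(smem_ptr, value0, value1, value2, value3) | Store 4 matrices (8x8 each) to shared memory |
| ptx_stmatrix_x1_trans(smem_ptr, value0) | Store 1 matrix (8x8) with transpose to shared memory |
| ptx_stmatrix_x2_trans(smem_ptr, value0, value1) | Store 2 matrices (8x8 each) with transpose to shared memory |
| ptx_stmatrix_x4_trans(smem_ptr, value0, value1, value2, value3) | Store 4 matrices (8x8 each) with transpose to shared memory |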
Module Contents
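- tilelang.contrib.cutedsl.ldsm.ptx_ldmatrix_x1(smem_ptr, local_ptr, *, loc=None, ip=None)
Load one 8x8 matrix of 16-bit elements from shared memory at smem_ptr into the thread-local buffer at local_ptr (one 32-bit register per thread).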
- Parameters:
smem_ptr (cutlass.cute.typing.Pointer)
local_ptr (cutlass.cute.typing.Pointer)
- Return type:
None
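To make the x1 data movement concrete, here is a pure-Python model of `ldmatrix.sync.aligned.m8n8.x1.shared.b16`. It is a sketch of the PTX semantics, not of tilelang's implementation, and `ldmatrix_x1_model` and `pack_b16` are hypothetical names. Shared memory is modeled as a flat list of 16-bit values, and lanes 0-7 each supply the element offset of one matrix row, so the rows need not be contiguous; here they come from a row-major 8x16 buffer.

```python
from __future__ import annotations

ROW_ELEMS = 8  # one matrix row = 8 x 16-bit values = 16 bytes


def pack_b16(lo: int, hi: int) -> int:
    """Pack two 16-bit values into one 32-bit register (lo in bits 0-15)."""
    return (lo & 0xFFFF) | ((hi & 0xFFFF) << 16)


def ldmatrix_x1_model(smem: list[int], row_addr: list[int]) -> list[int]:
    """Model of ldmatrix .x1 .m8n8 .b16 for a full warp.

    smem     -- shared memory as a flat list of 16-bit values
    row_addr -- per-lane element offsets; only lanes 0-7 are used (row r
                comes from lane r), and each row must be 16-byte aligned
    Returns the 32 per-lane destination registers.
    """
    rows = []
    for r in range(8):
        base = row_addr[r]
        if (base * 2) % 16:  # byte offset = element offset * 2
            raise ValueError(f"row {r} address is not 16-byte aligned")
        rows.append(smem[base:base + ROW_ELEMS])
    regs = []
    for lane in range(32):
        row, col = lane // 4, 2 * (lane % 4)
        regs.append(pack_b16(rows[row][col], rows[row][col + 1]))
    return regs


# An 8x16 tile stored row-major; we load its left 8x8 half.
STRIDE = 16
smem = list(range(8 * STRIDE))                         # value == offset
row_addr = [r * STRIDE for r in range(8)] + [0] * 24   # lanes 8-31 unused
regs = ldmatrix_x1_model(smem, row_addr)
print(hex(regs[5]))  # lane 5: row 1, cols 2 and 3 -> 0x130012
```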
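- tilelang.contrib.cutedsl.ldsm.ptx_ldmatrix_x2(smem_ptr, local_ptr, *, loc=None, ip=None)
Load two 8x8 matrices of 16-bit elements from shared memory at smem_ptr into the thread-local buffer at local_ptr (two 32-bit registers per thread, one per matrix).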
- Parameters:
smem_ptr (cutlass.cute.typing.Pointer)
local_ptr (cutlass.cute.typing.Pointer)
- Return type:
None
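For `.x2`, only the addresses from lanes 0-15 are used: lanes 0-7 supply the rows of the first matrix and lanes 8-15 those of the second. The small model below (hypothetical helper `address_role`, following the PTX ISA address rule rather than tilelang code) reports which matrix row, if any, each lane's address feeds for the `.x1`, `.x2` and `.x4` variants.

```python
from __future__ import annotations


def address_role(num: int, lane: int) -> tuple[int, int] | None:
    """Return (matrix, row) for the row address supplied by `lane`, or None.

    Models the PTX ldmatrix/stmatrix rule: lanes 8*i .. 8*i+7 give the row
    addresses of matrix i, and for .x1/.x2 the higher lanes' addresses
    are not used.
    """
    if num not in (1, 2, 4):
        raise ValueError("num must be 1, 2 or 4")
    if not 0 <= lane < 32:
        raise ValueError("lane must be in [0, 32)")
    matrix, row = divmod(lane, 8)
    return (matrix, row) if matrix < num else None


used_x2 = [lane for lane in range(32) if address_role(2, lane) is not None]
print(used_x2 == list(range(16)))  # True
print(address_role(2, 11))         # (1, 3): lane 11 -> row 3 of matrix 1
```

Lanes whose addresses are unused still receive their registers, since every lane gets one register per matrix.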
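- tilelang.contrib.cutedsl.ldsm.ptx_ldmatrix_x4(smem_ptr, local_ptr, *, loc=None, ip=None)
Load four 8x8 matrices of 16-bit elements from shared memory at smem_ptr into the thread-local buffer at local_ptr (four 32-bit registers per thread, one per matrix).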
- Parameters:
smem_ptr (cutlass.cute.typing.Pointer)
local_ptr (cutlass.cute.typing.Pointer)
- Return type:
None
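A common use of `.x4` is loading a 16x16 tile of 16-bit elements in a single instruction, for example as the A operand of an `mma.m16n8k16`. The sketch below models that in pure Python (hypothetical names; not tilelang's implementation), assuming the usual addressing pattern in which lane `l` points at tile row `l % 16`, column block `8 * (l // 16)`. Under that assumption, matrix `i`, and therefore register `i` of every lane, covers the quadrant whose origin is `(8 * (i % 2), 8 * (i // 2))`.

```python
from __future__ import annotations


def pack_b16(lo: int, hi: int) -> int:
    """Pack two 16-bit values into one 32-bit register (lo in bits 0-15)."""
    return (lo & 0xFFFF) | ((hi & 0xFFFF) << 16)


def ldmatrix_x4_model(smem: list[int], row_addr: list[int]) -> list[list[int]]:
    """Model of ldmatrix .x4 .m8n8 .b16.

    Lane 8*i + r supplies the element offset of row r of matrix i.
    Returns regs[lane][i], the register of `lane` that holds matrix i.
    """
    regs = [[0] * 4 for _ in range(32)]
    for i in range(4):
        rows = [smem[row_addr[8 * i + r]:row_addr[8 * i + r] + 8] for r in range(8)]
        for lane in range(32):
            row, col = lane // 4, 2 * (lane % 4)
            regs[lane][i] = pack_b16(rows[row][col], rows[row][col + 1])
    return regs


# 16x16 tile stored row-major (stride 16); each value is 16 * row + col.
tile = [16 * r + c for r in range(16) for c in range(16)]
# Assumed addressing pattern: lane l -> tile row l % 16, column block 8 * (l // 16).
addr = [(lane % 16) * 16 + 8 * (lane // 16) for lane in range(32)]
regs = ldmatrix_x4_model(tile, addr)

# Register i of each lane covers the quadrant at (8 * (i % 2), 8 * (i // 2)).
for lane in range(32):
    for i in range(4):
        r = 8 * (i % 2) + lane // 4
        c = 8 * (i // 2) + 2 * (lane % 4)
        assert regs[lane][i] == pack_b16(tile[16 * r + c], tile[16 * r + c + 1])
```

The quadrant order (top-left, bottom-left, top-right, bottom-right) is a consequence of the assumed addressing; a different lane-to-row mapping would reorder the four registers.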
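- tilelang.contrib.cutedsl.ldsm.ptx_ldmatrix_x1_trans(smem_ptr, local_ptr, *, loc=None, ip=None)
Load one 8x8 matrix of 16-bit elements from shared memory at smem_ptr into the thread-local buffer at local_ptr, transposing it (one 32-bit register per thread).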
- Parameters:
smem_ptr (cutlass.cute.typing.Pointer)
local_ptr (cutlass.cute.typing.Pointer)
- Return type:
None
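With `.trans`, each lane receives two elements from one column of the stored matrix instead of two from one row, so the registers hold the fragment of the transposed matrix. The pure-Python sketch below (hypothetical names, modeling the PTX semantics rather than tilelang's code) checks that a `.trans` load yields the same registers as a plain load of the explicitly transposed matrix.

```python
from __future__ import annotations


def pack_b16(lo: int, hi: int) -> int:
    """Pack two 16-bit values into one 32-bit register (lo in bits 0-15)."""
    return (lo & 0xFFFF) | ((hi & 0xFFFF) << 16)


def ldmatrix_x1_model(m: list[list[int]], trans: bool = False) -> list[int]:
    """Per-lane registers for one 8x8 b16 matrix `m` as stored in shared memory.

    Without .trans, lane l holds m[l//4][2*(l%4)] and m[l//4][2*(l%4)+1].
    With .trans, it holds m[2*(l%4)][l//4] and m[2*(l%4)+1][l//4], i.e. two
    elements of one stored column.
    """
    regs = []
    for lane in range(32):
        g, t = lane // 4, 2 * (lane % 4)
        if trans:
            regs.append(pack_b16(m[t][g], m[t + 1][g]))
        else:
            regs.append(pack_b16(m[g][t], m[g][t + 1]))
    return regs


m = [[8 * r + c for c in range(8)] for r in range(8)]
m_t = [list(col) for col in zip(*m)]  # explicit transpose

# Loading with .trans is equivalent to a plain load of the transposed matrix.
assert ldmatrix_x1_model(m, trans=True) == ldmatrix_x1_model(m_t)
print(hex(ldmatrix_x1_model(m, trans=True)[1]))  # 0x180010: m[2][0]=16, m[3][0]=24
```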
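- tilelang.contrib.cutedsl.ldsm.ptx_ldmatrix_x2_trans(smem_ptr, local_ptr, *, loc=None, ip=None)
Load two 8x8 matrices of 16-bit elements from shared memory at smem_ptr into the thread-local buffer at local_ptr, transposing each (two 32-bit registers per thread, one per matrix).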
- Parameters:
smem_ptr (cutlass.cute.typing.Pointer)
local_ptr (cutlass.cute.typing.Pointer)
- Return type:
None
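- tilelang.contrib.cutedsl.ldsm.ptx_ldmatrix_x4_trans(smem_ptr, local_ptr, *, loc=None, ip=None)
Load four 8x8 matrices of 16-bit elements from shared memory at smem_ptr into the thread-local buffer at local_ptr, transposing each (four 32-bit registers per thread, one per matrix).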
- Parameters:
smem_ptr (cutlass.cute.typing.Pointer)
local_ptr (cutlass.cute.typing.Pointer)
- Return type:
None
- tilelang.contrib.cutedsl.ldsm.ptx_stmatrix_x1(smem_ptr, value0, *, loc=None, ip=None)
Store one 8x8 matrix of 16-bit elements to shared memory at smem_ptr; each thread supplies its fragment as the 32-bit value value0.
- Parameters:
smem_ptr (cutlass.cute.typing.Pointer)
- Return type:
None
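stmatrix uses the same per-lane layout as ldmatrix in the opposite direction: lane `l`'s `value0` supplies the elements at row `l // 4`, columns `2 * (l % 4)` and `2 * (l % 4) + 1`, and lanes 0-7 supply the row addresses. As a sketch (pure Python, hypothetical names, modeling PTX rather than tilelang's code), a load followed by a store copies an 8x8 tile between two buffers with different row strides.

```python
from __future__ import annotations


def pack_b16(lo: int, hi: int) -> int:
    """Pack two 16-bit values into one 32-bit register (lo in bits 0-15)."""
    return (lo & 0xFFFF) | ((hi & 0xFFFF) << 16)


def unpack_b16(reg: int) -> tuple[int, int]:
    """Split a 32-bit register into its (low, high) 16-bit halves."""
    return reg & 0xFFFF, (reg >> 16) & 0xFFFF


def ldmatrix_x1_model(smem: list[int], row_addr: list[int]) -> list[int]:
    """Lane l receives row l // 4, columns 2*(l%4) and 2*(l%4)+1."""
    return [
        pack_b16(smem[row_addr[lane // 4] + 2 * (lane % 4)],
                 smem[row_addr[lane // 4] + 2 * (lane % 4) + 1])
        for lane in range(32)
    ]


def stmatrix_x1_model(smem: list[int], row_addr: list[int], value0: list[int]) -> None:
    """Inverse of ldmatrix_x1_model: write each lane's two halves back."""
    for lane in range(32):
        base = row_addr[lane // 4] + 2 * (lane % 4)
        smem[base], smem[base + 1] = unpack_b16(value0[lane])


src = list(range(8 * 16))              # 8x16 buffer, row stride 16 elements
dst = [0] * (8 * 24)                   # 8x24 buffer, row stride 24 elements
src_rows = [16 * r for r in range(8)]  # row addresses from lanes 0-7
dst_rows = [24 * r for r in range(8)]  # 48-byte stride keeps rows 16-byte aligned
stmatrix_x1_model(dst, dst_rows, ldmatrix_x1_model(src, src_rows))

# The left 8x8 block of src now appears in the left 8x8 block of dst.
assert all(dst[24 * r + c] == src[16 * r + c] for r in range(8) for c in range(8))
```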
- tilelang.contrib.cutedsl.ldsm.ptx_stmatrix_x2(smem_ptr, value0, value1, *, loc=None, ip=None)
Store two 8x8 matrices of 16-bit elements to shared memory at smem_ptr; each thread supplies one 32-bit value per matrix (value0, value1).
- Parameters:
smem_ptr (cutlass.cute.typing.Pointer)
- Return type:
None
- tilelang.contrib.cutedsl.ldsm.ptx_stmatrix_x4(smem_ptr, value0, value1, value2, value3, *, loc=None, ip=None)
Store four 8x8 matrices of 16-bit elements to shared memory at smem_ptr; each thread supplies one 32-bit value per matrix (value0 to value3).
- Parameters:
smem_ptr (cutlass.cute.typing.Pointer)
- Return type:
None
- tilelang.contrib.cutedsl.ldsm.ptx_stmatrix_x1_trans(smem_ptr, value0, *, loc=None, ip=None)
Store one 8x8 matrix of 16-bit elements to shared memory at smem_ptr, transposing it; each thread supplies its fragment as the 32-bit value value0.
- Parameters:
smem_ptr (cutlass.cute.typing.Pointer)
- Return type:
None
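Pairing a plain load with a `.trans` store transposes an 8x8 tile, while pairing a `.trans` load with a `.trans` store restores the original. The pure-Python sketch below (hypothetical names; a model of the PTX semantics, not tilelang's code) represents the stored matrix as a list of rows and checks both combinations.

```python
from __future__ import annotations


def pack_b16(lo: int, hi: int) -> int:
    """Pack two 16-bit values into one 32-bit register (lo in bits 0-15)."""
    return (lo & 0xFFFF) | ((hi & 0xFFFF) << 16)


def ldmatrix_x1_model(m: list[list[int]], trans: bool = False) -> list[int]:
    """Lane l holds stored elements (g, t), (g, t+1), or (t, g), (t+1, g) with .trans."""
    regs = []
    for lane in range(32):
        g, t = lane // 4, 2 * (lane % 4)
        lo, hi = (m[t][g], m[t + 1][g]) if trans else (m[g][t], m[g][t + 1])
        regs.append(pack_b16(lo, hi))
    return regs


def stmatrix_x1_model(value0: list[int], trans: bool = False) -> list[list[int]]:
    """Lane l's register holds logical elements (g, t) and (g, t+1), where
    g = l // 4 and t = 2 * (l % 4); .trans writes them to transposed positions."""
    out = [[0] * 8 for _ in range(8)]
    for lane, reg in enumerate(value0):
        g, t = lane // 4, 2 * (lane % 4)
        lo, hi = reg & 0xFFFF, (reg >> 16) & 0xFFFF
        if trans:
            out[t][g], out[t + 1][g] = lo, hi
        else:
            out[g][t], out[g][t + 1] = lo, hi
    return out


m = [[8 * r + c for c in range(8)] for r in range(8)]
m_t = [list(col) for col in zip(*m)]

assert stmatrix_x1_model(ldmatrix_x1_model(m), trans=True) == m_t          # transpose
assert stmatrix_x1_model(ldmatrix_x1_model(m, trans=True), trans=True) == m  # round trip
assert stmatrix_x1_model(ldmatrix_x1_model(m)) == m                          # plain copy
```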
- tilelang.contrib.cutedsl.ldsm.ptx_stmatrix_x2_trans(smem_ptr, value0, value1, *, loc=None, ip=None)
Store two 8x8 matrices of 16-bit elements to shared memory at smem_ptr, transposing each; each thread supplies one 32-bit value per matrix (value0, value1).
- Parameters:
smem_ptr (cutlass.cute.typing.Pointer)
- Return type:
None
- tilelang.contrib.cutedsl.ldsm.ptx_stmatrix_x4_trans(smem_ptr, value0, value1, value2, value3, *, loc=None, ip=None)
Store four 8x8 matrices of 16-bit elements to shared memory at smem_ptr, transposing each; each thread supplies one 32-bit value per matrix (value0 to value3).
- Parameters:
smem_ptr (cutlass.cute.typing.Pointer)
- Return type:
None