tilelang.tileop.gemm.gemm_cutedsl
GEMM implementation for the CuTeDSL backend that directly calls the tl::gemm intrinsic.
Classes
GemmCuTeDSL | GEMM implementation for CuTeDSL that directly calls the tl::gemm intrinsic.
Module Contents
- class tilelang.tileop.gemm.gemm_cutedsl.GemmCuTeDSL
Bases: tilelang.tileop.gemm.gemm_base.GemmBase
GEMM implementation for CuTeDSL that directly calls the tl::gemm intrinsic.
This implementation bypasses the complex MMA/WGMMA lowering logic and directly emits a call to tl::gemm, similar to the behavior of gemm_v1. This is necessary because the CuTeDSL backend requires simpler IR.
- infer_layout(target, thread_nums)
For CuTeDSL, proper layout inference is still needed for the A, B, and C buffers.
CuTeDSL uses the same underlying hardware instructions (WGMMA/MMA), so it needs the same layout information. Layout inference is delegated to the appropriate implementation based on the instruction type.
- Parameters:
target (tvm.target.Target)
thread_nums (int)
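The delegation described for infer_layout can be pictured with a small, self-contained sketch. It does not use tilelang: InstKind, the two stand-in inference functions, the string layout descriptions, and the inst argument are all hypothetical names chosen for illustration, and how the real class picks the instruction type from target and thread_nums is not shown here.

```python
from enum import Enum
from typing import Callable, Dict


class InstKind(Enum):
    """Hypothetical tag for the hardware instruction family."""
    MMA = "mma"
    WGMMA = "wgmma"


def infer_layout_mma(thread_nums: int) -> Dict[str, str]:
    # Stand-in for an MMA-specific layout inference routine (assumption).
    return {"A": "mma_shared", "B": "mma_shared", "C": f"mma_fragment[{thread_nums}]"}


def infer_layout_wgmma(thread_nums: int) -> Dict[str, str]:
    # Stand-in for a WGMMA-specific layout inference routine (assumption).
    return {"A": "wgmma_shared", "B": "wgmma_shared", "C": f"wgmma_fragment[{thread_nums}]"}


# One delegate per instruction kind; the dispatcher holds no layout logic itself.
_DELEGATES: Dict[InstKind, Callable[[int], Dict[str, str]]] = {
    InstKind.MMA: infer_layout_mma,
    InstKind.WGMMA: infer_layout_wgmma,
}


def infer_layout(inst: InstKind, thread_nums: int) -> Dict[str, str]:
    """Return layout descriptions for A, B, and C by delegating on inst."""
    try:
        delegate = _DELEGATES[inst]
    except KeyError:
        raise ValueError(f"unsupported instruction kind: {inst!r}")
    return delegate(thread_nums)
```

The point of the sketch is only the shape of the dispatch: the CuTeDSL-style entry point reuses whichever existing inference routine matches the instruction type instead of defining its own layouts.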
- lower(layout_map, target, thread_nums, thread_var)
Lower to a direct gemm_v1 call without complex MMA/WGMMA lowering.
- Parameters:
layout_map (dict)
target (tvm.target.Target)
thread_nums (int)
thread_var (tvm.tir.Var)
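As a rough illustration of what "lowering to a direct call" means, the sketch below models the result of lowering as a single call node rather than an expanded sequence of instruction-level operations. It is a toy model, not TVM or tilelang code: the Call dataclass and lower_direct are hypothetical, and the argument tuple does not reflect the real signature of the tl::gemm intrinsic.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Call:
    """Hypothetical IR node: a call to a named intrinsic with operand names."""
    op: str
    args: Tuple[str, ...]


def lower_direct(a: str, b: str, c: str) -> Call:
    """Lower a GEMM to one intrinsic call (toy model).

    A full MMA/WGMMA lowering would expand into many nodes; here the
    whole operation stays a single call that the backend handles itself.
    """
    return Call(op="tl::gemm", args=(a, b, c))
```

For example, lower_direct("A", "B", "C") yields one Call node whose op is "tl::gemm", which is the kind of simple IR the class description says the CuTeDSL backend requires.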