tilelang.tileop.gemm.gemm_cutedsl

GEMM implementation for the CuTeDSL backend that directly calls the tl::gemm intrinsic.

Classes

GemmCuTeDSL

GEMM implementation for CuTeDSL that directly calls the tl::gemm intrinsic.

Module Contents

class tilelang.tileop.gemm.gemm_cutedsl.GemmCuTeDSL

Bases: tilelang.tileop.gemm.gemm_base.GemmBase

GEMM implementation for CuTeDSL that directly calls the tl::gemm intrinsic.

This implementation bypasses the complex MMA/WGMMA lowering logic and directly emits a call to tl::gemm, similar to the behavior of gemm_v1. This is necessary because the CuTeDSL backend requires simpler IR.

infer_layout(target, thread_nums)

For CuTeDSL, we still need proper layout inference for the A, B, and C buffers.

CuTeDSL uses the same underlying hardware instructions (WGMMA/MMA), so it needs the same layout information. We delegate to the appropriate implementation based on the instruction type.

Parameters:

target (tvm.target.Target)
thread_nums (int)
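
The delegation described above can be pictured with a small, self-contained sketch. It does not use tilelang itself: the `GemmInst` enum, the `select_inst` rule, and the layout helpers below are hypothetical stand-ins, and the selection rule (WGMMA only on `sm_90` with a thread count that is a multiple of 128) is an assumption made for illustration, not the library's actual logic.

```python
from enum import Enum, auto


class GemmInst(Enum):
    """Hypothetical instruction kinds (not tilelang's own enum)."""
    MMA = auto()
    WGMMA = auto()


def infer_mma_layout(thread_nums: int) -> dict:
    # Stand-in for the MMA implementation's layout inference.
    return {"A": "mma_a", "B": "mma_b", "C": f"mma_c[{thread_nums}]"}


def infer_wgmma_layout(thread_nums: int) -> dict:
    # Stand-in for the WGMMA implementation's layout inference.
    return {"A": "wgmma_a", "B": "wgmma_b", "C": f"wgmma_c[{thread_nums}]"}


# One layout routine per instruction type.
_LAYOUT_IMPLS = {
    GemmInst.MMA: infer_mma_layout,
    GemmInst.WGMMA: infer_wgmma_layout,
}


def select_inst(arch: str, thread_nums: int) -> GemmInst:
    # Assumed rule, for illustration only.
    if arch == "sm_90" and thread_nums % 128 == 0:
        return GemmInst.WGMMA
    return GemmInst.MMA


def infer_layout(arch: str, thread_nums: int) -> dict:
    """Pick the instruction type, then delegate to its layout routine."""
    inst = select_inst(arch, thread_nums)
    return _LAYOUT_IMPLS[inst](thread_nums)
```

The point of the pattern is that the backend does not define layouts of its own: it only chooses which existing implementation to ask.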

lower(layout_map, target, thread_bounds, thread_var)

Lower to a direct gemm_v1-style call to tl::gemm, without the complex MMA/WGMMA lowering.

Parameters:

layout_map (dict)
target (tvm.target.Target)
thread_bounds (tvm.ir.Range)
thread_var (tvm.tir.Var)
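
As a rough illustration of what "direct" lowering means, the toy sketch below emits a single opaque call node instead of expanding the GEMM into per-instruction code. The `Call` class, the `lower_direct` function, and the buffer names are all hypothetical; the real argument list passed to tl::gemm is not specified on this page, so the operands shown are placeholders only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical IR node: an opaque call to a named intrinsic."""
    op: str
    args: tuple


def lower_direct(a: str, b: str, c: str) -> Call:
    # A direct lowering produces one call and leaves instruction
    # selection to the callee, rather than unrolling MMA/WGMMA steps.
    # The operand list here is a placeholder, not tl::gemm's signature.
    return Call(op="tl::gemm", args=(a, b, c))


call = lower_direct("A_shared", "B_shared", "C_local")
```

Keeping the emitted IR to one call is what makes it simple enough for a backend that cannot consume the expanded MMA/WGMMA form.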