tilelang.language.gemm_op¶

GEMM (General Matrix Multiplication) operators exposed on the TileLang language surface.

Functions¶

gemm_v1(A, B, C[, transpose_A, transpose_B, policy, ...])

Synchronous GEMM v1: use op tl.gemm.

gemm_v2(A, B, C[, transpose_A, transpose_B, policy, ...])

Synchronous GEMM v2: use op tl.gemm_py.

gemm(A, B, C[, transpose_A, transpose_B, policy, ...])

TileLang GEMM operator.

wgmma_gemm(A, B, C[, transpose_A, transpose_B, ...])

Explicit Hopper WGMMA GEMM without an implicit wait.

tcgen05_gemm(A, B, C[, transpose_A, transpose_B, ...])

Explicit Blackwell TCGEN05 GEMM without an implicit wait.

Module Contents¶

tilelang.language.gemm_op.gemm_v1(A, B, C, transpose_A=False, transpose_B=False, policy=GemmWarpPolicy.Square, clear_accum=False, k_pack=1, mbar=None)¶

Synchronous GEMM v1: use op tl.gemm.

Parameters:
  • A (tilelang._typing.BufferLikeType)

  • B (tilelang._typing.BufferLikeType)

  • C (tilelang._typing.BufferLikeType)

  • transpose_A (bool)

  • transpose_B (bool)

  • policy (tilelang.tileop.base.GemmWarpPolicy)

  • clear_accum (bool)

  • k_pack (int)

  • mbar (tilelang._typing.BarrierType | None)

Return type:

tvm.tir.PrimExpr

tilelang.language.gemm_op.gemm_v2(A, B, C, transpose_A=False, transpose_B=False, policy=GemmWarpPolicy.Square, clear_accum=False, k_pack=1, mbar=None)¶

Synchronous GEMM v2: use op tl.gemm_py.

Parameters:
  • A (tilelang._typing.BufferLikeType)

  • B (tilelang._typing.BufferLikeType)

  • C (tilelang._typing.BufferLikeType)

  • transpose_A (bool)

  • transpose_B (bool)

  • policy (tilelang.tileop.base.GemmWarpPolicy)

  • clear_accum (bool)

  • k_pack (int)

  • mbar (tilelang._typing.BarrierType | None)

Return type:

tvm.tir.PrimExpr

tilelang.language.gemm_op.gemm(A, B, C, transpose_A=False, transpose_B=False, policy=GemmWarpPolicy.Square, clear_accum=False, k_pack=1, mbar=None)¶

TileLang GEMM operator.

This is the default synchronous GEMM interface. On Hopper, if the compiler selects WGMMA lowering, TileLang inserts the corresponding wait implicitly. On Blackwell TCGEN5MMA, TileLang inserts the corresponding mbarrier_wait_parity(…) implicitly after issue.

For manual asynchronous scheduling, use T.wgmma_gemm(…) with T.wait_wgmma(…) on Hopper, or T.tcgen05_gemm(…) with T.mbarrier_wait_parity(…) on Blackwell.

Parameters:
  • A (BufferLikeType, i.e. Buffer | BufferLoad | BufferRegion, or Var) – Input buffer A.

  • B (BufferLikeType) – Input buffer B.

  • C (BufferLikeType) – Output buffer C.

  • transpose_A (bool) – Whether to transpose A. Defaults to False.

  • transpose_B (bool) – Whether to transpose B. Defaults to False.

  • policy (GemmWarpPolicy) – GEMM warp partition policy.

  • clear_accum (bool) – Whether to clear the accumulator.

  • k_pack (int) – Numbers of packed matrix cores, for ROCm only. Defaults to 1.

  • mbar (BarrierType, i.e. Buffer | BufferLoad, or Var, optional) – Mbarrier in Blackwell. Required when this GEMM lowers to TCGEN5MMA. Defaults to None.

Returns:

A handle to the GEMM operation.

Return type:

tir.Call

tilelang.language.gemm_op.wgmma_gemm(A, B, C, transpose_A=False, transpose_B=False, policy=GemmWarpPolicy.Square, clear_accum=False)¶

Explicit Hopper WGMMA GEMM without an implicit wait.

This is the explicit asynchronous Hopper WGMMA counterpart to the default synchronous T.gemm(…) interface, with two stricter guarantees: - it always requests the WGMMA lowering path - it never auto-emits an inlined warpgroup_wait

If the current target or operand pattern cannot use Hopper WGMMA, compilation fails instead of silently falling back to MMA.

Parameters:
Return type:

tvm.tir.PrimExpr

tilelang.language.gemm_op.tcgen05_gemm(A, B, C, transpose_A=False, transpose_B=False, policy=GemmWarpPolicy.Square, clear_accum=False, *, mbar)¶

Explicit Blackwell TCGEN05 GEMM without an implicit wait.

This is the explicit asynchronous Blackwell TCGEN5MMA counterpart to the default synchronous T.gemm(…) interface, with two stricter guarantees: - it always requests the TCGEN5MMA lowering path - it never auto-emits an inlined mbarrier_wait_parity

If the current target or operand pattern cannot use Blackwell TCGEN5MMA, compilation fails instead of silently falling back to another GEMM path.

Parameters:
  • A (tilelang._typing.BufferLikeType)

  • B (tilelang._typing.BufferLikeType)

  • C (tilelang._typing.BufferLikeType)

  • transpose_A (bool)

  • transpose_B (bool)

  • policy (tilelang.tileop.base.GemmWarpPolicy)

  • clear_accum (bool)

  • mbar (tilelang._typing.BarrierType)

Return type:

tvm.tir.PrimExpr