tilelang.cuda.op.gemm.gemm_tcgen05
==================================

.. py:module:: tilelang.cuda.op.gemm.gemm_tcgen05


Attributes
----------

.. autoapisummary::

   tilelang.cuda.op.gemm.gemm_tcgen05.GEMM_INST_TCGEN05


Classes
-------

.. autoapisummary::

   tilelang.cuda.op.gemm.gemm_tcgen05.GemmTCGEN5


Module Contents
---------------

.. py:data:: GEMM_INST_TCGEN05
   :value: 'cuda.tcgen05'


.. py:class:: GemmTCGEN5

   Bases: :py:obj:`tilelang.tileop.gemm.gemm_base.GemmBase`


   GEMM operator for Blackwell (SM100) TCGEN5MMA instructions.

   Supports the SS (Shared-Shared: A and B both in shared memory) and
   TS (TensorMemory-Shared: A in tensor memory, B in shared memory) variants,
   as well as block-scaled MXFP8 GEMM when the SFA/SFB scale factors are present.
   Layout inference and lowering are dispatched on the memory scopes
   of operands A and B.


   .. py:method:: infer_shared_layout(continuity)

      Infer a standard shared-memory swizzle layout for TCGEN05 operands.


   .. py:method:: infer_layout(target, thread_nums)

      Infer layouts for the operands and the accumulator.

      - SS: both A and B get swizzled shared-memory layouts.
      - TS: A and C get TMEM store layouts; B gets a swizzled shared-memory layout.
      - Block-scaled: same as SS, that is, A and B get swizzled shared-memory
        layouts and C gets a TMEM store layout.


   .. py:method:: lower(layout_map, target, thread_bounds, thread_var, mbar_phase_expr = None)

      Lower the GEMM tile-op into a TIR prim_func containing TCGEN5MMA calls.
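
The dispatch rules described for ``GemmTCGEN5`` and ``infer_layout`` can be pictured as a small lookup. The following is a minimal, standalone sketch and not the tilelang implementation: ``classify_variant``, ``LAYOUT_PLAN``, and the scope strings (``"shared"``, ``"shared.dyn"``, ``"shared.tmem"``) are assumptions made for illustration, and the sketch does not import tilelang.

```python
# Illustrative sketch only. Every name here is hypothetical and is not part
# of the tilelang API; the scope strings are assumed, not taken from the source.

SHARED_SCOPES = {"shared", "shared.dyn"}  # assumed shared-memory scope names
TMEM_SCOPE = "shared.tmem"                # assumed tensor-memory scope name

SWIZZLED_SHARED = "swizzled shared-memory layout"
TMEM_STORE = "TMEM store layout"

# Layout assignments exactly as listed in the infer_layout docstring.
# The docstring does not list C for the SS line, so it is left out here.
LAYOUT_PLAN = {
    "SS": {"A": SWIZZLED_SHARED, "B": SWIZZLED_SHARED},
    "TS": {"A": TMEM_STORE, "B": SWIZZLED_SHARED, "C": TMEM_STORE},
    "block-scaled": {"A": SWIZZLED_SHARED, "B": SWIZZLED_SHARED, "C": TMEM_STORE},
}


def classify_variant(a_scope, b_scope, has_scale_factors=False):
    """Pick a variant name from the memory scopes of operands A and B."""
    if b_scope not in SHARED_SCOPES:
        raise ValueError(f"B must be in shared memory, got {b_scope!r}")
    if a_scope in SHARED_SCOPES:
        # Both operands in shared memory: SS, or block-scaled when the
        # SFA/SFB scale factors are present.
        return "block-scaled" if has_scale_factors else "SS"
    if a_scope == TMEM_SCOPE:
        if has_scale_factors:
            # The docstring only describes block-scaled as "same as SS".
            raise ValueError("block-scaled TS is not covered by this sketch")
        return "TS"
    raise ValueError(f"unsupported scope for A: {a_scope!r}")


if __name__ == "__main__":
    variant = classify_variant("shared.tmem", "shared.dyn")
    print(variant, LAYOUT_PLAN[variant])
```

Keeping the table separate from the classifier mirrors the docstring, which states the variant selection (by memory scope) and the per-variant layouts as two distinct rules.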