tilelang.transform¶

Wrapping transformations.

Submodules¶

Functions¶

get_pass_context()

Get the current pass context

ClusterPlanning()

ClusterPlanning

PipelinePlanning()

infer the fragment/shared memory layout

LayoutInference()

LayoutInference

LowerTileOp()

LowerTileOp

InjectSoftwarePipeline()

InjectSoftwarePipeline

InjectAssumes()

Inject Assumes

LowerHopperIntrin()

LowerHopperIntrin

WarpSpecializedPipeline()

WarpSpecializedPipeline

RewriteWgmmaSync()

RewriteWgmmaSync

ThreadSync(storage_scope)

Insert sync between parallel read/write of shared buffers.

ThreadPartialSync(storage_scope)

Insert partial sync.

IfStmtBinding()

IfStmtBinding

MergeIfStmt()

MergeIfStmt

MultiVersionBuffer()

WarpSpecializedPipeline

WarpSpecialized()

WarpSpecializedPipeline

AnnotateWarpGroupRegAlloc()

Inject set_max_nreg calls into warp-specialized functions.

InjectTmaBarrier()

InjectTmaBarrier

InjectFenceProxy()

InjectFenceProxy

LegalizeVectorizedLoop()

LegalizeLoopVectorize

LegalizeSafeMemoryAccess()

LegalizeLoopVectorize

MakePackedAPI()

MakePackedAPI

AnnotateDeviceRegions()

AnnotateDeviceRegions

VectorizeLoop([enable_vectorize])

VectorizeLoop

InjectPTXAsyncCopy()

Rewrite global to shared memory copy on CUDA with asynchronous copy.

LowerDeviceStorageAccessInfo()

Lower attached storage access information on device.

LoopVectorizeDynamic()

Try to vectorize loop with dynamic shape.

ConfigIndexBitwidth()

Config index bitwidth.

FlattenBuffer()

FlattenBuffer

EliminateStorageSyncForMBarrier()

EliminateStorageSyncForMBarrier

MergeSharedMemoryAllocations([...])

MergeSharedMemoryAllocations

LowerL2Persistent()

LowerL2Persistent

PersistThreadblock()

PersistThreadblock

AlignDynamicSharedMemoryAllocations([align_bytes])

AlignDynamicSharedMemoryAllocations

LowerSharedBarrier()

LowerSharedBarrier

StorageRewrite()

StorageRewrite

LowerOpaqueBlock()

LowerOpaqueBlock

LowerThreadAllreduce()

LowerThreadAllreduce

LowerDeviceKernelLaunch()

Create and return a transform pass that lowers device kernel launch constructs to target-specific IR.

LayoutReducer()

Return a TVM transform pass that performs layout reduction/normalization.

Package Contents¶

tilelang.transform.get_pass_context()¶

Get the current pass context

tilelang.transform.ClusterPlanning()¶

ClusterPlanning

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.PipelinePlanning()¶

infer the fragment/shared memory layout

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.LayoutInference()¶

LayoutInference

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.LowerTileOp()¶

LowerTileOp

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.InjectSoftwarePipeline()¶

InjectSoftwarePipeline

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.InjectAssumes()¶

Inject Assumes

Returns:¶

fpasstvm.transform.Pass

The result pass

tilelang.transform.LowerHopperIntrin()¶

LowerHopperIntrin

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.WarpSpecializedPipeline()¶

WarpSpecializedPipeline

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.RewriteWgmmaSync()¶

RewriteWgmmaSync

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.ThreadSync(storage_scope)¶

Insert sync between parallel read/write of shared buffers.

Parameters:

storage_scope (str) – The target storage scope.

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.ThreadPartialSync(storage_scope)¶

Insert partial sync.

Parameters:

storage_scope (str) – The target storage scope.

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.IfStmtBinding()¶

IfStmtBinding

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.MergeIfStmt()¶

MergeIfStmt

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.MultiVersionBuffer()¶

WarpSpecializedPipeline

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.WarpSpecialized()¶

WarpSpecializedPipeline

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.AnnotateWarpGroupRegAlloc()¶

Inject set_max_nreg calls into warp-specialized functions.

This pass analyzes the function to collect register hints from set_max_nreg and no_set_max_nreg calls, then injects appropriate set_max_nreg calls into producer and consumer branches of warp-specialized code.

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.InjectTmaBarrier()¶

InjectTmaBarrier

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.InjectFenceProxy()¶

InjectFenceProxy

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.LegalizeVectorizedLoop()¶

LegalizeLoopVectorize

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.LegalizeSafeMemoryAccess()¶

LegalizeLoopVectorize

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.MakePackedAPI()¶

MakePackedAPI

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.AnnotateDeviceRegions()¶

AnnotateDeviceRegions

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.VectorizeLoop(enable_vectorize=True)¶

VectorizeLoop

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

Parameters:

enable_vectorize (bool)

tilelang.transform.InjectPTXAsyncCopy()¶

Rewrite global to shared memory copy on CUDA with asynchronous copy.

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.LowerDeviceStorageAccessInfo()¶

Lower attached storage access information on device.

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

Note

Run this pass after all storage access analysis finish.

tilelang.transform.LoopVectorizeDynamic()¶

Try to vectorize loop with dynamic shape.

Returns:

  • fpass (tvm.transform.Pass) – The result pass

  • —-

tilelang.transform.ConfigIndexBitwidth()¶

Config index bitwidth.

Returns:

  • fpass (tvm.transform.Pass) – The result pass

  • —-

tilelang.transform.FlattenBuffer()¶

FlattenBuffer

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.EliminateStorageSyncForMBarrier()¶

EliminateStorageSyncForMBarrier

tilelang.transform.MergeSharedMemoryAllocations(enable_aggressive_merge=False, align_bytes=16)¶

MergeSharedMemoryAllocations

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

Parameters:
  • enable_aggressive_merge (bool)

  • align_bytes (int)

tilelang.transform.LowerL2Persistent()¶

LowerL2Persistent

tilelang.transform.PersistThreadblock()¶

PersistThreadblock

tilelang.transform.AlignDynamicSharedMemoryAllocations(align_bytes=16)¶

AlignDynamicSharedMemoryAllocations

Parameters:

align_bytes (int) – The alignment bytes.

tilelang.transform.LowerSharedBarrier()¶

LowerSharedBarrier

tilelang.transform.StorageRewrite()¶

StorageRewrite

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.LowerOpaqueBlock()¶

LowerOpaqueBlock

tilelang.transform.LowerThreadAllreduce()¶

LowerThreadAllreduce

tilelang.transform.LowerDeviceKernelLaunch()¶

Create and return a transform pass that lowers device kernel launch constructs to target-specific IR.

This pass transforms high-level device kernel launch and related intrinsics into lower-level IR suitable for backend code generation and device-side lowering.

Returns:

The transform pass that performs device kernel launch lowering.

Return type:

tvm.transform.Pass

tilelang.transform.LayoutReducer()¶

Return a TVM transform pass that performs layout reduction/normalization.

This wrapper delegates to the underlying FFI implementation and returns a pass object suitable for use in a PassContext or pass pipeline. The pass is intended to simplify or reduce tensor/layout-related representations during relay/tile transformations.

Returns:

The transform pass object produced by the FFI backend.