tilelang.transform¶

Wrapping transformations.

Functions¶

get_pass_context()

Get the current pass context

ClusterPlanning()

ClusterPlanning

PipelinePlanning()

PipelinePlanning

LayoutInference()

Infer the fragment/shared memory layout.

LowerTileOp()

LowerTileOp

InjectSoftwarePipeline()

InjectSoftwarePipeline

LegalizeNegativeIndex()

Legalize negative indices in buffer loads.

InjectAssumes()

Inject assumes for natural shape boundary conditions, and convert assumes in Evaluate(Call(...)) form (the TVM builtin assume call) to AttrNode form.

VerifyParallelLoop()

VerifyParallelLoop

ThreadSync(storage_scope)

Insert sync between parallel read/write of shared buffers.

IfStmtBinding()

IfStmtBinding

MergeIfStmt()

MergeIfStmt

LoopUnswitching()

LoopUnswitching: Hoist loop-invariant if statements out of loops.

LegalizeVectorizedLoop()

LegalizeVectorizedLoop

LegalizeSafeMemoryAccess()

LegalizeSafeMemoryAccess

LowerAccessPtr()

Lower TileLang frontend tl.access_ptr to tir.builtin.tvm_access_ptr.

MakePackedAPI()

MakePackedAPI

MaterializeKernelLaunch([lower_thread_binding])

Materialize the target-neutral kernel launch nest (thread_binding For loops emitted by T.Kernel) into a backend-specific form.

AnnotateDeviceRegions()

AnnotateDeviceRegions

SplitHostDevice()

Split host/device functions even for empty kernels.

AnnotateReadOnlyParams()

Annotate read-only handle parameters for PrimFuncs.

VectorizeLoop([enable_vectorize])

VectorizeLoop

ConfigIndexBitwidth()

Configure the index bitwidth.

FlattenBuffer()

FlattenBuffer

MergeSharedMemoryAllocations([...])

MergeSharedMemoryAllocations

PlanAndUpdateBufferAllocationLocation()

Plan and update buffer allocation locations within PrimFuncs.

HoistGlobalBufferAllocations()

Hoist global buffer allocations to the top of the block (host side).

HoistNonRestrictParams()

StorageRewrite()

StorageRewrite

LowerOpaqueBlock()

LowerOpaqueBlock

LowerThreadAllreduce()

LowerThreadAllreduce

LowerIntrin()

LowerIntrin

LowerDeviceKernelLaunch()

Create and return a transform pass that lowers device kernel launch constructs to target-specific IR.

LayoutReducer()

Return a TVM transform pass that performs layout reduction/normalization.

UnrollLoop()

Unroll loops as in the Halide pipeline.

Package Contents¶

tilelang.transform.get_pass_context()¶

Get the current pass context

tilelang.transform.ClusterPlanning()¶

ClusterPlanning

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.PipelinePlanning()¶

PipelinePlanning

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.LayoutInference()¶

Infer the fragment/shared memory layout.

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.LowerTileOp()¶

LowerTileOp

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.InjectSoftwarePipeline()¶

InjectSoftwarePipeline

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.LegalizeNegativeIndex()¶

Legalize negative indices in buffer loads.

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass
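To make "legalizing a negative index" concrete, here is a small sketch in plain Python. It is not the pass itself: `legalize_index` is a hypothetical helper, and it assumes that legalization means rewriting a negative index `i` to `i + extent`, as Python-style indexing does. This page does not spell out the exact rewrite.

```python
def legalize_index(index: int, extent: int) -> int:
    """Map a possibly negative index into the range [0, extent)."""
    if not -extent <= index < extent:
        raise IndexError(f"index {index} out of range for extent {extent}")
    # A negative index counts from the end, so shift it by the extent.
    return index + extent if index < 0 else index


buffer = [10, 20, 30, 40]
# After legalization every index is non-negative and addresses the same element.
legalized = [legalize_index(i, len(buffer)) for i in (-1, -4, 2)]
```

After the rewrite, a backend only ever sees non-negative offsets into the buffer.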

tilelang.transform.InjectAssumes()¶

Inject assumes for natural shape boundary conditions, and convert assumes in Evaluate(Call(…)) form (the TVM builtin assume call) to AttrNode form.

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.VerifyParallelLoop()¶

VerifyParallelLoop

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.ThreadSync(storage_scope)¶

Insert sync between parallel read/write of shared buffers.

Parameters:

storage_scope (str) – The target storage scope.

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass
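The idea behind inserting a sync between parallel writes and reads of a shared buffer can be illustrated with ordinary Python threads. This is only an analogy under stated assumptions: `threading.Barrier` stands in for a device-level barrier, the list `shared` stands in for a shared-memory buffer, and none of these names come from TileLang.

```python
import threading

N = 4
shared = [None] * N      # stands in for a shared-memory buffer
results = [None] * N
barrier = threading.Barrier(N)


def worker(tid):
    shared[tid] = tid * 10                 # each thread writes its own slot
    barrier.wait()                         # sync: all writes land before any read
    results[tid] = shared[(tid + 1) % N]   # read a neighbour's slot


threads = [threading.Thread(target=worker, args=(t,)) for t in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the barrier, a thread could read its neighbour's slot before that slot has been written.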

tilelang.transform.IfStmtBinding()¶

IfStmtBinding

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.MergeIfStmt()¶

MergeIfStmt

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.LoopUnswitching()¶

LoopUnswitching: Hoist loop-invariant if statements out of loops.

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass
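Hoisting a loop-invariant if statement can be shown with a before/after pair in plain Python. The functions `before` and `after` are illustrative names, not TileLang APIs, and the pass operates on TIR rather than on Python source.

```python
def before(xs, scale):
    out = []
    for x in xs:
        # `scale` never changes inside the loop, so this test is loop-invariant.
        if scale:
            out.append(x * 2)
        else:
            out.append(x)
    return out


def after(xs, scale):
    out = []
    # The invariant condition is tested once; each branch gets its own loop.
    if scale:
        for x in xs:
            out.append(x * 2)
    else:
        for x in xs:
            out.append(x)
    return out
```

Both versions compute the same result. The second evaluates the condition once instead of once per iteration, at the cost of duplicating the loop.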

tilelang.transform.LegalizeVectorizedLoop()¶

LegalizeVectorizedLoop

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.LegalizeSafeMemoryAccess()¶

LegalizeSafeMemoryAccess

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.LowerAccessPtr()¶

Lower TileLang frontend tl.access_ptr to tir.builtin.tvm_access_ptr.

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.MakePackedAPI()¶

MakePackedAPI

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.MaterializeKernelLaunch(lower_thread_binding=True)¶

Materialize the target-neutral kernel launch nest (thread_binding For loops emitted by T.Kernel) into a backend-specific form. Each backend pipeline chooses the mode through lower_thread_binding.

Parameters:

lower_thread_binding (bool) – If True (SIMT backends, e.g. CUDA/ROCm/Metal), lower the blockIdx.*/threadIdx.* loops into thread_extent AttrStmts. If False (backends without SIMT, e.g. CPU), lower blockIdx.* loops into plain serial For loops and ignore threadIdx.* loops (their extents are dropped; the loop vars are pinned to 0).

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass
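The two modes of lower_thread_binding can be mimicked with a toy emulator in plain Python. This sketch assumes a 1-D grid of 1-D blocks and runs everything serially. `run_launch` and the recorded call lists are hypothetical, and real SIMT hardware executes the thread indices in parallel rather than in a loop.

```python
from itertools import product


def run_launch(kernel, grid, block, lower_thread_binding=True):
    """Emulate the two lowering modes for a 1-D grid of 1-D thread blocks."""
    if lower_thread_binding:
        # SIMT-style: every (blockIdx, threadIdx) pair executes the body.
        for bx, tx in product(range(grid), range(block)):
            kernel(bx, tx)
    else:
        # Non-SIMT: blockIdx becomes a serial loop; threadIdx is pinned to 0.
        for bx in range(grid):
            kernel(bx, 0)


simt_calls, cpu_calls = [], []
run_launch(lambda bx, tx: simt_calls.append((bx, tx)), grid=2, block=3)
run_launch(lambda bx, tx: cpu_calls.append((bx, tx)), grid=2, block=3,
           lower_thread_binding=False)
```

With `lower_thread_binding=False` the thread extent of 3 is dropped, so the body runs once per block with the thread index fixed at 0.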

tilelang.transform.AnnotateDeviceRegions()¶

AnnotateDeviceRegions

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.SplitHostDevice()¶

Split host/device functions even for empty kernels.

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.AnnotateReadOnlyParams()¶

Annotate read-only handle parameters for PrimFuncs.

Adds the attribute tl.readonly_param_indices, which lists the indices of parameters that are never written. This lets CUDA codegen emit const qualifiers and unlock read-only cache loads.

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass
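The shape of the information this annotation carries can be sketched in plain Python: given a parameter list and the set of parameters that are stored to, collect the indices of the rest. `readonly_param_indices`, `params`, and `written` are hypothetical names for illustration. The real pass derives the written set by analysing the PrimFunc body.

```python
def readonly_param_indices(params, written):
    """Return indices of params that never appear in the set of written names."""
    return [i for i, name in enumerate(params) if name not in written]


# Hypothetical kernel signature: C is stored to, A and B are only loaded.
params = ["A", "B", "C"]
written = {"C"}
indices = readonly_param_indices(params, written)
```

Here `indices` marks A and B as candidates for a const qualifier, while C stays writable.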

tilelang.transform.VectorizeLoop(enable_vectorize=True)¶

VectorizeLoop

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

Parameters:

enable_vectorize (bool)

tilelang.transform.ConfigIndexBitwidth()¶

Configure the index bitwidth.

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.FlattenBuffer()¶

FlattenBuffer

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.MergeSharedMemoryAllocations(enable_aggressive_merge=False, align_bytes=16, disable_reuse=False)¶

MergeSharedMemoryAllocations

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

Parameters:

  • enable_aggressive_merge (bool)

  • align_bytes (int)

  • disable_reuse (bool)
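This page does not describe how the merge is performed, so the following is only a guess at the role of align_bytes. It is a plain-Python sketch that places several buffers in one arena and rounds each start offset up to the alignment. `pack_offsets` is a hypothetical helper, and it does not model buffer reuse or aggressive merging.

```python
def pack_offsets(sizes, align_bytes=16):
    """Place buffers back to back, rounding each start up to the alignment."""
    offsets, cursor = [], 0
    for size in sizes:
        # Round the cursor up to the next multiple of align_bytes.
        cursor = (cursor + align_bytes - 1) // align_bytes * align_bytes
        offsets.append(cursor)
        cursor += size
    return offsets, cursor


# Three buffers of 100, 40 and 8 bytes merged into one allocation.
offsets, total_bytes = pack_offsets([100, 40, 8], align_bytes=16)
```

Each buffer then becomes a view at its offset into the single merged allocation of `total_bytes` bytes.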

tilelang.transform.PlanAndUpdateBufferAllocationLocation()¶

Plan and update buffer allocation locations within PrimFuncs.

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.HoistGlobalBufferAllocations()¶

Hoist global buffer allocations to the top of the block (host side).

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.HoistNonRestrictParams()¶

tilelang.transform.StorageRewrite()¶

StorageRewrite

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass

tilelang.transform.LowerOpaqueBlock()¶

LowerOpaqueBlock

tilelang.transform.LowerThreadAllreduce()¶

LowerThreadAllreduce

tilelang.transform.LowerIntrin()¶

LowerIntrin

tilelang.transform.LowerDeviceKernelLaunch()¶

Create and return a transform pass that lowers device kernel launch constructs to target-specific IR.

This pass transforms high-level device kernel launch and related intrinsics into lower-level IR suitable for backend code generation and device-side lowering.

Returns:

The transform pass that performs device kernel launch lowering.

Return type:

tvm.transform.Pass

tilelang.transform.LayoutReducer()¶

Return a TVM transform pass that performs layout reduction/normalization.

This wrapper delegates to the underlying FFI implementation and returns a pass object suitable for use in a PassContext or pass pipeline. The pass is intended to simplify or reduce tensor/layout-related representations during relay/tile transformations.

Returns:

The transform pass object produced by the FFI backend.

tilelang.transform.UnrollLoop()¶

Unroll loops as in the Halide pipeline.

This pass unrolls loops based on configuration options including:

  • auto_max_step: Threshold of number of steps to be automatically unrolled

  • auto_max_depth: Maximum nested level of loops that can be automatically unrolled

  • auto_max_extent: Maximum extent of loop that will be unrolled

  • explicit_unroll: Whether to explicitly unroll instead of setting a pragma

  • unroll_local_access: Whether to always unroll local access

Returns:

fpass – The result pass

Return type:

tvm.transform.Pass
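For readers unfamiliar with the transformation, the effect of explicitly unrolling a loop with a constant extent can be shown in plain Python. `rolled` and `unrolled` are illustrative names only, and the sketch does not model the configuration options listed for this pass.

```python
def rolled(a):
    total = 0
    for i in range(4):
        total += a[i]
    return total


def unrolled(a):
    # The four iterations written out as straight-line code; no loop counter remains.
    total = 0
    total += a[0]
    total += a[1]
    total += a[2]
    total += a[3]
    return total
```

The unrolled form trades code size for the removal of loop control, which is why thresholds on step count and extent are useful for deciding when to apply it.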