tilelang.transform.pass_config¶

Classes¶

PassConfigKey

Pass configuration keys for TileLang compiler.

Functions¶

normalize_pass_configs(pass_configs)

Canonicalize known pass-config keys and emit compatibility warnings.

Module Contents¶

class tilelang.transform.pass_config.PassConfigKey¶

Bases: str, enum.Enum

Pass configuration keys for TileLang compiler.

TL_SIMPLIFY = 'tl.Simplify'¶

Configuration for TileLang simplification passes.

This is a dict-based config with the following options: - transitively_prove_inequalities: bool, default False - convert_boolean_to_and_of_ors: bool, default False - apply_constraints_to_boolean_branches: bool, default False - propagate_knowns_to_prove_conditional: bool, default False - propagate_knowns_to_simplify_expressions: bool, default False - enable_simplify_let_inline: bool, default True

Usage:

with tvm.transform.PassContext(config={: “tl.Simplify”: {“enable_simplify_let_inline”: False}
}):: mod = tl.transform.Simplify()(mod)

TL_SIMPLIFY_TRANSITIVELY_PROVE_INEQUALITIES = 'transitively_prove_inequalities'¶

False

Type:: Enable transitive inequality proving in simplification. Default

TL_SIMPLIFY_CONVERT_BOOLEAN_TO_AND_OF_ORS = 'convert_boolean_to_and_of_ors'¶

False

Type:: Convert boolean expressions to AND of ORs form. Default

TL_SIMPLIFY_APPLY_CONSTRAINTS_TO_BOOLEAN_BRANCHES = 'apply_constraints_to_boolean_branches'¶

False

Type:: Apply constraints to simplify boolean branches. Default

TL_SIMPLIFY_PROPAGATE_KNOWNS_TO_PROVE_CONDITIONAL = 'propagate_knowns_to_prove_conditional'¶

False

Type:: Propagate known values to prove conditionals. Default

TL_SIMPLIFY_PROPAGATE_KNOWNS_TO_SIMPLIFY_EXPRESSIONS = 'propagate_knowns_to_simplify_expressions'¶

False

Type:: Propagate known values to simplify expressions. Default

TL_SIMPLIFY_ENABLE_LET_INLINE = 'enable_simplify_let_inline'¶

True

Type:: Enable inlining of let statements during simplification. Default

TL_DISABLE_DATA_RACE_CHECK = 'tl.disable_data_race_check'¶

False

Type:: Disable data race check in TileLang. Default

TL_DISABLE_PRELOWER_SEMANTIC_CHECK = 'tl.disable_prelower_semantic_check'¶

False

Type:: Disable Python-side pre-lower semantic checks. Default

TL_DISABLE_WARP_SPECIALIZED = 'tl.disable_warp_specialized'¶

False

Type:: Disable warp specialization optimization. Default

TL_ENABLE_FAST_MATH = 'tl.enable_fast_math'¶

False if enabled, –use_fast_math will be passed to nvcc

Type:: Enable fast math optimization. Default

TL_PTXAS_REGISTER_USAGE_LEVEL = 'tl.ptxas_register_usage_level'¶: The PTXAS register usage level in [0, 10], which controls the aggressiveness of optimizations that affect register usage. Default: None

TL_ENABLE_PTXAS_VERBOSE_OUTPUT = 'tl.enable_ptxas_verbose_output'¶

False

Type:: Enable ptxas verbose output. Default

TL_DEVICE_COMPILE_FLAGS = 'tl.device_compile_flags'¶

Additional device compiler flags passed to nvcc/NVRTC.

Accepts either a string (parsed with shell-like splitting) or a list of strings. Typical usage is to provide extra include paths, defines or ptxas options, e.g.:

“-I/opt/include -DMY_SWITCH=1 –ptxas-options=–verbose”
[“-I/opt/include”, “-DMY_SWITCH=1”, “–ptxas-options=–verbose”]

These flags are appended to the compiler options used in the tvm_ffi CUDA compile callback. Default: None

TL_CONFIG_INDEX_BITWIDTH = 'tl.config_index_bitwidth'¶

Type:: Bitwidth for configuration indices. Default

TL_DISABLE_TMA_LOWER = 'tl.disable_tma_lower'¶

Deprecated flag — prevents plain T.copy() from auto-lowering to TMA store.

Temporarily re-enabled for backward compatibility. Will be removed in v0.1.10.

TL_DISABLE_SAFE_MEMORY_ACCESS = 'tl.disable_safe_memory_legalize'¶

False

Type:: Disable safe memory access optimization. Default

TL_DISABLE_VECTORIZE_256 = 'tl.disable_vectorize_256'¶

False

Type:: Disable usage of LDG/STG 256. Default

TL_ENABLE_ASYNC_COPY = 'tl.enable_async_copy'¶

Enable lowering eligible global->shared copies to PTX cp.async.

When True (default), TileLang may lower: - T.copy(global -> shared, …) to ptx_cp_async + commit + wait - T.async_copy(global -> shared, …) to ptx_cp_async + commit (no wait) - plain user-written global->shared copy stores (e.g. in T.Parallel) to

ptx_cp_async + commit + wait

Important: Automatic cp.async lowering is gated by the surrounding loop context. TileLang will only auto-enable cp.async when the copy is observed inside a software-pipelined loop annotated with num_stages > 0 (e.g. created by T.Pipelined(…, num_stages=…) or by pipeline planning). Outside such loops, TileLang will prefer synchronous copy lowering even when this flag is True. You can request local cp.async injection on a specific parallel loop via T.Parallel(…, prefer_async=True).

When False, TileLang will avoid the cp.async lowering path for T.copy. Explicit T.async_copy still requires cp.async support and may error if it cannot be lowered.

Default: True

TL_ENABLE_LOWER_LDGSTG = 'tl.enable_lower_ldgstg'¶: Enable non-predicated LDG/STG lowering for global memory access. When enabled, converts Ramp-based global buffer load/store to ldg/stg intrinsics. Default: False

TL_ENABLE_LOWER_LDGSTG_PREDICATED = 'tl.enable_lower_ldgstg_predicated'¶: Enable predicated LDG/STG lowering. When True, predicated loads (if_then_else with else=0) and predicated stores (IfThenElse with empty then case) are lowered to ldg/stg intrinsics. Default: False

TL_ENABLE_VECTORIZE_PLANNER_VERBOSE = 'tl.enable_vectorize_planner_verbose'¶: Enable verbose output for vectorize planner. When enabled, prints detailed information about each buffer’s inferred vector size and which buffer determines the final vectorization factor. Useful for debugging vectorization issues. Default: False

TL_DISABLE_WGMMA = 'tl.disable_wgmma'¶

False

Type:: Disable usage of Hopper WGMMA. Default

TL_DEBUG_MERGE_SHARED_MEMORY_ALLOCATIONS = 'tl.debug_merge_shared_memory_allocations'¶

False

Type:: Enable debug information for merge shared memory allocations. Default

TL_ENABLE_AGGRESSIVE_SHARED_MEMORY_MERGE = 'tl.enable_aggressive_shared_memory_merge'¶

False

Type:: Enable aggressive merge of shared memory allocations. Default

TL_DISABLE_SHUFFLE_ELECT = 'tl.disable_shuffle_elect'¶

False

Type:: Disable shuffle election optimization. Default

TL_DISABLE_LOOP_UNSWITCHING = 'tl.disable_loop_unswitching'¶

False

Type:: Disable loop unswitching optimization. Default

TL_LOOP_UNSWITCHING_ALLOW_NON_TRIVIAL_ELSE = 'tl.loop_unswitching_allow_non_trivial_else'¶

Allow loop unswitching even when the else-version of the loop body has side effects.

This is more aggressive and may increase code size. Default: False.

TL_DISABLE_THREAD_STORAGE_SYNC = 'tl.disable_thread_storage_sync'¶: Disable thread storage synchronization pass. When enabled, disables the automatic insertion of thread synchronization barriers (e.g., __syncthreads()) for shared memory access coordination. This can be useful for performance optimization in cases where manual synchronization is preferred or when synchronization is not needed. Default: False

TL_FORCE_LET_INLINE = 'tl.force_let_inline'¶

False

Type:: Force TileLang to inline let bindings during simplification. Default

TL_AST_PRINT_ENABLE = 'tl.ast_print_enable'¶

False

Type:: Enable TIR AST printing for debugging purposes. Default

TL_LAYOUT_VISUALIZATION_ENABLE = 'tl.layout_visualization_enable'¶

False

Type:: Enable layout inference visualization. Default

TL_LAYOUT_VISUALIZATION_FORMATS = 'tl.layout_visualization_formats'¶: Layout visualization formats. Acceptable values: “pdf”, “png”, “svg”, “all”

TL_STORAGE_REWRITE_DETECT_INPLACE = 'tl.storage_rewrite_detect_inplace'¶

Control StorageRewrite inplace detection.

When False (default) StorageRewrite keeps distinct temporaries for patterns such as dst[i] = f(src[i]), avoiding implicit aliasing:

` read = T.allocate([1], T.int32, "local.var") write = T.allocate([1], T.int32, "local.var") read_buf = T.Buffer((1,), T.int32, data=read, scope="local.var") write_buf = T.Buffer((1,), T.int32, data=write, scope="local.var") write_buf[0] = read_buf[0] * 2 f(write_buf[0]) `

Setting the flag to True allows StorageRewrite to reuse the read buffer for the write when it can prove the update is safely inplace, producing IR like:

` read = T.allocate([1], T.int32, "local.var") read_buf = T.Buffer((1,), T.int32, data=read, scope="local.var") read_buf[0] = read_buf[0] * 2 f(read_buf[0]) `

This reduces local memory usage but introduces aliasing between the buffers.

Usage:

```python from tilelang.transform import PassContext, PassConfigKey

with PassContext(: config={PassConfigKey.TL_STORAGE_REWRITE_DETECT_INPLACE.value: True}
):: mod = tilelang.transform.StorageRewrite()(mod)

```

TIR_ENABLE_EQUIV_TERMS_IN_CSE = 'tir.enable_equiv_terms_in_cse_tir'¶

True

Type:: Enable equivalent terms in TIR Common Subexpression Elimination. Default

TIR_DISABLE_CSE = 'tir.disable_cse_tir'¶

False

Type:: Disable TIR Common Subexpression Elimination. Default

TIR_SIMPLIFY = 'tir.Simplify'¶

True

Type:: Enable/disable TIR simplification passes. Default

TIR_DISABLE_STORAGE_REWRITE = 'tir.disable_storage_rewrite'¶

False

Type:: Disable storage rewrite optimization. Default

TIR_DISABLE_VECTORIZE = 'tir.disable_vectorize'¶

False

Type:: Disable vectorization optimization. Default

TIR_USE_ASYNC_COPY = 'tir.use_async_copy'¶

True

Type:: Enable asynchronous memory copy operations. Default

TIR_ENABLE_DEBUG = 'tir.enable_debug'¶

False

Type:: Enable debug information in generated code. Default

TIR_MERGE_STATIC_SMEM = 'tir.merge_static_smem'¶

True

Type:: Merge static shared memory allocations. Default

TIR_ADD_LOWER_PASS = 'tir.add_lower_pass'¶

None

Type:: Additional lowering passes to be applied. Default

TIR_NOALIAS = 'tir.noalias'¶

True

Type:: Enable pointer non-aliasing assumptions. Default

CUDA_KERNELS_OUTPUT_DIR = 'cuda.kernels_output_dir'¶

empty string

Type:: Output directory for generated CUDA kernels. Default

TL_DISABLE_OUT_OF_BOUND_WARNING = 'tl.disable_out_of_bound_warning'¶

True

Type:: Disable out-of-bound access warnings in safe memory access legalization. Default

TL_ENABLE_DUMP_IR = 'tl.enable_dump_ir'¶

False

Type:: Enable dumping IR during lowering between passes. Default

TL_DUMP_IR_DIR = 'tl.dump_ir_path'¶

./dump_ir/

Type:: Path to the directory where IR will be dumped. Default

tilelang.transform.pass_config.normalize_pass_configs(pass_configs)¶

Canonicalize known pass-config keys and emit compatibility warnings.

Parameters:: pass_configs (dict[str, Any] | None)
Return type:: dict[str, Any]