tilelang.autotuner.tuner

The auto-tune module for tilelang programs.

This module provides functionality for auto-tuning tilelang programs, including JIT compilation and performance optimization through configuration search.

Attributes

logger

Exceptions

TimeoutException

Raised when a call wrapped by run_with_timeout exceeds its time limit.

Classes

`AutoTuner`	Auto-tuner for tilelang programs.
`AutoTuneImpl`	Auto-tuning wrapper around a tilelang.jit function, holding the settings passed to autotune.

Functions

`timeout_handler`(signum, frame)
`run_with_timeout`(func, timeout, *args, **kwargs)
`get_available_cpu_count`()	Gets the number of CPU cores available to the current process.
`autotune`([func, warmup, rep, timeout, supply_type, ...])	Auto-tuning decorator for TileLang functions compiled with @tilelang.jit.

Module Contents

exception tilelang.autotuner.tuner.TimeoutException

Bases: Exception

Raised when a call wrapped by run_with_timeout exceeds its time limit.

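tilelang.autotuner.tuner.timeout_handler(signum, frame)

Signal handler (standard signum, frame signature) that raises TimeoutException.

tilelang.autotuner.tuner.run_with_timeout(func, timeout, *args, **kwargs)

Calls func(*args, **kwargs) and raises TimeoutException if the call does not return within timeout seconds.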
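The sketch below shows how a SIGALRM-based timeout wrapper can be built with the standard `signal` module. It reuses the module's names (TimeoutException, timeout_handler, run_with_timeout), but the function bodies are an assumption and are not copied from tilelang. SIGALRM exists only on Unix, and a handler can be installed only from the main thread.

```python
import signal
import time


class TimeoutException(Exception):
    """Raised when a wrapped call runs past its deadline."""


def timeout_handler(signum, frame):
    # Invoked when SIGALRM fires; converts the signal into a Python exception.
    raise TimeoutException("Operation timed out")


def run_with_timeout(func, timeout, *args, **kwargs):
    # Install the handler and schedule SIGALRM `timeout` whole seconds from now.
    old_handler = signal.signal(signal.SIGALRM, timeout_handler)
    signal.alarm(timeout)
    try:
        return func(*args, **kwargs)
    finally:
        # Cancel any pending alarm and restore the previous handler if Python knew it.
        signal.alarm(0)
        if old_handler is not None:
            signal.signal(signal.SIGALRM, old_handler)


if __name__ == "__main__":
    print(run_with_timeout(lambda x: x * 2, 5, 21))  # 42
    try:
        run_with_timeout(time.sleep, 1, 3)  # sleeps longer than the 1 s budget
    except TimeoutException as exc:
        print("timed out:", exc)
```

`signal.alarm` works in whole seconds. A finer-grained budget would need `signal.setitimer`.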

tilelang.autotuner.tuner.logger

tilelang.autotuner.tuner.get_available_cpu_count()

Gets the number of CPU cores available to the current process.

Return type: int
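This reference does not say how the count is computed. The standard-library sketch below assumes that "available to the current process" means the CPU affinity mask, where the platform exposes one. `available_cpu_count` is a hypothetical name, not the tilelang function.

```python
import os


def available_cpu_count() -> int:
    """Count the CPU cores this process may run on (illustrative sketch)."""
    # sched_getaffinity respects taskset / cgroup CPU pinning, but only some
    # platforms (e.g. Linux) provide it.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Fallback: total logical CPUs in the machine; os.cpu_count() may return None.
    return os.cpu_count() or 1
```

On Python 3.13 and later, `os.process_cpu_count()` offers a similar affinity-aware count directly.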

class tilelang.autotuner.tuner.AutoTuner(fn, configs)

Auto-tuner for tilelang programs.

This class handles the auto-tuning process by testing different configurations and finding the optimal parameters for program execution.

Parameters:

fn (Callable) – The function to be auto-tuned.
configs – List of configurations to try during auto-tuning.
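At its core, the process described above is an exhaustive search. The toy loop below assumes each configuration is a dict of keyword arguments and that a configuration which raises is simply skipped. `brute_force_search` and `fake_latency` are hypothetical names. The AutoTuner documented here also handles compilation, validation against ref_prog, and caching, none of which is modelled.

```python
from __future__ import annotations

import math
from typing import Any, Callable, Iterable


def brute_force_search(
    configs: Iterable[dict[str, Any]],
    measure: Callable[..., float],
) -> tuple[dict[str, Any] | None, float]:
    """Return (best_config, best_latency) over all configs (illustrative only)."""
    best_config, best_latency = None, math.inf
    for config in configs:
        try:
            # `measure` stands in for "compile with these parameters and time it".
            latency = measure(**config)
        except Exception:
            # A configuration that fails (compile error, wrong result, timeout) is skipped.
            continue
        if latency < best_latency:
            best_config, best_latency = config, latency
    return best_config, best_latency


def fake_latency(block_M: int, num_stages: int) -> float:
    # Hypothetical cost model used only to exercise the search loop.
    if num_stages > 2:
        raise RuntimeError("too many stages")
    return abs(block_M - 128) / 64 + 1.0 / num_stages
```

With the candidates `{"block_M": 64, "num_stages": 1}`, `{"block_M": 128, "num_stages": 2}` and `{"block_M": 128, "num_stages": 3}`, the third raises and is skipped, and the second wins with a latency of 0.5.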

compile_args

profile_args

cache_dir: pathlib.Path

fn

configs

ref_latency_cache = None

jit_input_tensors = None

ref_input_tensors = None

jit_compile = None

classmethod from_kernel(kernel, configs)

Create an AutoTuner instance from a kernel function.

Parameters:

kernel (Callable) – The kernel function to auto-tune.
configs – List of configurations to try.

Returns:

A new AutoTuner instance.

Return type:

AutoTuner

set_compile_args(out_idx=None, target=None, execution_backend=None, target_host=None, verbose=None, pass_configs=None)

Set compilation arguments for the auto-tuner.

Parameters:

out_idx (list[int] | int | None) – Index, or list of indices, of the output tensors.
target (Literal['auto', 'cuda', 'hip', 'metal'] | None) – Target platform. If None, reads from the TILELANG_TARGET environment variable (defaults to “auto”).
execution_backend (Literal['auto', 'tvm_ffi', 'cython', 'nvrtc', 'torch'] | None) – Execution backend to use for kernel execution. If None, reads from the TILELANG_EXECUTION_BACKEND environment variable (defaults to “auto”).
target_host (str | tvm.target.Target | None) – Target host for cross-compilation.
verbose (bool | None) – Whether to enable verbose output. If None, reads from the TILELANG_VERBOSE environment variable (defaults to False).
pass_configs (dict[str, Any] | None) – Additional configuration options passed to the compiler's PassContext.

Environment Variables:

TILELANG_TARGET: Default compilation target (e.g., “cuda”, “llvm”). Defaults to “auto”.
TILELANG_EXECUTION_BACKEND: Default execution backend. Defaults to “auto”.
TILELANG_VERBOSE: Set to “1”, “true”, “yes”, or “on” to enable verbose compilation by default.

Returns:

Self for method chaining.

Return type:

AutoTuner

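For target and verbose, set_compile_args treats None as "read the environment variable, else use the default". The sketch below expresses that rule with `os.environ`. `resolve_target` and `resolve_verbose` are hypothetical helpers, and the case-insensitive comparison for TILELANG_VERBOSE is an assumption. The sketch only restates the defaults documented for set_compile_args.

```python
import os
from typing import Optional

# Spellings documented as enabling TILELANG_VERBOSE (compared case-insensitively here).
_TRUTHY = {"1", "true", "yes", "on"}


def resolve_target(target: Optional[str] = None) -> str:
    # An explicit argument wins; otherwise TILELANG_TARGET, then "auto".
    if target is not None:
        return target
    return os.environ.get("TILELANG_TARGET", "auto")


def resolve_verbose(verbose: Optional[bool] = None) -> bool:
    # An explicit argument wins; otherwise TILELANG_VERBOSE decides, defaulting to False.
    if verbose is not None:
        return verbose
    return os.environ.get("TILELANG_VERBOSE", "").strip().lower() in _TRUTHY
```

TILELANG_EXECUTION_BACKEND would follow the same pattern as `resolve_target`, with “auto” as its default.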

set_profile_args(warmup=25, rep=100, timeout=30, supply_type=tilelang.TensorSupplyType.Auto, ref_prog=None, supply_prog=None, rtol=0.01, atol=0.01, max_mismatched_ratio=0.01, skip_check=False, manual_check_prog=None, cache_input_tensors=False)

Set profiling arguments for the auto-tuner.

Parameters:

supply_type (tilelang.TensorSupplyType) – Type of tensor supply mechanism. Ignored if supply_prog is provided.
ref_prog (Callable) – Reference program for validation.
supply_prog (Callable) – Supply program for input tensors.
rtol (float) – Relative tolerance for validation.
atol (float) – Absolute tolerance for validation.
max_mismatched_ratio (float) – Maximum allowed mismatch ratio.
skip_check (bool) – Whether to skip validation.
manual_check_prog (Callable) – Manual check program for validation.
cache_input_tensors (bool) – Whether to cache input tensors.
warmup (int) – Number of warmup iterations.
rep (int) – Number of repetitions for timing.
timeout (int) – Maximum time per configuration.

Returns:

Self for method chaining.

Return type:

AutoTuner
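rtol, atol and max_mismatched_ratio combine into a single acceptance test. The sketch below assumes the common allclose-style per-element rule |actual - expected| <= atol + rtol * |expected|. The result is accepted while the fraction of failing elements stays at or below max_mismatched_ratio. `within_tolerance` is a hypothetical name operating on plain lists. tilelang's own comparison works on tensors and may differ in detail.

```python
from typing import Sequence


def within_tolerance(
    actual: Sequence[float],
    expected: Sequence[float],
    rtol: float = 0.01,
    atol: float = 0.01,
    max_mismatched_ratio: float = 0.01,
) -> bool:
    """Accept `actual` if few enough elements fall outside atol + rtol * |expected|."""
    if len(actual) != len(expected):
        raise ValueError("shape mismatch")
    if not expected:
        return True
    mismatched = sum(
        1 for a, e in zip(actual, expected) if abs(a - e) > atol + rtol * abs(e)
    )
    # Fail only when the mismatched fraction exceeds the allowed ratio.
    return mismatched / len(expected) <= max_mismatched_ratio
```

With 100 reference values, the default ratio of 0.01 tolerates exactly one out-of-tolerance element.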

set_kernel_parameters(k_parameters, f_parameters)

Parameters:

k_parameters (tuple[str, Ellipsis])
f_parameters (dict[str, Any])

generate_cache_key(parameters, extra_parameters)

Generate a cache key for the auto-tuning process.

Parameters:

parameters (dict[str, Any])
extra_parameters (dict[str, Any])

Return type:

tilelang.autotuner.param.AutotuneResult | None
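One common way to derive a cache key from two parameter dicts is to serialize them canonically and hash the result. The sketch below shows that approach with `json` and `hashlib`. `make_cache_key` is a hypothetical helper and is not necessarily how generate_cache_key works.

```python
import hashlib
import json
from typing import Any


def make_cache_key(parameters: dict[str, Any], extra_parameters: dict[str, Any]) -> str:
    """Deterministic hex digest of the tuning inputs (hypothetical helper)."""
    payload = {"parameters": parameters, "extra": extra_parameters}
    # sort_keys makes the key independent of dict insertion order;
    # default=repr turns non-JSON values (dtypes, targets, ...) into strings.
    blob = json.dumps(payload, sort_keys=True, default=repr)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
```

Reordering keys leaves the key unchanged, while changing any value (e.g. warmup) produces a different key.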

run(warmup=25, rep=100, timeout=30)

Run the auto-tuning process.

Parameters:

warmup (int) – Number of warmup iterations.
rep (int) – Number of repetitions for timing.
timeout (int) – Maximum time per configuration.

Returns:

Results of the auto-tuning process.

Return type:

AutotuneResult
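warmup and rep follow the usual benchmarking pattern. Untimed warmup calls come first, then the mean over rep timed calls. The sketch below measures CPU wall-clock time and reports milliseconds; both choices belong to this sketch. `benchmark` is a hypothetical name. GPU launches are asynchronous, so a real kernel profiler must synchronize the device or use device-side events, and this sketch omits that.

```python
import time
from typing import Callable


def benchmark(fn: Callable[[], object], warmup: int = 25, rep: int = 100) -> float:
    """Mean wall-clock latency of fn() in milliseconds (CPU-side sketch)."""
    for _ in range(warmup):
        fn()  # untimed: lets caches and lazy initialization settle
    start = time.perf_counter()
    for _ in range(rep):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / rep * 1e3
```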

__call__()

Make the AutoTuner callable, running the auto-tuning process.

Returns: Results of the auto-tuning process.
Return type: AutotuneResult

class tilelang.autotuner.tuner.AutoTuneImpl

Bases: Generic[_P, _T]

Auto-tuning wrapper around a tilelang.jit function. It stores the wrapped JIT implementation (jit_impl) together with the tuning and profiling settings accepted by autotune, and calling it returns a compiled tilelang.jit.kernel.JITKernel. The type parameters _P and _T stand for the parameter specification and the return type of the wrapped function.

jit_impl: tilelang.jit.JITImpl

warmup: int = 25

rep: int = 100

timeout: int = 100

configs: dict | Callable = None

supply_type: tilelang.TensorSupplyType

ref_prog: Callable = None

supply_prog: Callable = None

rtol: float = 0.01

atol: float = 0.01

max_mismatched_ratio: float = 0.01

skip_check: bool = False

manual_check_prog: Callable = None

cache_input_tensors: bool = False

__post_init__()

get_tunner()

__call__(*args, **kwargs)

Parameters:

args (_P)
kwargs (_P)

Return type:

tilelang.jit.kernel.JITKernel

tilelang.autotuner.tuner.autotune(func=None, *, configs, warmup=25, rep=100, timeout=100, supply_type=tilelang.TensorSupplyType.Auto, ref_prog=None, supply_prog=None, rtol=0.01, atol=0.01, max_mismatched_ratio=0.01, skip_check=False, manual_check_prog=None, cache_input_tensors=False)

Auto-tuning decorator for TileLang functions compiled with @tilelang.jit.

Because configs is a required keyword-only argument, autotune cannot be used bare. It is applied as @autotune(configs=...) on top of a @tilelang.jit-decorated function. Calling the decorated function explores the configuration space and returns the compiled kernel for the best-performing configuration.

Tips:

To skip the auto-tuning process, pass explicit values for the tunable parameters in the function signature, as in the else branch here:

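```python
# flashattn is an @autotune-decorated @tilelang.jit kernel whose tunable
# parameters include block_M, block_N, num_stages and threads.
if enable_autotune:
    # Tunable parameters omitted: the auto-tuner searches `configs`.
    kernel = flashattn(batch, heads, seq_len, dim, is_causal)
else:
    # Tunable parameters given explicitly: auto-tuning is skipped.
    kernel = flashattn(
        batch, heads, seq_len, dim, is_causal,
        groups=groups, block_M=128, block_N=128, num_stages=2, threads=256)
```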
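The tip can be modelled with a toy dispatcher. If the caller supplies every tunable parameter, the function is called directly; otherwise a search picks a configuration first. `tune_or_call` and its `cost` argument are hypothetical, and this is not AutoTuneImpl's actual code. The rule that explicit keyword arguments override the chosen configuration is an assumption of the sketch.

```python
from typing import Any, Callable


def tune_or_call(fn: Callable[..., Any], configs: list[dict[str, Any]],
                 cost: Callable[..., float], *args: Any, **kwargs: Any) -> Any:
    """Toy model: only search when some tunable parameter was left unset."""
    # Every key that appears in any configuration counts as tunable.
    tunable = set().union(*configs)
    if tunable.issubset(kwargs):
        # All tunable parameters were supplied explicitly: no search needed.
        return fn(*args, **kwargs)
    # Otherwise pick the cheapest configuration; explicit kwargs still take priority.
    best = min(configs, key=lambda cfg: cost(**cfg))
    return fn(*args, **{**best, **kwargs})
```

In the explicit-parameters path, `cost` is never evaluated. That mirrors the point of the tip: no search work happens when the configuration is fixed by the caller.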

Parameters:

func (Callable[_P, _T] | tvm.tir.PrimFunc | None) – The function to decorate. If None (autotune(...) called with keyword arguments only), a configured decorator is returned instead.
configs (Dict or Callable) – Configuration space to explore during auto-tuning.
warmup (int, optional) – Number of warmup iterations before timing.
rep (int, optional) – Number of repetitions for timing measurements.
timeout (int, optional) – Maximum time per configuration, in seconds.
supply_type (tilelang.TensorSupplyType) – Type of tensor supply mechanism. Ignored if supply_prog is provided.
ref_prog (Callable) – Reference program for validation.
supply_prog (Callable) – Supply program for input tensors.
rtol (float) – Relative tolerance for validation.
atol (float) – Absolute tolerance for validation.
max_mismatched_ratio (float) – Maximum allowed mismatch ratio.
skip_check (bool) – Whether to skip validation.
manual_check_prog (Callable) – Manual check program for validation.
cache_input_tensors (bool) – Whether to cache input tensors.

The following compilation options do not appear in the autotune signature; they belong to the underlying @tilelang.jit decorator:

target (Union[str, Target], optional) – Compilation target for TVM (e.g., “cuda”, “llvm”). Defaults to “auto”.
target_host (Union[str, Target], optional) – Target host for cross-compilation. Defaults to None.
execution_backend (Literal["auto", "tvm_ffi", "cython", "nvrtc", "torch"], optional) – Backend for kernel execution and argument passing. Use “auto” to pick a sensible default per target (cuda->tvm_ffi, metal->torch, others->cython).
verbose (bool, optional) – Enables verbose logging during compilation. Defaults to False.
pass_configs (Optional[Dict[str, Any]], optional) – Configurations for TVM’s pass context. Defaults to None.
debug_root_path (Optional[str], optional) – Directory to save compiled kernel source for debugging. Defaults to None.
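A configuration space can be written as a list of keyword-argument dicts, one per candidate, built as the Cartesian product of per-parameter choices. The parameter names below are taken from the flashattn tip, and the candidate values are illustrative. Each dict's keys are assumed to match tunable parameters of the decorated function. The assumption that configs accepts such a list, e.g. configs=get_configs(), should be checked against the annotation above, which reads Dict or Callable.

```python
import itertools


def get_configs():
    # Candidate values for each tunable parameter (illustrative values).
    iter_params = dict(
        block_M=[64, 128],
        block_N=[64, 128],
        num_stages=[1, 2],
        threads=[128, 256],
    )
    # Cartesian product -> one dict of keyword arguments per configuration.
    return [
        dict(zip(iter_params, values))
        for values in itertools.product(*iter_params.values())
    ]
```

The space grows multiplicatively: four parameters with two choices each already give 16 configurations, and each one is compiled and profiled.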

Returns:

Either a JIT-compiled wrapper around the input function, or a configured decorator instance that can then be applied to a function.

Return type:

Callable