tilelang.autotuner package
Module contents
The auto-tune module for tilelang programs.
This module provides functionality for auto-tuning tilelang programs, including JIT compilation and performance optimization through configuration search.
- class tilelang.autotuner.AutoTuner(fn: Callable, configs)
Bases:
object
Auto-tuner for tilelang programs.
This class handles the auto-tuning process by testing different configurations and finding the optimal parameters for program execution.
- Parameters:
fn – The function to be auto-tuned.
configs – List of configurations to try during auto-tuning.
- classmethod from_kernel(kernel: Callable, configs)
Create an AutoTuner instance from a kernel function.
- Parameters:
kernel – The kernel function to auto-tune.
configs – List of configurations to try.
- Returns:
A new AutoTuner instance.
- Return type:
AutoTuner
- run(warmup: int = 25, rep: int = 100, timeout: int = 100)
Run the auto-tuning process.
- Parameters:
warmup – Number of warmup iterations.
rep – Number of repetitions for timing.
timeout – Maximum time (in seconds) allowed for each configuration.
- Returns:
Results of the auto-tuning process.
- Return type:
AutotuneResult
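To make the roles of `warmup` and `rep` concrete, here is a minimal, self-contained sketch of a warmup-then-measure loop. It is an illustration only, not tilelang's implementation: `time_candidate` and `pick_best` are hypothetical names, the sketch times plain Python callables with `time.perf_counter`, and it does not model `timeout`.

```python
import time
from typing import Callable, Dict, Tuple


def time_candidate(fn: Callable[[], None], warmup: int = 25, rep: int = 100) -> float:
    """Return the mean wall-clock time of one fn() call, in seconds."""
    # Warmup calls are executed but excluded from the measurement.
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(rep):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / rep


def pick_best(
    candidates: Dict[str, Callable[[], None]], warmup: int = 25, rep: int = 100
) -> Tuple[str, float]:
    """Time every candidate and return (name, mean_seconds) of the fastest."""
    timings = {name: time_candidate(fn, warmup, rep) for name, fn in candidates.items()}
    best = min(timings, key=timings.get)
    return best, timings[best]
```

Each candidate is called `warmup + rep` times in total, so larger values give steadier measurements at the cost of a longer search.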
- set_compile_args(out_idx: List[int], supply_type: TensorSupplyType = TensorSupplyType.Auto, ref_prog: Optional[Callable] = None, supply_prog: Optional[Callable] = None, rtol: float = 0.01, atol: float = 0.01, max_mismatched_ratio: float = 0.01, skip_check: bool = False, cache_input_tensors: bool = True, target: Literal['auto', 'cuda', 'hip'] = 'auto')
Set compilation arguments for the auto-tuner.
- Parameters:
out_idx – List of output tensor indices.
supply_type – Type of tensor supply mechanism. Ignored if supply_prog is provided.
ref_prog – Reference program for validation.
supply_prog – Supply program for input tensors.
rtol – Relative tolerance for validation.
atol – Absolute tolerance for validation.
max_mismatched_ratio – Maximum allowed mismatch ratio.
skip_check – Whether to skip validation.
cache_input_tensors – Whether to cache input tensors.
target – Target platform.
- Returns:
Self for method chaining.
- Return type:
AutoTuner
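The three validation parameters (`rtol`, `atol`, `max_mismatched_ratio`) can be pictured with a small pure-Python sketch. It assumes the tolerances are combined the way `numpy.isclose` and `torch.isclose` combine them, `|actual - expected| <= atol + rtol * |expected|`, and that validation passes when the fraction of failing elements stays within `max_mismatched_ratio`. Whether tilelang checks outputs in exactly this way is an assumption, and `outputs_match` is a hypothetical helper.

```python
from typing import Sequence


def outputs_match(
    actual: Sequence[float],
    expected: Sequence[float],
    rtol: float = 0.01,
    atol: float = 0.01,
    max_mismatched_ratio: float = 0.01,
) -> bool:
    """Return True if few enough elements fall outside the tolerance band."""
    if len(actual) != len(expected):
        return False
    if not actual:
        return True  # nothing to compare
    # Count elements outside the assumed isclose-style tolerance band.
    mismatched = sum(
        1 for a, e in zip(actual, expected) if abs(a - e) > atol + rtol * abs(e)
    )
    return mismatched / len(actual) <= max_mismatched_ratio
```

With the default ratio of 0.01, one bad element in 200 is tolerated under this sketch, while three in 200 are not.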
- class tilelang.autotuner.AutotuneResult(latency: float, config: dict, ref_latency: float, libcode: str, func: Callable, kernel: Callable)
Bases:
object
Results from auto-tuning process.
- latency
Best achieved execution latency.
- Type:
float
- config
Configuration that produced the best result.
- Type:
dict
- ref_latency
Reference implementation latency.
- Type:
float
- libcode
Generated library code.
- Type:
str
- func
Optimized function.
- Type:
Callable
- kernel
Compiled kernel function.
- Type:
Callable
- config: dict
- func: Callable
- kernel: Callable
- latency: float
- libcode: str
- ref_latency: float
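One common use of `latency` and `ref_latency` together is to report a speedup over the reference implementation. The sketch below uses a hypothetical stand-in dataclass (`ResultStandIn`) that mirrors three of the documented fields so that it runs without tilelang. It assumes both latencies are positive numbers in the same unit.

```python
from dataclasses import dataclass


@dataclass
class ResultStandIn:
    """Hypothetical stand-in mirroring three documented AutotuneResult fields."""
    latency: float
    config: dict
    ref_latency: float


def speedup(result) -> float:
    """Return ref_latency / latency; values above 1 mean the tuned kernel is faster."""
    if result.latency <= 0:
        raise ValueError("latency must be positive")
    return result.ref_latency / result.latency
```

Because `speedup` only reads the `latency` and `ref_latency` attributes, any object that exposes those two fields can be passed to it.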
- class tilelang.autotuner.JITContext(out_idx: List[int], ref_prog: Callable, supply_prog: Callable, rtol: float, atol: float, max_mismatched_ratio: float, skip_check: bool, cache_input_tensors: bool, profiler: Profiler, target: Literal['cuda', 'hip'])
Bases:
object
Context object for Just-In-Time compilation settings.
- out_idx
List of output tensor indices.
- Type:
List[int]
- ref_prog
Reference program for correctness validation.
- Type:
Callable
- supply_prog
Supply program for input tensors.
- Type:
Callable
- rtol
Relative tolerance for output validation.
- Type:
float
- atol
Absolute tolerance for output validation.
- Type:
float
- max_mismatched_ratio
Maximum allowed ratio of mismatched elements.
- Type:
float
- skip_check
Whether to skip validation checks.
- Type:
bool
- cache_input_tensors
Whether to cache input tensors for each compilation.
- Type:
bool
- profiler
Profiler instance for performance measurement.
- Type:
Profiler
- target
Target platform ('cuda' or 'hip').
- Type:
Literal['cuda', 'hip']
- atol: float
- cache_input_tensors: bool
- max_mismatched_ratio: float
- out_idx: List[int]
- ref_prog: Callable
- rtol: float
- skip_check: bool
- supply_prog: Callable
- target: Literal['cuda', 'hip']
- tilelang.autotuner.autotune(configs: Any, warmup: int = 25, rep: int = 100, timeout: int = 100) -> AutotuneResult
Decorator for auto-tuning tilelang programs.
- Parameters:
configs – Configuration space to explore during auto-tuning.
warmup – Number of warmup iterations before timing.
rep – Number of repetitions for timing measurements.
timeout – Maximum time (in seconds) allowed for each configuration.
- Returns:
Decorated function that performs auto-tuning.
- Return type:
Callable
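The `configs` argument is typed as `Any`, so its exact shape is not fixed by this reference. Since `AutotuneResult.config` is a `dict`, one plausible way to build a configuration space is as a list of dicts formed from the Cartesian product of per-parameter choices. The sketch below illustrates that idea under this assumption. `build_configs` and the parameter names `block_M`, `block_N`, and `num_stages` are hypothetical.

```python
import itertools
from typing import Any, Dict, List


def build_configs(space: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Expand {name: [choices, ...]} into one dict per combination."""
    keys = list(space)
    return [
        dict(zip(keys, values))
        for values in itertools.product(*(space[k] for k in keys))
    ]


# Hypothetical tuning knobs: 2 * 2 * 2 = 8 candidate configurations.
configs = build_configs(
    {"block_M": [64, 128], "block_N": [64, 128], "num_stages": [2, 3]}
)
```

The number of candidates grows multiplicatively with each added parameter, which is worth keeping in mind given that every configuration is compiled and timed.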
- tilelang.autotuner.check_tensor_list_compatibility(list1: List[torch.Tensor], list2: List[torch.Tensor]) -> bool
Checks if two lists of tensors are compatible.
Compatibility checks performed include:
1. Lists have the same length.
2. Corresponding tensors have the same shape.
- Parameters:
list1 – First list of tensors.
list2 – Second list of tensors.
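The two checks listed above can be sketched in a few lines. This version is duck-typed so that it runs without PyTorch: it accepts any objects exposing a `.shape` attribute, and `FakeTensor` is a hypothetical stand-in used only for demonstration. It illustrates the described checks and is not the library's source.

```python
from typing import Any, Sequence


def lists_compatible(list1: Sequence[Any], list2: Sequence[Any]) -> bool:
    """Return True if both lists have equal length and pairwise-equal shapes."""
    # Check 1: the lists have the same length.
    if len(list1) != len(list2):
        return False
    # Check 2: corresponding entries have the same shape.
    return all(tuple(a.shape) == tuple(b.shape) for a, b in zip(list1, list2))


class FakeTensor:
    """Hypothetical stand-in exposing only a .shape tuple."""

    def __init__(self, *shape: int) -> None:
        self.shape = shape
```

For example, `lists_compatible([FakeTensor(2, 3)], [FakeTensor(2, 3)])` passes both checks, while swapping the second shape to `(3, 2)` fails the second one.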
- tilelang.autotuner.jit(out_idx: Optional[List[int]] = None, supply_type: TensorSupplyType = TensorSupplyType.Auto, ref_prog: Optional[Callable] = None, supply_prog: Optional[Callable] = None, rtol: float = 0.01, atol: float = 0.01, max_mismatched_ratio: float = 0.01, skip_check: bool = False, cache_input_tensors: bool = True, target: Literal['auto', 'cuda', 'hip'] = 'auto') -> Callable
Just-In-Time compilation decorator for tilelang programs.
- Parameters:
out_idx – List of output tensor indices.
supply_type – Type of tensor supply mechanism. Ignored if supply_prog is provided.
ref_prog – Reference program for correctness validation.
supply_prog – Supply program for input tensors.
rtol – Relative tolerance for output validation.
atol – Absolute tolerance for output validation.
max_mismatched_ratio – Maximum allowed ratio of mismatched elements.
skip_check – Whether to skip validation checks.
cache_input_tensors – Whether to cache input tensors for each compilation.
target – Target platform ('auto', 'cuda', or 'hip').
- Returns:
Decorated function that performs JIT compilation.
- Return type:
Callable