tilelang.profiler.bench module

Profiler utilities and helpers for converting to torch.

tilelang.profiler.bench.do_bench(fn: Callable, warmup: float = 25, rep: float = 100, _n_warmup: int = 0, _n_repeat: int = 0, grad_to_none: Optional[List[torch.Tensor]] = None, quantiles: Optional[List[float]] = None, fast_flush: bool = True, return_mode: Literal['min', 'max', 'mean', 'median'] = 'mean') → Union[float, List[float]]

Benchmarks the runtime of a PyTorch function.

This function handles:

  • L2 cache flushing between runs for consistent timing

  • Automatic warmup and repeat count calculation

  • Optional gradient clearing for backward passes

  • Multiple measurement modes (mean, median, min, max)

Parameters:
  • fn – Function to benchmark

  • warmup – Target warmup time in milliseconds

  • rep – Target total time for the timed repetitions, in milliseconds (the repeat count is derived from it)

  • _n_warmup – Override for number of warmup iterations

  • _n_repeat – Override for number of timing iterations

  • grad_to_none – Tensors whose gradients should be cleared between runs

  • quantiles – Optional performance percentiles to compute

  • fast_flush – Whether to use faster L2 cache flushing

  • return_mode – How to aggregate timing results (“mean”, “median”, “min”, “max”)
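The relationship between the time budgets (warmup, rep) and the iteration overrides (_n_warmup, _n_repeat) can be sketched in plain Python. This is a minimal CPU-only illustration, not tilelang's implementation: the helper name `plan_iterations`, its parameter names, and the five-call probe are hypothetical. It assumes that both budgets are expressed in milliseconds and that a positive override replaces the computed count.

```python
import time
from typing import Callable, Tuple


def plan_iterations(
    fn: Callable[[], object],
    warmup_ms: float = 25.0,
    rep_ms: float = 100.0,
    n_warmup_override: int = 0,
    n_repeat_override: int = 0,
    probe_runs: int = 5,
) -> Tuple[int, int]:
    """Hypothetical helper: derive (n_warmup, n_repeat) from time budgets."""
    # Probe: time a few calls to estimate the cost of one call in milliseconds.
    start = time.perf_counter()
    for _ in range(probe_runs):
        fn()
    estimate_ms = (time.perf_counter() - start) * 1000.0 / probe_runs
    estimate_ms = max(estimate_ms, 1e-6)  # guard against division by zero

    # Fit as many calls as each budget allows, but always run at least once.
    n_warmup = max(1, int(warmup_ms / estimate_ms))
    n_repeat = max(1, int(rep_ms / estimate_ms))

    # Assumption: positive overrides take precedence over the computed counts.
    if n_warmup_override > 0:
        n_warmup = n_warmup_override
    if n_repeat_override > 0:
        n_repeat = n_repeat_override
    return n_warmup, n_repeat
```

Under these assumptions, a slow function gets few iterations and a fast one gets many, so total benchmarking time stays roughly constant. Setting the overrides pins the counts when exact reproducibility matters more than a fixed time budget.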

Returns:

Aggregated runtime in milliseconds; may be a list of runtimes when quantiles is given

Return type:

float or List[float]
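To make the two result shapes concrete, the following sketch reduces a list of per-run timings either to a single value selected by return_mode or to a list of quantile values. It is an illustration under stated assumptions, not the library's code: the function name `aggregate` is hypothetical, quantiles are assumed to take precedence over return_mode, and linear interpolation between neighbouring samples is assumed for the quantile computation.

```python
import statistics
from typing import List, Optional, Sequence, Union


def aggregate(
    times_ms: Sequence[float],
    return_mode: str = "mean",
    quantiles: Optional[List[float]] = None,
) -> Union[float, List[float]]:
    """Hypothetical helper: reduce per-run timings (ms) to a summary."""
    if quantiles is not None:
        # Assumption: when quantiles are requested they override return_mode.
        ordered = sorted(times_ms)
        result = []
        for q in quantiles:
            # Linear interpolation between the two nearest sorted samples.
            pos = q * (len(ordered) - 1)
            lo = int(pos)
            hi = min(lo + 1, len(ordered) - 1)
            result.append(ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo))
        return result

    reducers = {
        "mean": statistics.fmean,
        "median": statistics.median,
        "min": min,
        "max": max,
    }
    return float(reducers[return_mode](times_ms))


timings = [1.0, 2.0, 3.0, 10.0]
print(aggregate(timings, "mean"))                      # 4.0
print(aggregate(timings, "median"))                    # 2.5
print(aggregate(timings, quantiles=[0.0, 0.5, 1.0]))   # [1.0, 2.5, 10.0]
```

The outlier in `timings` shows why the choice of mode matters: the mean is pulled up to 4.0 ms while the median stays at 2.5 ms, so "median" or "min" is often the more stable choice on a noisy machine. A call against the documented signature would look like `do_bench(fn, warmup=25, rep=100, return_mode="median")`.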