tilelang.profiler.bench ======================= .. py:module:: tilelang.profiler.bench .. autoapi-nested-parse:: The profiler and convert to torch utils Functions --------- .. autoapisummary:: tilelang.profiler.bench.do_bench Module Contents --------------- .. py:function:: do_bench(fn, warmup = 25, rep = 100, _n_warmup = 0, _n_repeat = 0, grad_to_none = None, quantiles = None, fast_flush = True, return_mode = 'mean') Benchmarks the runtime of a PyTorch function. This function handles: - L2 cache flushing between runs for consistent timing - Automatic warmup and repeat count calculation - Optional gradient clearing for backward passes - Multiple measurement modes (mean, median, min, max) :param fn: Function to benchmark :param warmup: Target warmup time in milliseconds :param rep: Target number of repetitions :param _n_warmup: Override for number of warmup iterations :param _n_repeat: Override for number of timing iterations :param grad_to_none: Tensors whose gradients should be cleared between runs :param quantiles: Optional performance percentiles to compute :param fast_flush: Whether to use faster L2 cache flushing :param return_mode: How to aggregate timing results ("mean", "median", "min", "max") :returns: Aggregated runtime in milliseconds :rtype: float