tilelang.profiler.bench
=======================

.. py:module:: tilelang.profiler.bench

.. autoapi-nested-parse::

   Profiler and torch-conversion utilities.


Functions
---------

.. autoapisummary::

   tilelang.profiler.bench.do_bench


Module Contents
---------------

.. py:function:: do_bench(fn, warmup = 25, rep = 100, _n_warmup = 0, _n_repeat = 0, grad_to_none = None, quantiles = None, fast_flush = True, return_mode = 'mean')

   Benchmarks the runtime of a PyTorch function.

   This function handles:

   - L2 cache flushing between runs for consistent timing
   - Automatic warmup and repeat count calculation
   - Optional gradient clearing for backward passes
   - Multiple measurement modes (mean, median, min, max)
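
   The automatic count calculation and the measurement modes can be illustrated
   with a CPU-only sketch. This is not the ``do_bench`` implementation: the name
   ``bench_cpu`` and its parameters ``warmup_ms``, ``rep_ms``, ``n_warmup`` and
   ``n_repeat`` are hypothetical, both budgets are assumed to be in milliseconds,
   and ``time.perf_counter`` stands in for GPU timing. Cache flushing and
   gradient clearing are left out.

   .. code-block:: python

      import statistics
      import time


      def bench_cpu(fn, warmup_ms=25, rep_ms=100, n_warmup=0, n_repeat=0,
                    return_mode="mean"):
          """Time fn on the CPU and return an aggregated runtime in milliseconds."""
          aggregators = {
              "mean": statistics.mean,
              "median": statistics.median,
              "min": min,
              "max": max,
          }
          if return_mode not in aggregators:
              raise ValueError(f"unknown return_mode: {return_mode!r}")

          # Estimate the cost of one call from five trial runs.
          start = time.perf_counter()
          for _ in range(5):
              fn()
          estimate_ms = (time.perf_counter() - start) * 1e3 / 5
          estimate_ms = max(estimate_ms, 1e-6)  # guard against division by zero

          # Derive iteration counts from the time budgets unless overridden.
          if n_warmup <= 0:
              n_warmup = max(1, int(warmup_ms / estimate_ms))
          if n_repeat <= 0:
              n_repeat = max(1, int(rep_ms / estimate_ms))

          # Warmup runs are executed but not recorded.
          for _ in range(n_warmup):
              fn()

          # Record one sample per timed run, then aggregate.
          times_ms = []
          for _ in range(n_repeat):
              t0 = time.perf_counter()
              fn()
              times_ms.append((time.perf_counter() - t0) * 1e3)
          return aggregators[return_mode](times_ms)

   With explicit counts, for example ``bench_cpu(work, n_warmup=2, n_repeat=3)``,
   the sketch calls ``work`` ten times in total: five for the estimate, two for
   warmup and three timed runs.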

   :param fn: Function to benchmark
   :param warmup: Target warmup time in milliseconds
   :param rep: Target measurement time in milliseconds, from which the number of timed repetitions is derived
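   :param _n_warmup: Explicit number of warmup iterations; the default ``0`` keeps the automatically computed count
   :param _n_repeat: Explicit number of timing iterations; the default ``0`` keeps the automatically computed count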
   :param grad_to_none: Tensors whose gradients should be cleared between runs
   :param quantiles: Optional performance percentiles to compute
   :param fast_flush: Whether to use faster L2 cache flushing
   :param return_mode: How to aggregate timing results ("mean", "median", "min", "max")

   :returns: Aggregated runtime in milliseconds
   :rtype: float
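
   A possible call pattern, given as a sketch rather than a canonical recipe.
   The helper ``bench_matmul_ms`` is hypothetical, and the example assumes that
   ``tilelang`` and a CUDA-enabled PyTorch build are installed. When either is
   missing, the helper returns ``None`` instead of timing anything.

   .. code-block:: python

      def bench_matmul_ms(n=1024):
          """Return the median matmul runtime in ms, or None without CUDA or tilelang."""
          try:
              import torch
              from tilelang.profiler.bench import do_bench
          except ImportError:
              return None
          if not torch.cuda.is_available():
              return None

          a = torch.randn(n, n, device="cuda")
          b = torch.randn(n, n, device="cuda")

          # Wrap the workload in a zero-argument callable, as do_bench expects.
          return do_bench(lambda: torch.matmul(a, b), return_mode="median")


      result = bench_matmul_ms(1024)
      if result is not None:
          print(f"median matmul time: {result:.3f} ms")

   Passing ``return_mode="median"`` is a design choice for this sketch: the
   median is less sensitive to occasional slow runs than the default mean.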