tilelang.language.annotations¶

Annotation helpers exposed on the TileLang language surface.

Functions¶

use_swizzle(panel_size[, order, enable])

Annotate a kernel to use a specific threadblock swizzle pattern.

annotate_layout(layout_map)

Annotate the layout of the buffer.

annotate_safe_value(safe_value_map)

Annotate the safe value of the buffer.

annotate_l2_hit_ratio(l2_hit_ratio_map)

Annotate the L2 hit ratio of the buffer.

annotate_min_blocks_per_sm(n)

Annotate the minimum number of thread blocks per SM (multiprocessor).

annotate_restrict_buffers(*buffers)

Mark the given buffer parameters as non-restrict.

Module Contents¶

tilelang.language.annotations.use_swizzle(panel_size, order='row', enable=True)¶

Annotate a kernel to use a specific threadblock swizzle pattern.

Parameters:
  • panel_size (int)

  • order (str)

  • enable (bool)

tilelang.language.annotations.annotate_layout(layout_map)¶

Annotate the layout of the buffer.

Parameters:

layout_map (dict)

tilelang.language.annotations.annotate_safe_value(safe_value_map)¶

Annotate the safe value of the buffer.

Parameters:

safe_value_map (dict)

tilelang.language.annotations.annotate_l2_hit_ratio(l2_hit_ratio_map)¶

Annotate the L2 hit ratio of the buffer.

Parameters:

l2_hit_ratio_map (dict)

tilelang.language.annotations.annotate_min_blocks_per_sm(n)¶

Annotate the minimum number of thread blocks per SM (multiprocessor).

When set, this value is passed as the second argument of __launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor) in the generated CUDA kernel. A larger value hints the compiler to limit register usage so that more blocks can reside on each SM simultaneously, which can improve occupancy at the cost of potentially more register spilling.

Example

>>> @T.prim_func
... def my_kernel(...):
...     with T.Kernel(...):
            T.annotate_min_blocks_per_sm(2)
...         ...
Parameters:

n (int)

tilelang.language.annotations.annotate_restrict_buffers(*buffers)¶

Mark the given buffer parameters as non-restrict.

This annotation tells codegen to omit the __restrict__ qualifier for the specified kernel buffer parameters. Use this when two (or more) buffers may alias, for example overlapping slices from the same base tensor.

Example

>>> @T.prim_func
... def buggy_kernel(x: T.Tensor((N,), T.float32),
...                  y: T.Tensor((N,), T.float32)):
...     T.annotate_restrict_buffers(x, y)
...     with T.Kernel(N, threads=32) as pid:
...         y[pid] = x[pid] + 1