tilelang.language.kernel¶
Kernel launching language interface in TileLang.
Classes¶
A simple stack-like wrapper around a deque that provides |
|
KernelLaunchFrame is a custom TIRFrame that manages block/thread indices |
Functions¶
|
Tools to quickly construct a kernel launch frame. |
|
Construct a kernel launch frame with a CUDA thread block cluster |
|
Launch a kernel from CUDA source code or a CUDA source file. |
|
Returns the thread binding for the given dimension. |
Returns all three thread bindings. |
|
|
Returns the block binding for the given dimension. |
Returns all three block bindings. |
|
|
Returns the thread extent for the given dimension. |
Returns all three thread extents. |
|
|
Returns the block extent for the given dimension. |
Returns all three block extents. |
Module Contents¶
- class tilelang.language.kernel.FrameStack¶
A simple stack-like wrapper around a deque that provides push, pop, and top methods for convenience.
- push(item)¶
Pushes an item onto the top of the stack.
- pop()¶
Pops and returns the top of the stack, or returns None if the stack is empty.
- top()¶
Returns the item on the top of the stack without removing it, or None if the stack is empty.
- size()¶
Returns the number of items in the stack.
- __len__()¶
Returns the number of items in the stack.
- __bool__()¶
Allows truthy checks on the stack object itself, e.g., ‘if stack: …’
- class tilelang.language.kernel.KernelLaunchFrame¶
Bases:
tvm.tirx.script.builder.frame.TIRFrameKernelLaunchFrame is a custom TIRFrame that manages block/thread indices and handles the entry and exit of the kernel launch scope.
- __enter__()¶
Enters the KernelLaunchFrame scope and pushes this frame onto the stack. Returns one Var if we detect exactly 5 frames (meaning there is a single block dimension), or a list of Vars otherwise.
- Return type:
tvm.tirx.Var | list[tvm.tirx.Var]
- __exit__(ptype, value, trace)¶
Exits the KernelLaunchFrame scope and pops this frame from the stack, but only if it’s indeed the topmost frame.
- classmethod Current()¶
Returns the topmost (current) KernelLaunchFrame from the stack if it exists, or None if the stack is empty.
- Return type:
KernelLaunchFrame | None
- get_block_extent(dim)¶
Returns the block extent for the given dimension. dim=0 corresponds to blockIdx.x, dim=1 to blockIdx.y, and dim=2 to blockIdx.z.
- Parameters:
dim (int)
- Return type:
int
- get_block_extents()¶
Returns the block extents for all three dimensions.
- Return type:
list[int]
- get_thread_extent(dim)¶
Returns the thread extent for the given dimension. dim=0 corresponds to threadIdx.x, dim=1 to threadIdx.y, and dim=2 to threadIdx.z.
- Parameters:
dim (int)
- Return type:
int
- get_thread_extents()¶
Returns the thread extents for all three dimensions.
- Return type:
list[int]
- get_thread_binding(dim=0)¶
Returns the thread binding for the given dimension. dim=0 corresponds to threadIdx.x, dim=1 to threadIdx.y, and dim=2 to threadIdx.z.
- Parameters:
dim (int)
- Return type:
tvm.tirx.Var
- get_thread_bindings()¶
Returns the thread binding for the given dimension. dim=0 corresponds to threadIdx.x, dim=1 to threadIdx.y, and dim=2 to threadIdx.z.
- Return type:
list[tvm.tirx.Var]
- get_num_threads()¶
Returns the thread indices from the topmost frame.
- Return type:
int
- get_block_binding(dim=0)¶
Returns the block binding for the given dimension. dim=0 corresponds to blockIdx.x, dim=1 to blockIdx.y, and dim=2 to blockIdx.z.
- Parameters:
dim (int)
- Return type:
tvm.tirx.Var
- get_block_bindings()¶
Returns all three block bindings.
- Return type:
list[tvm.tirx.Var]
- property blocks: list[tvm.tirx.Var]¶
Returns the block indices from the topmost frame.
- Return type:
list[tvm.tirx.Var]
- property threads: list[tvm.tirx.Var]¶
Returns the thread indices from the topmost frame.
- Return type:
list[tvm.tirx.Var]
- property num_threads: int¶
Returns the total number of threads.
- Return type:
int
- tilelang.language.kernel.Kernel(*blocks, threads=None, prelude=None)¶
Tools to quickly construct a kernel launch frame.
The launch nest is emitted in a target-neutral form (thread_binding For loops); each backend pipeline materializes it via MaterializeKernelLaunch. Backends without SIMT (e.g. CPU) simply ignore the thread extents at compile time, so the same kernel can be compiled for any target.
- Parameters:
blocks (int) – A list of extent, can be 1-3 dimension, representing gridDim.(x|y|z)
threads (int) – A integer representing blockDim.x Or a list of integers representing blockDim.(x|y|z) if the value is -1, we skip the threadIdx.x binding.
prelude (str) – The import c code of the kernel, will be injected before the generated kernel code.
- Returns:
res – The result LaunchThreadFrame.
- Return type:
Tuple[frame.LaunchThreadFrame]
Examples
Create a 1-D CUDA kernel launch and unpack the single block index:
with T.Kernel(T.ceildiv(N, 128), threads=128) as bx: # bx is the blockIdx.x binding (also iterable as (bx,)) ...
Launch a 2-D grid while requesting two thread dimensions:
with T.Kernel(grid_x, grid_y, threads=(64, 2)) as (bx, by): tx, ty = T.get_thread_bindings() ...
- tilelang.language.kernel.ClusterKernel(*blocks, cluster_dims, threads=None, prelude=None)¶
Construct a kernel launch frame with a CUDA thread block cluster (SM90+ only).
This is the CUDA-specific variant of
Kernel(): identical launch semantics and bindings, plus acluster_dimsannotation. The kernel will be launched with cudaLaunchKernelEx using cudaLaunchAttributeClusterDimension.- Parameters:
blocks (int) – A list of extent, can be 1-3 dimension, representing gridDim.(x|y|z)
cluster_dims (int | tuple[int, int, int] | list[int]) – The cluster dimensions. For example, use 2 or (2, 1, 1) to create 2-CTA clusters.
threads (int) – A integer representing blockDim.x Or a list of integers representing blockDim.(x|y|z)
prelude (str) – The import c code of the kernel, will be injected before the generated kernel code.
Examples
with T.ClusterKernel(grid_x, grid_y, cluster_dims=2, threads=128) as (bx, by): ...
- tilelang.language.kernel.CUDASourceCodeKernel(*blocks, threads=None, source_code_or_path, entry_name='main_kernel', cluster_dims=None, prelude=None)¶
Launch a kernel from CUDA source code or a CUDA source file.
The code must follows the following rules: 1. The kernel source must be a valid CUDA kernel which can be correctly compiled under TileLang’s context. 2. The kernel source must either contains only one __global__ function as an entry, or have a __global__ entry function named main_kernel.
- Parameters:
source_code_or_path (str | os.PathLike[str]) – Inline CUDA source code, or a path to a CUDA source file. If the argument resolves to an existing file, the file contents are loaded. Otherwise it is treated as inline CUDA source code.
blocks (int) – A list of extent, can be 1-3 dimension, representing gridDim.(x|y|z)
entry_name (str | None) – Optional name of the __global__ CUDA entry function inside the provided source. When specified, TileLang launches that external CUDA entry directly.
threads (int) – A integer representing blockDim.x Or a list of integers representing blockDim.(x|y|z) if the value is -1, we skip the threadIdx.x binding.
cluster_dims (int | tuple[int, int, int] | list[int] | None) – The cluster dimensions for SM90+ cluster launch. For example, use 2 or (2, 1, 1) to create 2-CTA clusters. When specified, the kernel will be launched using cudaLaunchKernelEx with cudaLaunchAttributeClusterDimension.
prelude (str) – The import c code of the kernel, will be injected before the generated kernel code.
- Return type:
None
- tilelang.language.kernel.get_thread_binding(dim=0)¶
Returns the thread binding for the given dimension.
- Parameters:
dim (int)
- Return type:
tvm.tirx.Var
- tilelang.language.kernel.get_thread_bindings()¶
Returns all three thread bindings.
- Return type:
list[tvm.tirx.Var]
- tilelang.language.kernel.get_block_binding(dim=0)¶
Returns the block binding for the given dimension.
- Parameters:
dim (int)
- Return type:
tvm.tirx.Var
- tilelang.language.kernel.get_block_bindings()¶
Returns all three block bindings.
- Return type:
list[tvm.tirx.Var]
- tilelang.language.kernel.get_thread_extent(dim=0)¶
Returns the thread extent for the given dimension.
- Parameters:
dim (int)
- Return type:
int
- tilelang.language.kernel.get_thread_extents()¶
Returns all three thread extents.
- Return type:
list[int]
- tilelang.language.kernel.get_block_extent(dim=0)¶
Returns the block extent for the given dimension.
- Parameters:
dim (int)
- Return type:
int
- tilelang.language.kernel.get_block_extents()¶
Returns all three block extents.
- Return type:
list[int]