tilelang.language.allocate¶
Memory allocation utilities for Tile-AI programs.
This module provides a set of functions for allocating different types of memory buffers in Tile-AI programs. It wraps TVM’s buffer allocation functionality with convenient interfaces for different memory scopes.
- Available allocation functions:
alloc_shared: Allocates shared memory buffers for inter-thread communication
alloc_local: Allocates local memory buffers for thread-private storage
alloc_fragment: Allocates fragment memory buffers for specialized operations
alloc_var: Allocates single-element variable buffers
Each function takes shape and dtype parameters and returns a TVM buffer object with the appropriate memory scope.
Functions¶
|
Allocate a shared memory buffer for inter-thread communication. |
|
Allocate a local memory buffer for thread-private storage. |
|
Allocate a fragment memory buffer for specialized operations. |
|
Allocate a single-element variable buffer. |
|
Allocate a barrier buffer. |
|
Allocate a Tensor Memory (TMEM) buffer for use with 5th generation Tensor Core operations (e.g., TCGEN5.MMA). |
|
Allocate a reducer buffer. |
|
Allocate a descriptor buffer for wgmma and utcmma. |
Module Contents¶
Allocate a shared memory buffer for inter-thread communication.
- Parameters:
shape (tuple) – The shape of the buffer to allocate
dtype (str) – The data type of the buffer (e.g., ‘float32’, ‘int32’)
scope (str, optional) – The memory scope. Defaults to “shared.dyn”
- Returns:
A TVM buffer object allocated in shared memory
- Return type:
T.Buffer
- tilelang.language.allocate.alloc_local(shape, dtype, scope='local')¶
Allocate a local memory buffer for thread-private storage.
- Parameters:
shape (tuple) – The shape of the buffer to allocate
dtype (str) – The data type of the buffer (e.g., ‘float32’, ‘int32’)
scope (str, optional) – The memory scope. Defaults to “local”
- Returns:
A TVM buffer object allocated in local memory
- Return type:
T.Buffer
- tilelang.language.allocate.alloc_fragment(shape, dtype, scope='local.fragment')¶
Allocate a fragment memory buffer for specialized operations.
- Parameters:
shape (tuple) – The shape of the buffer to allocate
dtype (str) – The data type of the buffer (e.g., ‘float32’, ‘int32’)
scope (str, optional) – The memory scope. Defaults to “local.fragment”
- Returns:
A TVM buffer object allocated in fragment memory
- Return type:
T.Buffer
- tilelang.language.allocate.alloc_var(dtype: str, init: tvm.tir.PrimExpr | int | float, scope: str = 'local.var') tvm.tir.buffer.Buffer¶
- tilelang.language.allocate.alloc_var(dtype: str, scope: str = 'local.var', *, init: tvm.tir.PrimExpr | int | float | None = None) tvm.tir.buffer.Buffer
Allocate a single-element variable buffer.
- Parameters:
dtype (str) – The data type of the buffer (e.g., ‘float32’, ‘int32’)
*args – Optional positional arguments. A single positional string is treated as the scope for backward compatibility. A single non-string positional argument (or keyword
init) specifies the initializer. When two positional arguments are provided, they are interpreted as(init, scope).scope (str, optional) – The memory scope. Defaults to “local.var”. Use as keyword argument for clarity when also providing an initializer.
init (PrimExpr, optional) – The optional initializer value. When provided, the generated code will initialize the variable with this value instead of defaulting to zero.
Examples
a = T.alloc_var(‘int32’, 1) # var with init 1 a = T.alloc_var(‘int32’, ‘local.var’) # var with local.var scope a = T.alloc_var(‘int32’, 1, ‘local.var’) # var with init 1 and local.var scope a = T.alloc_var(‘int32’, ‘local.var’, init=1) # var with init 1 and local.var scope a = T.alloc_var(‘int32’, init=1) # var with init 1 and local.var scope
- Returns:
A TVM buffer object allocated as a single-element variable
- Return type:
T.Buffer
- tilelang.language.allocate.alloc_barrier(arrive_count)¶
Allocate a barrier buffer.
- Parameters:
arrive_count (int) – The number of threads that need to arrive at the barrier
- Returns:
A TVM buffer object allocated as a barrier
- Return type:
T.Buffer
- tilelang.language.allocate.alloc_tmem(shape, dtype)¶
Allocate a Tensor Memory (TMEM) buffer for use with 5th generation Tensor Core operations (e.g., TCGEN5.MMA).
TMEM is a dedicated on-chip memory introduced in Hopper GPUs, designed to reduce register pressure and enable asynchronous, single-threaded MMA operations. It is organized as a 2D array of 512 columns by 128 rows (lanes), with each cell being 32 bits. Allocation is performed in units of columns, and every lane of a column is allocated together.
- Key properties and requirements:
The number of columns allocated must be a power of 2 and at least 32.
TMEM allocations are dynamic and must be explicitly deallocated.
Both allocation and deallocation must be performed by the same warp.
The base address of the TMEM allocation is stored in shared memory and used as the offset for TCGEN5.MMA accumulator tensors.
Only TCGEN5.MMA and specific TMEM load/store instructions can access TMEM; all pre-processing must occur before data is loaded into TMEM, and all post-processing after data is retrieved.
The number of columns allocated should not increase between any two allocations in the execution order within the CTA.
- Parameters:
num_cols (int) – Number of columns to allocate in TMEM. Must be a power of 2 and >= 32 but less than or equal to 512.
- Returns:
A TVM buffer object allocated in TMEM scope, suitable for use as an accumulator or operand in TCGEN5.MMA operations.
- Return type:
T.Buffer
Note
TMEM is only available on supported architectures (e.g., Hopper and later).
The buffer returned should be used according to TMEM access restrictions and deallocated appropriately.
- tilelang.language.allocate.alloc_reducer(shape, dtype, op='sum', replication=None)¶
Allocate a reducer buffer.
Modifications needs to conform with op, such as op=”sum” requires reducer[…] += … and op=”max” requires reducer[…] = T.max(reducer[…], …).
Only after T.fill with proper initializer the reduction may begin; only after T.finalize_reducer the partial results will be available.
For op=”sum”, filled value must be 0; for min and max, the filled initializer will become max or min clamper correspondingly. You may want to use T.max_value for min and T.min_value for max.
- Parameters:
shape (tuple) – The shape of the buffer to allocate
dtype (str) – The data type of the buffer (e.g., ‘float32’, ‘int32’)
op (str) – The reduce operation corresponded with the reducer
replication (str | None) – Replication strategy, can be “all” or “none”. Defaults to not specified, and the compiler will do whatever it want.
- Returns:
A TVM buffer object allocated in thread-private storage, available to reduce values in T.Parallel loops.
- Return type:
T.Buffer
- tilelang.language.allocate.alloc_descriptor(dtype='uint64', scope='local.descriptor')¶
Allocate a descriptor buffer for wgmma and utcmma.
- Returns:
A TVM buffer object allocated as a descriptor
- Return type:
T.Buffer