tilelang.language.allocate

Memory allocation utilities for Tile-AI programs.

This module provides a set of functions for allocating different types of memory buffers in Tile-AI programs. It wraps TVM’s buffer allocation functionality with convenient interfaces for different memory scopes.

Available allocation functions:
  • alloc_shared: Allocates shared memory buffers for inter-thread communication

  • alloc_local: Allocates local memory buffers for thread-private storage

  • alloc_fragment: Allocates fragment memory buffers for specialized operations

  • alloc_var: Allocates single-element variable buffers

Each function takes shape and dtype parameters and returns a TVM buffer object with the appropriate memory scope.

Functions

alloc_shared(shape, dtype[, scope])

Allocate a shared memory buffer for inter-thread communication.

alloc_local(shape, dtype[, scope])

Allocate a local memory buffer for thread-private storage.

alloc_fragment(shape, dtype[, scope])

Allocate a fragment memory buffer for specialized operations.

alloc_var(…)

Allocate a single-element variable buffer.

alloc_barrier(arrive_count)

Allocate a barrier buffer.

alloc_tmem(shape, dtype)

Allocate a Tensor Memory (TMEM) buffer for use with 5th generation Tensor Core operations (e.g., TCGEN5.MMA).

alloc_reducer(shape, dtype[, op, replication])

Allocate a reducer buffer.

alloc_descriptor([dtype, scope])

Allocate a descriptor buffer for wgmma and utcmma.

Module Contents

tilelang.language.allocate.alloc_shared(shape, dtype, scope='shared.dyn')

Allocate a shared memory buffer for inter-thread communication.

Parameters:
  • shape (tuple) – The shape of the buffer to allocate

  • dtype (str) – The data type of the buffer (e.g., ‘float32’, ‘int32’)

  • scope (str, optional) – The memory scope. Defaults to “shared.dyn”

Returns:

A TVM buffer object allocated in shared memory

Return type:

T.Buffer

tilelang.language.allocate.alloc_local(shape, dtype, scope='local')

Allocate a local memory buffer for thread-private storage.

Parameters:
  • shape (tuple) – The shape of the buffer to allocate

  • dtype (str) – The data type of the buffer (e.g., ‘float32’, ‘int32’)

  • scope (str, optional) – The memory scope. Defaults to “local”

Returns:

A TVM buffer object allocated in local memory

Return type:

T.Buffer

tilelang.language.allocate.alloc_fragment(shape, dtype, scope='local.fragment')

Allocate a fragment memory buffer for specialized operations.

Parameters:
  • shape (tuple) – The shape of the buffer to allocate

  • dtype (str) – The data type of the buffer (e.g., ‘float32’, ‘int32’)

  • scope (str, optional) – The memory scope. Defaults to “local.fragment”

Returns:

A TVM buffer object allocated in fragment memory

Return type:

T.Buffer

tilelang.language.allocate.alloc_var(dtype: str, init: tvm.tir.PrimExpr | int | float, scope: str = 'local.var') tvm.tir.buffer.Buffer
tilelang.language.allocate.alloc_var(dtype: str, scope: str = 'local.var', *, init: tvm.tir.PrimExpr | int | float | None = None) tvm.tir.buffer.Buffer

Allocate a single-element variable buffer.

Parameters:
  • dtype (str) – The data type of the buffer (e.g., ‘float32’, ‘int32’)

  • *args – Optional positional arguments. A single positional string is treated as the scope for backward compatibility. A single non-string positional argument (or keyword init) specifies the initializer. When two positional arguments are provided, they are interpreted as (init, scope).

  • scope (str, optional) – The memory scope. Defaults to “local.var”. Use as keyword argument for clarity when also providing an initializer.

  • init (PrimExpr, optional) – The optional initializer value. When provided, the generated code will initialize the variable with this value instead of defaulting to zero.

Examples

a = T.alloc_var(‘int32’, 1) # var with init 1 a = T.alloc_var(‘int32’, ‘local.var’) # var with local.var scope a = T.alloc_var(‘int32’, 1, ‘local.var’) # var with init 1 and local.var scope a = T.alloc_var(‘int32’, ‘local.var’, init=1) # var with init 1 and local.var scope a = T.alloc_var(‘int32’, init=1) # var with init 1 and local.var scope

Returns:

A TVM buffer object allocated as a single-element variable

Return type:

T.Buffer

tilelang.language.allocate.alloc_barrier(arrive_count)

Allocate a barrier buffer.

Parameters:

arrive_count (int) – The number of threads that need to arrive at the barrier

Returns:

A TVM buffer object allocated as a barrier

Return type:

T.Buffer

tilelang.language.allocate.alloc_tmem(shape, dtype)

Allocate a Tensor Memory (TMEM) buffer for use with 5th generation Tensor Core operations (e.g., TCGEN5.MMA).

TMEM is a dedicated on-chip memory introduced in Hopper GPUs, designed to reduce register pressure and enable asynchronous, single-threaded MMA operations. It is organized as a 2D array of 512 columns by 128 rows (lanes), with each cell being 32 bits. Allocation is performed in units of columns, and every lane of a column is allocated together.

Key properties and requirements:
  • The number of columns allocated must be a power of 2 and at least 32.

  • TMEM allocations are dynamic and must be explicitly deallocated.

  • Both allocation and deallocation must be performed by the same warp.

  • The base address of the TMEM allocation is stored in shared memory and used as the offset for TCGEN5.MMA accumulator tensors.

  • Only TCGEN5.MMA and specific TMEM load/store instructions can access TMEM; all pre-processing must occur before data is loaded into TMEM, and all post-processing after data is retrieved.

  • The number of columns allocated should not increase between any two allocations in the execution order within the CTA.

Parameters:

num_cols (int) – Number of columns to allocate in TMEM. Must be a power of 2 and >= 32 but less than or equal to 512.

Returns:

A TVM buffer object allocated in TMEM scope, suitable for use as an accumulator or operand in TCGEN5.MMA operations.

Return type:

T.Buffer

Note

  • TMEM is only available on supported architectures (e.g., Hopper and later).

  • The buffer returned should be used according to TMEM access restrictions and deallocated appropriately.

tilelang.language.allocate.alloc_reducer(shape, dtype, op='sum', replication=None)

Allocate a reducer buffer.

Modifications needs to conform with op, such as op=”sum” requires reducer[…] += … and op=”max” requires reducer[…] = T.max(reducer[…], …).

Only after T.fill with proper initializer the reduction may begin; only after T.finalize_reducer the partial results will be available.

For op=”sum”, filled value must be 0; for min and max, the filled initializer will become max or min clamper correspondingly. You may want to use T.max_value for min and T.min_value for max.

Parameters:
  • shape (tuple) – The shape of the buffer to allocate

  • dtype (str) – The data type of the buffer (e.g., ‘float32’, ‘int32’)

  • op (str) – The reduce operation corresponded with the reducer

  • replication (str | None) – Replication strategy, can be “all” or “none”. Defaults to not specified, and the compiler will do whatever it want.

Returns:

A TVM buffer object allocated in thread-private storage, available to reduce values in T.Parallel loops.

Return type:

T.Buffer

tilelang.language.allocate.alloc_descriptor(dtype='uint64', scope='local.descriptor')

Allocate a descriptor buffer for wgmma and utcmma.

Returns:

A TVM buffer object allocated as a descriptor

Return type:

T.Buffer