tilelang.contrib.cutedsl

Submodules

Attributes

BYTES_PER_TENSORMAP

BYTES_PER_POINTER

Functions

make_filled_tensor(shape, value)

make_tensor_at_offset(ptr, offset, shape[, div_by])

shuffle_elect(thread_extent)

sync_thread_partial([barrier_id, thread_count])

pack_half2(x, y)

Pack two half-precision (fp16) values into a single 32-bit value.

AtomicAdd(ptr, value, *[, loc, ip])

Package Contents

tilelang.contrib.cutedsl.BYTES_PER_TENSORMAP = 128
tilelang.contrib.cutedsl.BYTES_PER_POINTER = 8
tilelang.contrib.cutedsl.make_filled_tensor(shape, value)
tilelang.contrib.cutedsl.make_tensor_at_offset(ptr, offset, shape, div_by=1)
Parameters:
  • ptr (threadblock_swizzle.cute.Pointer)

tilelang.contrib.cutedsl.shuffle_elect(thread_extent)
tilelang.contrib.cutedsl.sync_thread_partial(barrier_id=None, thread_count=None)
tilelang.contrib.cutedsl.pack_half2(x, y)

Pack two half-precision (fp16) values into a single 32-bit value. Corresponds to CUDA’s __pack_half2 intrinsic.

The two fp16 values are reinterpreted as raw 16-bit patterns (no numeric conversion takes place), and the two patterns are concatenated into a single int32.
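
The bit-level behavior described above can be modeled on the host with a short pure-Python sketch. This is a reference model only, not the cutedsl implementation. The names fp16_bits, pack_half2_reference, unpack_half2_reference and to_int32 are hypothetical. The sketch also assumes that x occupies the low 16 bits and y the high 16 bits, following the usual half2 convention where .x is the lower half. This page does not state that order.

```python
import struct


def fp16_bits(v):
    """Return the raw 16-bit IEEE 754 binary16 pattern of v as an unsigned int."""
    # '<e' encodes a Python float as little-endian half precision;
    # '<H' reads the same two bytes back as an unsigned 16-bit integer.
    return struct.unpack("<H", struct.pack("<e", v))[0]


def pack_half2_reference(x, y):
    """Hypothetical model of pack_half2: x in bits 0-15, y in bits 16-31 (assumed order)."""
    # No numeric conversion: the two bit patterns are simply concatenated.
    return (fp16_bits(y) << 16) | fp16_bits(x)


def unpack_half2_reference(packed):
    """Inverse of pack_half2_reference, returning (x, y) as Python floats."""
    lo = struct.unpack("<e", struct.pack("<H", packed & 0xFFFF))[0]
    hi = struct.unpack("<e", struct.pack("<H", (packed >> 16) & 0xFFFF))[0]
    return lo, hi


def to_int32(packed):
    """Reinterpret an unsigned 32-bit value as a signed int32 (two's complement)."""
    return packed - (1 << 32) if packed & 0x80000000 else packed


if __name__ == "__main__":
    # fp16 1.0 is 0x3C00 and 2.0 is 0x4000, so the packed word is 0x40003c00.
    print(hex(pack_half2_reference(1.0, 2.0)))
```

The model returns an unsigned value. When the high half holds a negative fp16 value (sign bit set), the equivalent int32 is negative, which is why to_int32 is included.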

tilelang.contrib.cutedsl.AtomicAdd(ptr, value, *, loc=None, ip=None)
Parameters:
  • ptr (threadblock_swizzle.cute.Pointer)

  • value (math.Numeric)