tilelang.contrib.cutedsl.utils¶
Utility functions for CuTeDSL backend.
Provides common helpers used across the CuTeDSL codegen: bitcast, tensor construction, warp election, barrier sync, and FP16 packing.
Attributes¶
BYTES_PER_TENSORMAP |
BYTES_PER_POINTER |
type_map |
Functions¶
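bitcast(value, target_dtype) | Reinterpret the bits of a value as a different type. |
make_filled_tensor(shape, value) | |
make_tensor_at_offset(ptr, offset, shape[, div_by]) | |
shuffle_elect(thread_extent) | |
sync_thread_partial([barrier_id, thread_count]) | |
pack_half2(x, y) | Pack two half-precision (fp16) values into a single 32-bit value. |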
Module Contents¶
- tilelang.contrib.cutedsl.utils.BYTES_PER_TENSORMAP = 128¶
- tilelang.contrib.cutedsl.utils.BYTES_PER_POINTER = 8¶
- tilelang.contrib.cutedsl.utils.type_map¶
- tilelang.contrib.cutedsl.utils.bitcast(value, target_dtype)¶
Reinterpret the bits of a value as a different type. Equivalent to C’s (*(target_type *)(&value)).
- Parameters:
value – Source value (Numeric type from CuTeDSL)
target_dtype – Target type (CuTeDSL type like Int8, Float16, etc.)
- Returns:
Value reinterpreted as target type
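The bit-reinterpretation semantics described above can be illustrated on the host with the standard library. This is only a sketch of the concept, not the CuTeDSL implementation: the helper names `bitcast_f32_to_u32` and `bitcast_u32_to_f32` are hypothetical, and the example assumes IEEE-754 binary32 floats.

```python
import struct


def bitcast_f32_to_u32(value: float) -> int:
    """Return the binary32 bit pattern of `value` as an unsigned 32-bit int.

    Hypothetical host-side helper; it mirrors the C expression
    *(uint32_t *)(&value) without converting the numeric value.
    """
    return struct.unpack("<I", struct.pack("<f", value))[0]


def bitcast_u32_to_f32(bits: int) -> float:
    """Inverse direction: reinterpret 32 raw bits as a binary32 float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    # 1.0 is stored as sign=0, exponent=127, mantissa=0.
    print(hex(bitcast_f32_to_u32(1.0)))    # 0x3f800000
    print(bitcast_u32_to_f32(0xC0000000))  # -2.0
```

Note that source and target must have the same bit width for a reinterpretation like this to be meaningful; a numeric cast (for example `int(1.0)`) would give `1`, not `0x3f800000`.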
- tilelang.contrib.cutedsl.utils.make_filled_tensor(shape, value)¶
- tilelang.contrib.cutedsl.utils.make_tensor_at_offset(ptr, offset, shape, div_by=1)¶
- Parameters:
ptr (cutlass.cute.Pointer)
- tilelang.contrib.cutedsl.utils.shuffle_elect(thread_extent)¶
- tilelang.contrib.cutedsl.utils.sync_thread_partial(barrier_id=None, thread_count=None)¶
- tilelang.contrib.cutedsl.utils.pack_half2(x, y)¶
Pack two half-precision (fp16) values into a single 32-bit value. Corresponds to CUDA’s __pack_half2 intrinsic.
The two fp16 bit patterns are treated as raw 16-bit data and concatenated into one int32; no numeric conversion takes place.
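The packing can be sketched on the host with Python's `struct` module, which supports the fp16 format code `"e"`. This is an illustration under stated assumptions, not the CuTeDSL code: the name `pack_half2_reference` is hypothetical, and the placement of `x` in the low 16 bits and `y` in the high 16 bits is an assumption based on the usual convention for this kind of helper; the source does not state the ordering.

```python
import struct


def pack_half2_reference(x: float, y: float) -> int:
    """Pack two values, rounded to fp16, into one 32-bit integer.

    Hypothetical reference helper. Assumption: x occupies bits 0..15
    and y occupies bits 16..31.
    """
    # Encode each value as IEEE-754 binary16, then read back the raw 16 bits.
    lo = struct.unpack("<H", struct.pack("<e", x))[0]
    hi = struct.unpack("<H", struct.pack("<e", y))[0]
    return (hi << 16) | lo


if __name__ == "__main__":
    # fp16 encodes 1.0 as 0x3C00 and 2.0 as 0x4000.
    print(hex(pack_half2_reference(1.0, 2.0)))  # 0x40003c00
```

The result here is a non-negative Python integer; on the device the same 32 bits would be held in an int32, so a pattern with the top bit set would read as a negative number.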