tilelang.contrib.cutedsl.atomic¶
Atomic operations for CuTeDSL backend.
This module provides implementations of atomic operations using NVVM and LLVM dialects.
Functions¶
|
Perform atomic addition on a pointer. |
|
Perform atomic addition and return the previous value. |
|
Vectorized atomic add for 2 consecutive elements. |
|
Vectorized atomic add for 4 consecutive float32 elements. |
|
Perform atomic maximum operation. |
|
Perform atomic maximum and return the previous value. |
|
Perform atomic minimum operation. |
|
Perform atomic minimum and return the previous value. |
|
Perform atomic load with specified memory ordering. |
|
Perform atomic store with specified memory ordering. |
Module Contents¶
- tilelang.contrib.cutedsl.atomic.AtomicAdd(ptr, value, *, loc=None, ip=None)¶
Perform atomic addition on a pointer.
Supports float16, float32, int32, and int64 types. Returns the old value before addition (atomicrmw semantics).
- Parameters:
ptr (cutlass.cute.Pointer)
value (Numeric)
- tilelang.contrib.cutedsl.atomic.AtomicAddRet(ptr, value, *, loc=None, ip=None)¶
Perform atomic addition and return the previous value.
This is the same as AtomicAdd since nvvm.atomicrmw always returns old value.
- Parameters:
ptr (cutlass.cute.Pointer)
value (Numeric)
- tilelang.contrib.cutedsl.atomic.AtomicAddx2(dst_ptr, src_values, *, loc=None, ip=None)¶
Vectorized atomic add for 2 consecutive elements.
Uses PTX atom.add.v2.f32 for float32 or atom.add.noftz.v2.f16 for float16.
- Parameters:
dst_ptr (cutlass.cute.Pointer) – Pointer to destination (2 consecutive elements)
src_values – Source values - can be TensorSSA (loaded tensor) or Pointer
- tilelang.contrib.cutedsl.atomic.AtomicAddx4(dst_ptr, src_values, *, loc=None, ip=None)¶
Vectorized atomic add for 4 consecutive float32 elements.
Uses PTX atom.global.add.v4.f32 for true vectorized atomic operation on SM90+.
- Parameters:
dst_ptr (cutlass.cute.Pointer) – Pointer to destination (4 consecutive float32 elements)
src_values – Source values - can be TensorSSA (loaded tensor) or Pointer
- tilelang.contrib.cutedsl.atomic.AtomicMax(ptr, value, *, loc=None, ip=None)¶
Perform atomic maximum operation.
For integers, uses nvvm.atomicrmw with MAX. For floats, uses CAS loop since PTX doesn’t have atomic max for float32.
- Parameters:
ptr (cutlass.cute.Pointer)
value (Numeric)
- tilelang.contrib.cutedsl.atomic.AtomicMaxRet(ptr, value, *, loc=None, ip=None)¶
Perform atomic maximum and return the previous value.
- Parameters:
ptr (cutlass.cute.Pointer)
value (Numeric)
- tilelang.contrib.cutedsl.atomic.AtomicMin(ptr, value, *, loc=None, ip=None)¶
Perform atomic minimum operation.
For integers, uses nvvm.atomicrmw with MIN. For floats, uses CAS loop since PTX doesn’t have atomic min for float32.
- Parameters:
ptr (cutlass.cute.Pointer)
value (Numeric)
- tilelang.contrib.cutedsl.atomic.AtomicMinRet(ptr, value, *, loc=None, ip=None)¶
Perform atomic minimum and return the previous value.
- Parameters:
ptr (cutlass.cute.Pointer)
value (Numeric)
- tilelang.contrib.cutedsl.atomic.AtomicLoad(ptr, memory_order, *, loc=None, ip=None)¶
Perform atomic load with specified memory ordering.
- Parameters:
ptr (cutlass.cute.Pointer) – Pointer to load from
memory_order (int) – TileLang memory order ID (0=relaxed, 2=acquire, 5=seq_cst, etc.)
- Returns:
The loaded value
- PTX mapping (per NVIDIA ABI):
relaxed: ld.relaxed.<scope> acquire: ld.acquire.<scope> seq_cst: fence.sc.<scope>; ld.relaxed.<scope>
- tilelang.contrib.cutedsl.atomic.AtomicStore(ptr, value, memory_order, *, loc=None, ip=None)¶
Perform atomic store with specified memory ordering.
- Parameters:
ptr (cutlass.cute.Pointer) – Pointer to store to
value (Numeric) – Value to store
memory_order (int) – TileLang memory order ID (0=relaxed, 3=release, 5=seq_cst, etc.)
- PTX mapping (per NVIDIA ABI):
relaxed: st.relaxed.<scope> release: st.release.<scope> seq_cst: fence.sc.<scope>; st.relaxed.<scope>