tilelang.contrib.cutedsl.warp
Warp-level primitives for the CuTeDSL backend. This module re-exports functions from cutlass.cute.arch under TileLang naming conventions.
Functions
| warp_reduce_sum(value) | Warp-level parallel reduction: sum across all 32 lanes. |
| warp_reduce_max(value) | Warp-level parallel reduction: max across all 32 lanes. |
| warp_reduce_min(value) | Warp-level parallel reduction: min across all 32 lanes. |
| warp_reduce_bitand(value) | Warp-level parallel reduction: bitwise AND across all 32 lanes. |
| warp_reduce_bitor(value) | Warp-level parallel reduction: bitwise OR across all 32 lanes. |
Module Contents
- tilelang.contrib.cutedsl.warp.warp_reduce_sum(value)
Warp-level parallel reduction: sum across all 32 lanes.
- tilelang.contrib.cutedsl.warp.warp_reduce_max(value)
Warp-level parallel reduction: max across all 32 lanes.
- tilelang.contrib.cutedsl.warp.warp_reduce_min(value)
Warp-level parallel reduction: min across all 32 lanes.
- tilelang.contrib.cutedsl.warp.warp_reduce_bitand(value)
Warp-level parallel reduction: bitwise AND across all 32 lanes.
- tilelang.contrib.cutedsl.warp.warp_reduce_bitor(value)
Warp-level parallel reduction: bitwise OR across all 32 lanes.
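To make the semantics of these five reductions concrete, the following is a minimal host-side sketch in plain Python that simulates one 32-lane warp. It does not call tilelang or cutlass. The helper `butterfly_reduce`, the XOR-shuffle (butterfly) exchange pattern, and the assumption that every lane ends up holding the reduced value are illustrative choices, not something this page confirms about the actual implementation.

```python
import operator

WARP_SIZE = 32  # lanes per warp, matching the "all 32 lanes" wording above


def butterfly_reduce(lane_values, op):
    """Simulate a butterfly (XOR-shuffle) reduction over one warp.

    Hypothetical helper for illustration only; it models what a
    warp-level reduction computes, not how the backend implements it.
    """
    if len(lane_values) != WARP_SIZE:
        raise ValueError(f"expected {WARP_SIZE} lane values")
    vals = list(lane_values)
    offset = WARP_SIZE // 2
    while offset > 0:
        # Each lane i combines its value with the value held by lane i XOR offset.
        vals = [op(vals[i], vals[i ^ offset]) for i in range(WARP_SIZE)]
        offset //= 2
    # After log2(32) = 5 rounds, every simulated lane holds the full result.
    return vals


# One value per lane: lane i contributes the integer i.
lanes = list(range(WARP_SIZE))

sum_result = butterfly_reduce(lanes, operator.add)       # models warp_reduce_sum
max_result = butterfly_reduce(lanes, max)                # models warp_reduce_max
min_result = butterfly_reduce(lanes, min)                # models warp_reduce_min
and_result = butterfly_reduce(lanes, operator.and_)      # models warp_reduce_bitand
or_result = butterfly_reduce(lanes, operator.or_)        # models warp_reduce_bitor

print(sum_result[0], max_result[0], min_result[0], and_result[0], or_result[0])
```

In the simulation the reduction finishes in five exchange rounds rather than 31 sequential steps, which is the usual motivation for doing this kind of reduction at warp level.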