tilelang.language.copy_opΒΆ
Copy operations exposed on the TileLang language surface.
FunctionsΒΆ
|
Copy data between memory regions. |
|
Asynchronous copy primitive lowered through cp.async. |
|
TMA copy β issues arrive_and_expect_tx + tma_load, no wait. |
|
Perform im2col transformation for 2D convolution. |
Module ContentsΒΆ
- tilelang.language.copy_op.copy(src, dst, *, coalesced_width=None, disable_tma=False, eviction_policy=None, annotations=None, loop_layout=None)ΒΆ
Copy data between memory regions.
- Parameters:
src (Union[tir.Buffer, tir.BufferLoad, tir.BufferRegion]) β Source memory region
dst (Union[tir.Buffer, tir.BufferLoad, tir.BufferRegion]) β Destination memory region
coalesced_width (Optional[int], keyword-only) β Width for coalesced memory access. Defaults to None.
disable_tma (bool, keyword-only) β Whether to disable TMA acceleration. Defaults to False.
eviction_policy (Optional[str], keyword-only) β Cache eviction policy. Defaults to None.
annotations (Optional[dict], keyword-only) β Additional annotations dict. If provided, coalesced_width, disable_tma, and eviction_policy can also be specified here. Values in annotations take precedence over individual arguments.
loop_layout (Optional[Fragment], keyword-only) β A parallel loop layout hint for the SIMT copy (only valid for normal SIMT copy; incompatible with TMA/LDSM/STSM/TMem). When provided, it is attached to the outermost parallel loop generated by this copy.
- Raises:
TypeError β If copy extents cannot be deduced from arguments
- Returns:
A handle to the copy operation
- Return type:
tir.Call
Range handling notes: - Accepts Buffer/BufferRegion/BufferLoad on either side. Extents are
derived as follows: Buffer -> shape, BufferRegion -> [r.extent], BufferLoad -> extents from its inferred/encoded region.
Normally, we require the extents of both sides to be the same. If they differ, the copy instruction follows an internal rule to select one side as the base range and create iteration space. This may generate unexpected code. And if some dimensions are 1, unexpected errors may happen.
Small Optimization: If both src and dst are scalar BufferLoad without region extents, lowers to a direct store: dst[β¦] = src[β¦].
Syntactic Sugar: TileLang supports passing the head address of a buffer to represent the whole buffer if there are no ambiguity. For example, T.copy(A, A_shared[i, j]). To support this, we need some special shape checking. But remember currently we donβt support something like βbroadcastβ.
The finalized extents are encoded with tl.region via to_buffer_region and passed through to the backend; low-level loop construction and any scope-specific decisions happen during lowering.
- tilelang.language.copy_op.async_copy(src, dst, *, coalesced_width=None, annotations=None, loop_layout=None)ΒΆ
Asynchronous copy primitive lowered through cp.async.
This operator is intended for explicitly asynchronous global->shared copy. The backend enforces cp.async constraints and emits:
ptx_cp_async(β¦) + ptx_commit_group().
No wait is auto-inserted for T.async_copy; synchronization is explicit.
- Parameters:
src (Union[tir.Buffer, tir.BufferLoad, tir.BufferRegion]) β Source memory region
dst (Union[tir.Buffer, tir.BufferLoad, tir.BufferRegion]) β Destination memory region
coalesced_width (Optional[int], keyword-only) β Width for coalesced memory access. Defaults to None.
annotations (Optional[dict], keyword-only) β Additional annotations dict.
loop_layout (Optional[Fragment], keyword-only) β A parallel loop layout hint for the SIMT copy loop.
- Returns:
A handle to the async copy operation
- Return type:
tir.Call
- tilelang.language.copy_op.tma_copy(src, dst, *, barrier, eviction_policy=None, annotations=None)ΒΆ
TMA copy β issues arrive_and_expect_tx + tma_load, no wait.
Unlike T.copy() which emits a full synchronous TMA sequence (arrive + load + wait), T.tma_copy() emits only the producer part (arrive_and_expect_tx + tma_load). The user manages synchronization explicitly via T.mbarrier_wait_parity().
- Parameters:
src (tilelang._typing.BufferLikeType) β Source memory region (global or shared)
dst (tilelang._typing.BufferLikeType) β Destination memory region (shared or global)
barrier β Mbarrier (from T.alloc_barrier()) for TMA synchronization. The TMA load will arrive at this barrier with expected byte count. The user must wait on the same barrier via T.mbarrier_wait_parity().
eviction_policy (Literal['evict_normal', 'evict_first', 'evict_last'] | None) β Cache eviction policy. Defaults to None.
annotations (dict | None) β Additional annotations dict. Values in annotations take precedence over individual arguments.
- Returns:
A handle to the tma_copy operation
- Return type:
tir.Call
- tilelang.language.copy_op.c2d_im2col(img, col, nhw_step, c_step, kernel, stride, dilation, pad, eviction_policy=None)ΒΆ
Perform im2col transformation for 2D convolution.
- Parameters:
img (tir.Buffer) β Input image buffer
col (tir.Buffer) β Output column buffer
nhw_step (tir.PrimExpr) β Step size for batch and spatial dimensions
c_step (tir.PrimExpr) β Step size for channel dimension
kernel (int) β Kernel size
stride (int) β Stride of the convolution
dilation (int) β Dilation rate
pad (int) β Padding size
eviction_policy (Literal['evict_normal', 'evict_first', 'evict_last'] | None)
- Returns:
A handle to the im2col operation
- Return type:
tir.Call