LayoutTensor Version
Overview
Implement a kernel that adds 10 to each position of the 2D LayoutTensor a and stores the result in the 2D LayoutTensor output.
Note: There are more threads than positions in the matrix.
Key concepts
In this puzzle, you will learn about:
- Using LayoutTensor for 2D array access
- Indexing directly in 2D with tensor[i, j]
- Handling bounds checks with LayoutTensor
The key insight is that LayoutTensor provides a natural 2D indexing interface that abstracts away the underlying memory layout, while bounds checks are still required.
- 2D access: natural \((i,j)\) indexing with LayoutTensor
- Memory abstraction: no manual row-major offset calculation needed
- Guard condition: bounds checks are required in both dimensions
- Thread bounds: there are more threads \((3 \times 3)\) than tensor elements \((2 \times 2)\)
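To make the "memory abstraction" point concrete, here is a minimal CPU-only sketch in Python (not Mojo, and not the LayoutTensor implementation) of the row-major offset arithmetic that 2D indexing spares you from writing by hand. The helper name row_major_offset and the sample values in flat are hypothetical and used only for illustration.

```python
# CPU-only Python sketch (not Mojo): the manual row-major arithmetic
# that tensor[i, j]-style indexing hides.
SIZE = 2  # matches SIZE in the puzzle


def row_major_offset(row: int, col: int, width: int) -> int:
    """Flat offset of element (row, col) in a row-major matrix with `width` columns."""
    return row * width + col


# Illustrative 2x2 matrix [[0, 1], [2, 3]] stored flat in row-major order.
flat = [0.0, 1.0, 2.0, 3.0]

for row in range(SIZE):
    for col in range(SIZE):
        offset = row_major_offset(row, col, SIZE)
        print(f"(row={row}, col={col}) -> offset {offset}, value {flat[offset]}")
```

With a 2D indexing interface, the kernel author writes only the (row, col) pair and leaves this offset computation to the layout.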
Code to complete
comptime SIZE = 2
comptime BLOCKS_PER_GRID = 1
comptime THREADS_PER_BLOCK = (3, 3)
comptime dtype = DType.float32
comptime layout = Layout.row_major(SIZE, SIZE)
fn add_10_2d(
    output: LayoutTensor[dtype, layout, MutAnyOrigin],
    a: LayoutTensor[dtype, layout, MutAnyOrigin],
    size: UInt,
):
    row = thread_idx.y
    col = thread_idx.x
    # FILL ME IN (roughly 2 lines)
View full file: problems/p04/p04_layout_tensor.mojo
Tips
- Get the 2D indices: row = thread_idx.y, col = thread_idx.x
- Add a guard: if row < size and col < size
- Inside the guard: add 10 to a[row, col]
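Why the guard tests both row and col can be seen with a small CPU-only Python sketch (not Mojo). It enumerates the 3 x 3 thread indices and compares the 2D guard with a hypothetical weaker check that only tests the flat row-major offset. The function names guard_2d and guard_flat are made up for this illustration, and the sketch reasons about a plain flat row-major buffer, not about LayoutTensor internals.

```python
# CPU-only Python sketch (not Mojo): compare a 2D guard with a flat-offset check.
SIZE = 2                    # matrix is SIZE x SIZE, as in the puzzle
THREADS_PER_BLOCK = (3, 3)  # (x, y) thread counts, as in the puzzle


def guard_2d(row: int, col: int, size: int) -> bool:
    """The guard from the tips: both indices must be in range."""
    return row < size and col < size


def guard_flat(row: int, col: int, size: int) -> bool:
    """Hypothetical weaker check: only the flat row-major offset is tested."""
    return row * size + col < size * size


# Every (row, col) pair a thread in the block can have.
threads = [
    (row, col)
    for row in range(THREADS_PER_BLOCK[1])
    for col in range(THREADS_PER_BLOCK[0])
]
active = [t for t in threads if guard_2d(*t, SIZE)]
slipped_through = [
    t for t in threads if guard_flat(*t, SIZE) and not guard_2d(*t, SIZE)
]

print("threads passing the 2D guard:", active)
print("threads passing only the flat check:", slipped_through)
```

Only 4 of the 9 threads pass the 2D guard. The thread with row = 0, col = 2 has flat offset 2, which is the same offset as element (1, 0), so in a flat row-major buffer a flat-only check would let it touch the wrong element.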
Running the code
To test your solution, run the following command in your terminal:
pixi run p04_layout_tensor
pixi run -e amd p04_layout_tensor
pixi run -e apple p04_layout_tensor
uv run poe p04_layout_tensor
If the puzzle has not been solved yet, the output will look like this:
out: HostBuffer([0.0, 0.0, 0.0, 0.0])
expected: HostBuffer([10.0, 11.0, 12.0, 13.0])
Solution
fn add_10_2d(
    output: LayoutTensor[dtype, layout, MutAnyOrigin],
    a: LayoutTensor[dtype, layout, MutAnyOrigin],
    size: UInt,
):
    row = thread_idx.y
    col = thread_idx.x
    if col < size and row < size:
        output[row, col] = a[row, col] + 10.0
This solution:
- Gets the 2D thread indices with row = thread_idx.y, col = thread_idx.x
- Prevents out-of-bounds access with if row < size and col < size
- Uses LayoutTensor's 2D indexing: output[row, col] = a[row, col] + 10.0
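The behaviour of the solution can be checked on the CPU with a rough Python emulation (not Mojo, and not how the GPU schedules threads). It runs the same guarded update once per thread index in a 3 x 3 block. The launch helper, the nested-list tensors, and the (x, y) tuple standing in for thread_idx are all hypothetical. The input values 0 to 3 are an assumption inferred from the expected output shown above.

```python
# CPU-only Python emulation (not Mojo) of the guarded kernel over a 3x3 block.
SIZE = 2
THREADS_PER_BLOCK = (3, 3)  # (x, y)


def add_10_2d(output, a, size, thread_idx):
    """Same logic as the solution; thread_idx is a hypothetical (x, y) tuple."""
    col, row = thread_idx
    if row < size and col < size:
        output[row][col] = a[row][col] + 10.0


def launch(a):
    """Run the kernel once per thread index, sequentially (real GPUs run in parallel)."""
    output = [[0.0] * SIZE for _ in range(SIZE)]
    threads_x, threads_y = THREADS_PER_BLOCK
    for y in range(threads_y):
        for x in range(threads_x):
            add_10_2d(output, a, SIZE, (x, y))
    return output


# Assumed input: 0..3 in row-major order, inferred from the expected output.
a = [[0.0, 1.0], [2.0, 3.0]]
result = launch(a)
print([value for row in result for value in row])  # [10.0, 11.0, 12.0, 13.0]
```

In this emulation, removing the guard would raise an IndexError for the five out-of-range thread indices, which mirrors why the kernel needs the bounds check even though it uses 2D indexing.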