LayoutTensor Version

Overview

Implement a kernel that adds 10 to each position of the 2D LayoutTensor a and stores the result in the 2D LayoutTensor output.

Note: You have more threads than there are positions in the matrix.

Key concepts

이 νΌμ¦μ—μ„œ 배울 λ‚΄μš©:

  • 2D λ°°μ—΄ 접근에 LayoutTensor μ‚¬μš©ν•˜κΈ°
  • tensor[i, j]둜 직접 2D μΈλ±μ‹±ν•˜κΈ°
  • LayoutTensorμ—μ„œ 경계 검사 μ²˜λ¦¬ν•˜κΈ°

The key insight is that LayoutTensor provides a natural 2D indexing interface that abstracts away the underlying memory layout. Bounds checking, however, is still required.

  • 2D access: natural \((i,j)\) indexing with LayoutTensor
  • Memory abstraction: no manual row-major index calculation needed
  • Guard condition: bounds checking is still needed in both dimensions
  • Thread bounds: more threads \((3 \times 3)\) than tensor elements \((2 \times 2)\)
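To see what the memory abstraction saves you, here is a minimal Python sketch of a row-major 2D view over a flat buffer. The class RowMajorView is hypothetical and exists only for illustration; it mimics the idea behind Layout.row_major(SIZE, SIZE), where element (i, j) lives at flat offset i * cols + j, and is not how LayoutTensor is actually implemented.

```python
class RowMajorView:
    """Hypothetical 2D view over a flat list, stored in row-major order."""

    def __init__(self, data, rows, cols):
        assert len(data) == rows * cols
        self.data = data
        self.rows = rows
        self.cols = cols

    def offset(self, i, j):
        # Row-major: each row occupies `cols` consecutive slots
        return i * self.cols + j

    def __getitem__(self, idx):
        i, j = idx
        return self.data[self.offset(i, j)]

    def __setitem__(self, idx, value):
        i, j = idx
        self.data[self.offset(i, j)] = value


SIZE = 2
a = RowMajorView([0.0, 1.0, 2.0, 3.0], SIZE, SIZE)
print(a.offset(1, 0), a[1, 0])  # → 2 2.0
print(a.offset(0, 2))           # → 2, the same slot as (1, 0)
```

Note the last line: in this sketch, an out-of-range column such as (0, 2) does not fail but silently aliases another element, which is one reason the guard is still needed even with 2D indexing.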

Code to complete

comptime SIZE = 2
comptime BLOCKS_PER_GRID = 1
comptime THREADS_PER_BLOCK = (3, 3)
comptime dtype = DType.float32
comptime layout = Layout.row_major(SIZE, SIZE)


fn add_10_2d(
    output: LayoutTensor[dtype, layout, MutAnyOrigin],
    a: LayoutTensor[dtype, layout, MutAnyOrigin],
    size: UInt,
):
    row = thread_idx.y
    col = thread_idx.x
    # FILL ME IN (roughly 2 lines)


View the full file: problems/p04/p04_layout_tensor.mojo

Tips
  1. Get the 2D indices: row = thread_idx.y, col = thread_idx.x
  2. Add a guard: if row < size and col < size
  3. Inside the guard, add 10 to a[row, col] and store the result in output[row, col]

Running the code

μ†”λ£¨μ…˜μ„ ν…ŒμŠ€νŠΈν•˜λ €λ©΄ ν„°λ―Έλ„μ—μ„œ λ‹€μŒ λͺ…λ Ήμ–΄λ₯Ό μ‹€ν–‰ν•˜μ„Έμš”:

pixi run p04_layout_tensor
pixi run -e amd p04_layout_tensor
pixi run -e apple p04_layout_tensor
uv run poe p04_layout_tensor

If the puzzle isn't solved yet, your output will look like this:

out: HostBuffer([0.0, 0.0, 0.0, 0.0])
expected: HostBuffer([10.0, 11.0, 12.0, 13.0])
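The expected buffer is a + 10, flattened in row-major order, and the all-zero out buffer just means no thread has written to output yet. Assuming a holds 0, 1, 2, 3 (which is what the expected values imply; the full file shows how it is actually initialized), the reference result can be reproduced on the host with a short Python sketch:

```python
SIZE = 2
# Assumed input: a[i] = i in flat row-major order
a = [float(i) for i in range(SIZE * SIZE)]
expected = [x + 10.0 for x in a]
print(expected)  # → [10.0, 11.0, 12.0, 13.0]
```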

Solution

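fn add_10_2d(
    output: LayoutTensor[dtype, layout, MutAnyOrigin],
    a: LayoutTensor[dtype, layout, MutAnyOrigin],
    size: UInt,
):
    # Map the 2D thread index to a (row, col) position in the tensor
    row = thread_idx.y
    col = thread_idx.x
    # The block has 3x3 threads but the tensor is only 2x2, so skip
    # threads that fall outside the tensor in either dimension
    if col < size and row < size:
        # LayoutTensor computes the row-major memory offset internally
        output[row, col] = a[row, col] + 10.0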


이 μ†”λ£¨μ…˜μ€:

  • Gets the 2D thread indices with row = thread_idx.y, col = thread_idx.x
  • Prevents out-of-bounds access with if row < size and col < size
  • Uses LayoutTensor's 2D indexing: output[row, col] = a[row, col] + 10.0
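To tie the pieces together, here is a CPU simulation of the solved kernel in Python. It is a sketch, not Mojo: the nested loops stand in for the 3 × 3 thread block, a flat list stands in for each LayoutTensor, and add_10_2d_sim is a hypothetical name. The index row * size + col is the row-major offset that LayoutTensor works out for you when you write output[row, col].

```python
def add_10_2d_sim(output, a, size, threads_per_block=(3, 3)):
    """Run the kernel body once per simulated thread (x = col, y = row)."""
    for thread_y in range(threads_per_block[1]):
        for thread_x in range(threads_per_block[0]):
            row = thread_y
            col = thread_x
            # Guard: threads outside the size x size tensor do nothing
            if col < size and row < size:
                # Row-major offset, which LayoutTensor handles for you
                output[row * size + col] = a[row * size + col] + 10.0


SIZE = 2
a = [0.0, 1.0, 2.0, 3.0]
out = [0.0] * (SIZE * SIZE)
add_10_2d_sim(out, a, SIZE)
print(out)  # → [10.0, 11.0, 12.0, 13.0]
```

If you delete the guard in this simulation, thread (row 1, col 2) computes offset 4 and Python raises IndexError; on a GPU, the equivalent access would go past the end of the buffer instead of failing cleanly.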