LayoutTensor Version

Overview

Implement a kernel that adds 10 to each position of the 2D LayoutTensor a and stores the result in the 2D LayoutTensor output.

Note: Each block has fewer threads per dimension than a has rows and columns, so a single block cannot cover the whole matrix.

Key concepts

In this puzzle, you will learn:

  • Using LayoutTensor with multiple blocks
  • Handling large matrices with a 2D block organization
  • Combining block indexing with LayoutTensor access

The key insight is that LayoutTensor simplifies 2D indexing, but large matrices still require coordination across blocks.

Configuration

  • Matrix size: \(5 \times 5\) elements
  • Layout handling: LayoutTensor manages the row-major organization
  • Block coordination: multiple blocks cover the full matrix
  • 2D indexing: natural \((i,j)\) access with bounds checking
  • Total threads: \(36\) for \(25\) elements
  • Thread mapping: each thread processes one matrix element
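
The thread count in this list follows from simple arithmetic. The sketch below reproduces it in plain Python (a CPU-side illustration, not Mojo GPU code), assuming 3×3 threads per block as in this puzzle's code. The helper name ceil_div is hypothetical.

```python
SIZE = 5                      # matrix is SIZE x SIZE
THREADS_PER_BLOCK = (3, 3)    # (x, y), as in the puzzle's code


def ceil_div(n, d):
    """Smallest integer k such that k * d >= n (hypothetical helper)."""
    return (n + d - 1) // d


# Number of blocks needed along x and y to cover all SIZE columns and rows.
blocks_per_grid = (
    ceil_div(SIZE, THREADS_PER_BLOCK[0]),
    ceil_div(SIZE, THREADS_PER_BLOCK[1]),
)

threads_per_block = THREADS_PER_BLOCK[0] * THREADS_PER_BLOCK[1]
total_threads = blocks_per_grid[0] * blocks_per_grid[1] * threads_per_block

# Threads that are launched but have no matrix element to process.
idle_threads = total_threads - SIZE * SIZE

print(blocks_per_grid, total_threads, idle_threads)
```

Under these assumptions a 2×2 grid is the smallest that covers the matrix, which gives 36 threads, 11 of which have no element to process.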

Code to complete

comptime SIZE = 5
comptime BLOCKS_PER_GRID = (2, 2)
comptime THREADS_PER_BLOCK = (3, 3)
comptime dtype = DType.float32
comptime out_layout = Layout.row_major(SIZE, SIZE)
comptime a_layout = Layout.row_major(SIZE, SIZE)


fn add_10_blocks_2d[
    out_layout: Layout,
    a_layout: Layout,
](
    output: LayoutTensor[dtype, out_layout, MutAnyOrigin],
    a: LayoutTensor[dtype, a_layout, ImmutAnyOrigin],
    size: UInt,
):
    row = block_dim.y * block_idx.y + thread_idx.y
    col = block_dim.x * block_idx.x + thread_idx.x
    # FILL ME IN (roughly 2 lines)


View full file: problems/p07/p07_layout_tensor.mojo

Tips
  1. Compute the global indices: row = block_dim.y * block_idx.y + thread_idx.y, col = block_dim.x * block_idx.x + thread_idx.x
  2. Add a guard: if row < size and col < size
  3. Inside the guard: think about how to add 10 to an element of a 2D LayoutTensor

Running the code

To test your solution, run the command that matches your environment in your terminal:

pixi run p07_layout_tensor
pixi run -e amd p07_layout_tensor
pixi run -e apple p07_layout_tensor
uv run poe p07_layout_tensor

If you have not solved the puzzle yet, the output looks like this:

out: HostBuffer([0.0, 0.0, 0.0, ... , 0.0])
expected: HostBuffer([10.0, 11.0, 12.0, ... , 34.0])

Solution

fn add_10_blocks_2d[
    out_layout: Layout,
    a_layout: Layout,
](
    output: LayoutTensor[dtype, out_layout, MutAnyOrigin],
    a: LayoutTensor[dtype, a_layout, ImmutAnyOrigin],
    size: UInt,
):
    row = block_dim.y * block_idx.y + thread_idx.y
    col = block_dim.x * block_idx.x + thread_idx.x
    if row < size and col < size:
        output[row, col] = a[row, col] + 10.0


This solution shows how LayoutTensor simplifies 2D block-based processing:

  1. 2D thread indexing

    • Global row: block_dim.y * block_idx.y + thread_idx.y

    • Global column: block_dim.x * block_idx.x + thread_idx.x

    • Mapping of the thread grid to tensor elements:

      5×5 tensor covered by 3×3 blocks:
      
      Block (0,0)         Block (1,0)
      [(0,0) (0,1) (0,2)] [(0,3) (0,4)    *  ]
      [(1,0) (1,1) (1,2)] [(1,3) (1,4)    *  ]
      [(2,0) (2,1) (2,2)] [(2,3) (2,4)    *  ]
      
      Block (0,1)         Block (1,1)
      [(3,0) (3,1) (3,2)] [(3,3) (3,4)    *  ]
      [(4,0) (4,1) (4,2)] [(4,3) (4,4)    *  ]
      [  *     *     *  ] [  *     *      *  ]
      

      (* = thread exists but falls outside the tensor bounds)

  2. LayoutTensor benefits

    • Natural 2D indexing: use tensor[row, col] instead of computing offsets by hand

    • Automatic memory layout handling (row-major in this puzzle)

    • Example access pattern:

      Raw memory:         LayoutTensor:
      row * size + col    tensor[row, col]
      (2,1) -> 11         (2,1) -> same element
      
  3. Bounds checking

    • The guard row < size and col < size handles:
      • Excess threads in partial blocks
      • Edge cases at the tensor boundaries
    • LayoutTensor handles the memory layout automatically
    • 36 threads (a 2×2 grid of 3×3 blocks) process 25 elements
  4. Block coordination

    • Each 3×3 block handles one part of the 5×5 tensor
    • LayoutTensor takes care of:
      • Memory layout optimization
      • Efficient access patterns
      • Coordination across block boundaries
      • Cache-friendly data access
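
The four points above can be checked with a small CPU-side emulation. The sketch below is plain Python rather than Mojo GPU code. It loops over every (block, thread) pair that the 2×2 grid of 3×3 blocks would launch, computes the same global row and col, and applies the same guard. It assumes the input holds 0.0 through 24.0 (inferred from the expected output shown earlier) and uses a flat row-major list as a stand-in for LayoutTensor. The function name add_10_blocks_2d_emulated is hypothetical.

```python
SIZE = 5
BLOCKS_PER_GRID = (2, 2)      # (x, y)
THREADS_PER_BLOCK = (3, 3)    # (x, y)


def add_10_blocks_2d_emulated(a, size):
    """Sequentially emulate the kernel for every launched thread."""
    output = [0.0] * (size * size)
    skipped = []  # (row, col) of threads rejected by the guard
    for block_y in range(BLOCKS_PER_GRID[1]):
        for block_x in range(BLOCKS_PER_GRID[0]):
            for thread_y in range(THREADS_PER_BLOCK[1]):
                for thread_x in range(THREADS_PER_BLOCK[0]):
                    # Same formulas as in the kernel: block_dim * block_idx + thread_idx
                    row = THREADS_PER_BLOCK[1] * block_y + thread_y
                    col = THREADS_PER_BLOCK[0] * block_x + thread_x
                    if row < size and col < size:
                        # Row-major offset, which tensor[row, col] hides in Mojo
                        output[row * size + col] = a[row * size + col] + 10.0
                    else:
                        skipped.append((row, col))
    return output, skipped


# Assumed input: 0.0 .. 24.0, inferred from the expected output 10.0 .. 34.0
a = [float(i) for i in range(SIZE * SIZE)]
output, skipped = add_10_blocks_2d_emulated(a, SIZE)

print(output[:3], output[-1])   # first three values and the last one
print(len(skipped))             # threads that fall outside the tensor
```

With this input, output holds 10.0 through 34.0, and 11 of the 36 emulated threads are rejected by the guard. These correspond to the positions marked * in the diagram above.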

This pattern shows how LayoutTensor simplifies 2D block processing while preserving efficient memory access patterns and thread coordination.