Overview

Implement a kernel that adds vectors a and b with broadcasting and stores the result in the 2D matrix output.

Note: There are more threads than matrix positions.

Key concepts

In this puzzle, you'll learn about:

  • Broadcasting 1D vectors along different dimensions
  • Using 2D thread indices for broadcast operations
  • Handling boundary conditions in broadcast patterns

The key is understanding how broadcasting maps the elements of two 1D vectors onto a 2D output matrix, and handling thread bounds correctly.

  • Broadcasting: each element of a combines with each element of b
  • Thread mapping: a \(3 \times 3\) thread grid covers the \(2 \times 2\) output
  • Vector access: a and b use different access patterns
  • Bounds checking: a guard handles threads that fall outside the matrix bounds
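
To make the mismatch between threads and matrix positions concrete, the sketch below enumerates the 3×3 thread indices in plain Python and keeps only those that pass the bounds guard. It is a CPU stand-in for illustration, not Mojo GPU code, and the helper name active_threads is hypothetical.

```python
SIZE = 2                    # output is SIZE x SIZE, as in the puzzle
THREADS_PER_BLOCK = (3, 3)  # (x, y) thread counts, as in the puzzle


def active_threads(size, threads_per_block):
    """Return the (row, col) pairs of threads that pass the bounds guard."""
    threads_x, threads_y = threads_per_block
    return [
        (row, col)
        for row in range(threads_y)   # plays the role of thread_idx.y
        for col in range(threads_x)   # plays the role of thread_idx.x
        if row < size and col < size
    ]


active = active_threads(SIZE, THREADS_PER_BLOCK)
total = THREADS_PER_BLOCK[0] * THREADS_PER_BLOCK[1]
print(f"{len(active)} of {total} threads write an element: {active}")
```

Four of the nine threads map to a matrix position; the other five must do nothing, which is what the guard is for.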

Code to complete

comptime SIZE = 2
comptime BLOCKS_PER_GRID = 1
comptime THREADS_PER_BLOCK = (3, 3)
comptime dtype = DType.float32


fn broadcast_add(
    output: UnsafePointer[Scalar[dtype], MutAnyOrigin],
    a: UnsafePointer[Scalar[dtype], MutAnyOrigin],
    b: UnsafePointer[Scalar[dtype], MutAnyOrigin],
    size: UInt,
):
    row = thread_idx.y
    col = thread_idx.x
    # FILL ME IN (roughly 2 lines)


View full file: problems/p05/p05.mojo

Tips
  1. Get the 2D indices: row = thread_idx.y, col = thread_idx.x
  2. Add a guard: if row < size and col < size
  3. Inside the guard: think about how to broadcast the values of a and b

Running the code

To test your solution, run one of the following commands in your terminal:

pixi run p05
pixi run -e amd p05
pixi run -e apple p05
uv run poe p05

If you haven't solved the puzzle yet, the output looks like this:

out: HostBuffer([0.0, 0.0, 0.0, 0.0])
expected: HostBuffer([1.0, 2.0, 11.0, 12.0])
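
The buffers above are printed flat, so it can help to view them as a matrix. The sketch below is plain Python rather than Mojo, and it assumes the flat buffer stores the \(2 \times 2\) matrix row by row; the helper name as_rows is hypothetical.

```python
def as_rows(flat, size):
    """Split a row-major flat buffer into `size` rows of `size` elements."""
    return [flat[row * size:(row + 1) * size] for row in range(size)]


expected = [1.0, 2.0, 11.0, 12.0]  # values copied from the expected buffer above
for row in as_rows(expected, 2):
    print(row)
```

Under that assumption the first two values form row 0 and the last two form row 1.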

Solution

fn broadcast_add(
    output: UnsafePointer[Scalar[dtype], MutAnyOrigin],
    a: UnsafePointer[Scalar[dtype], MutAnyOrigin],
    b: UnsafePointer[Scalar[dtype], MutAnyOrigin],
    size: UInt,
):
    row = thread_idx.y
    col = thread_idx.x
    if row < size and col < size:
        output[row * size + col] = a[col] + b[row]


This solution demonstrates the fundamental concepts of GPU broadcasting without the LayoutTensor abstraction:

  1. Thread-to-matrix mapping

    • Uses thread_idx.y for the row and thread_idx.x for the column
    • Maps the 2D thread grid directly onto the output matrix elements
    • Handles the excess threads of the 3×3 grid so they fit the 2×2 output
  2. Broadcasting mechanics

    • Vector a broadcasts horizontally: every row uses the same a[col]
    • Vector b broadcasts vertically: every column uses the same b[row]
    • The output is the sum of the two broadcast vectors
    [ a0 a1 ]  +  [ b0 ]  =  [ a0+b0  a1+b0 ]
                  [ b1 ]     [ a0+b1  a1+b1 ]

  3. Bounds checking

    • A single guard condition, row < size and col < size, covers both dimensions
    • Prevents out-of-bounds access to both the input vectors and the output matrix
    • Required because the 3×3 thread grid is larger than the 2×2 data
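
The three points above can be traced on the CPU with a small plain-Python emulation of the kernel. This is a sketch, not the Mojo kernel itself: the nested loops stand in for the 3×3 thread grid, the function name broadcast_add_reference is hypothetical, and the input values are assumptions chosen because they reproduce the expected buffer shown earlier (the actual test inputs are not listed in this section).

```python
def broadcast_add_reference(a, b, size, threads_per_block=(3, 3)):
    """Emulate the kernel: one loop iteration per (row, col) thread."""
    output = [0.0] * (size * size)
    threads_x, threads_y = threads_per_block
    for row in range(threads_y):          # stands in for thread_idx.y
        for col in range(threads_x):      # stands in for thread_idx.x
            if row < size and col < size:  # guard: skip the excess threads
                # Row-major flat index; a varies with col, b varies with row.
                output[row * size + col] = a[col] + b[row]
    return output


a = [1.0, 2.0]   # assumed input values
b = [0.0, 10.0]  # assumed input values
print(broadcast_add_reference(a, b, 2))
```

Without the guard, the iteration with col = 2 would read a[2], which is past the end of a two-element vector.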

Compare this with the LayoutTensor version to see how much the abstraction simplifies broadcasting operations while keeping the same underlying concepts.