Shared Memory Version

Overview

This puzzle implements matrix multiplication of square matrices \(A\) and \(B\) and stores the result in \(\text{output}\). It uses shared memory to optimize the memory access pattern: matrix blocks are loaded into shared memory before the computation starts.

Key concepts

This puzzle covers:

  • Block-local memory management with LayoutTensor
  • Thread synchronization patterns
  • Memory access optimization using shared memory
  • Cooperative data loading with 2D indexing
  • Efficient use of LayoutTensor for matrix operations

The key idea is to use fast shared memory through LayoutTensor so that expensive global memory accesses are kept to a minimum.

Configuration

  • Matrix size: \(\text{SIZE} \times \text{SIZE} = 2 \times 2\)
  • Threads per block: \(\text{TPB} \times \text{TPB} = 3 \times 3\)
  • Grid dimensions: \(1 \times 1\)

Layout configuration:

  • Input A: Layout.row_major(SIZE, SIZE)
  • Input B: Layout.row_major(SIZE, SIZE)
  • Output: Layout.row_major(SIZE, SIZE)
  • Shared memory: two LayoutTensors of size TPB × TPB

Memory organization:

Global Memory (LayoutTensor):          Shared Memory (LayoutTensor):
A[i,j]: Direct access                  a_shared[local_row, local_col]
B[i,j]: Direct access                  b_shared[local_row, local_col]

Code to complete

fn single_block_matmul[
    layout: Layout, size: UInt
](
    output: LayoutTensor[dtype, layout, MutAnyOrigin],
    a: LayoutTensor[dtype, layout, ImmutAnyOrigin],
    b: LayoutTensor[dtype, layout, ImmutAnyOrigin],
):
    row = block_dim.y * block_idx.y + thread_idx.y
    col = block_dim.x * block_idx.x + thread_idx.x
    local_row = thread_idx.y
    local_col = thread_idx.x
    # FILL ME IN (roughly 12 lines)


View full file: problems/p16/p16.mojo

Tips
  1. Load the matrices into shared memory using the global and local indices
  2. Call barrier() after loading
  3. Compute the dot product using shared memory indices
  4. Check array bounds in every operation

Running the code

To test your solution, run the command for your environment in a terminal:

pixi run p16 --single-block
pixi run -e amd p16 --single-block
pixi run -e apple p16 --single-block
uv run poe p16 --single-block

If you have not solved the puzzle yet, the output looks like this:

out: HostBuffer([0.0, 0.0, 0.0, 0.0])
expected: HostBuffer([4.0, 6.0, 12.0, 22.0])
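The expected values can be checked by hand. The sketch below is plain Python on the CPU and assumes the inputs are A = [[0, 1], [2, 3]] and B = [[0, 2], [4, 6]]. The test inputs are not shown in this section, so treat them as an assumption that happens to reproduce the expected buffer above. The sketch also shows how a row-major layout flattens the 2 × 2 result into four consecutive values.

```python
SIZE = 2

# Assumed inputs (not given in this section), stored flat in row-major order.
a = [0.0, 1.0, 2.0, 3.0]  # A = [[0, 1], [2, 3]]
b = [0.0, 2.0, 4.0, 6.0]  # B = [[0, 2], [4, 6]]


def matmul_row_major(a, b, size):
    """Reference matmul on flat row-major lists: element (i, j) lives at i * size + j."""
    out = [0.0] * (size * size)
    for i in range(size):
        for j in range(size):
            acc = 0.0
            for k in range(size):
                acc += a[i * size + k] * b[k * size + j]
            out[i * size + j] = acc
    return out


expected = matmul_row_major(a, b, SIZE)
print(expected)  # [4.0, 6.0, 12.0, 22.0]
```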

Solution

fn single_block_matmul[
    layout: Layout, size: UInt
](
    output: LayoutTensor[dtype, layout, MutAnyOrigin],
    a: LayoutTensor[dtype, layout, ImmutAnyOrigin],
    b: LayoutTensor[dtype, layout, ImmutAnyOrigin],
):
    row = block_dim.y * block_idx.y + thread_idx.y
    col = block_dim.x * block_idx.x + thread_idx.x
    local_row = thread_idx.y
    local_col = thread_idx.x

    a_shared = LayoutTensor[
        dtype,
        Layout.row_major(TPB, TPB),
        MutAnyOrigin,
        address_space = AddressSpace.SHARED,
    ].stack_allocation()
    b_shared = LayoutTensor[
        dtype,
        Layout.row_major(TPB, TPB),
        MutAnyOrigin,
        address_space = AddressSpace.SHARED,
    ].stack_allocation()

    if row < size and col < size:
        a_shared[local_row, local_col] = a[row, col]
        b_shared[local_row, local_col] = b[row, col]

    barrier()

    if row < size and col < size:
        var acc: output.element_type = 0

        @parameter
        for k in range(size):
            acc += a_shared[local_row, k] * b_shared[k, local_col]

        output[row, col] = acc


The shared memory implementation with LayoutTensor improves performance through efficient memory access patterns:

Memory organization

Input Tensors (2×2):                Shared Memory (3×3):
Matrix A:                           a_shared:
 [a[0,0] a[0,1]]                     [s[0,0] s[0,1] s[0,2]]
 [a[1,0] a[1,1]]                     [s[1,0] s[1,1] s[1,2]]
                                     [s[2,0] s[2,1] s[2,2]]
Matrix B:                           b_shared: (similar layout)
 [b[0,0] b[0,1]]                     [t[0,0] t[0,1] t[0,2]]
 [b[1,0] b[1,1]]                     [t[1,0] t[1,1] t[1,2]]
                                     [t[2,0] t[2,1] t[2,2]]

Implementation steps

  1. Shared memory setup:

    # Create 2D shared memory tensors using LayoutTensor with an address_space
    a_shared = LayoutTensor[dtype, Layout.row_major(TPB, TPB), MutAnyOrigin, address_space = AddressSpace.SHARED].stack_allocation()
    b_shared = LayoutTensor[dtype, Layout.row_major(TPB, TPB), MutAnyOrigin, address_space = AddressSpace.SHARED].stack_allocation()
    
  2. Thread indexing:

    # Global indices for matrix access
    row = block_dim.y * block_idx.y + thread_idx.y
    col = block_dim.x * block_idx.x + thread_idx.x
    
    # Local indices for shared memory
    local_row = thread_idx.y
    local_col = thread_idx.x
    
  3. Data loading:

    # Load data into shared memory using LayoutTensor indexing
    if row < size and col < size:
        a_shared[local_row, local_col] = a[row, col]
        b_shared[local_row, local_col] = b[row, col]
    
  4. Computation with shared memory:

    # Runs after barrier(), so every load from step 3 is visible
    # Guard so that only valid matrix elements are computed
    if row < size and col < size:
        # Initialize the accumulator with the output tensor's element type
        var acc: output.element_type = 0
    
        # Matrix multiplication loop, unrolled at compile time
        @parameter
        for k in range(size):
            acc += a_shared[local_row, k] * b_shared[k, local_col]
    
        # Only threads inside the matrix bounds write a result
        output[row, col] = acc
    

    Key points:

    • Bounds check: if row < size and col < size

      • Prevents out-of-bounds operations
      • Only valid threads do work
      • Required because TPB (3×3) > SIZE (2×2)
    • Accumulator type: var acc: output.element_type

      • Uses the output tensor's element type for type safety
      • Keeps numeric precision consistent
      • Initialized to 0 before accumulation
    • Loop optimization: @parameter for k in range(size)

      • Unrolls the loop at compile time
      • Allows better instruction scheduling
      • Effective for small matrices whose size is known in advance
    • Writing the result: output[row, col] = acc

      • Protected by the same guard condition
      • Only valid threads write a result
      • Keeps writes within the matrix bounds
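The steps above can be traced on the CPU. The following is a plain-Python model, not Mojo: the nested loops stand in for the 3 × 3 threads, finishing the first loop before starting the second plays the role of barrier(), and the inputs A = [[0, 1], [2, 3]] and B = [[0, 2], [4, 6]] are assumed. The function name `simulate_single_block` is hypothetical.

```python
SIZE, TPB = 2, 3


def simulate_single_block(a, b, size=SIZE, tpb=TPB):
    """Model the kernel for one block: a load phase, then a compute phase."""
    # Two TPB x TPB tiles (zero-filled here; real shared memory starts uninitialized).
    a_shared = [[0.0] * tpb for _ in range(tpb)]
    b_shared = [[0.0] * tpb for _ in range(tpb)]
    output = [[0.0] * size for _ in range(size)]

    # Phase 1: every in-bounds thread copies one element of A and one of B.
    for local_row in range(tpb):
        for local_col in range(tpb):
            row, col = local_row, local_col  # single block, so global == local
            if row < size and col < size:
                a_shared[local_row][local_col] = a[row][col]
                b_shared[local_row][local_col] = b[row][col]

    # The end of phase 1 models barrier(): all loads are done before any compute.

    # Phase 2: every in-bounds thread computes one dot product from the tiles.
    for local_row in range(tpb):
        for local_col in range(tpb):
            row, col = local_row, local_col
            if row < size and col < size:
                acc = 0.0
                for k in range(size):
                    acc += a_shared[local_row][k] * b_shared[k][local_col]
                output[row][col] = acc
    return output


A = [[0.0, 1.0], [2.0, 3.0]]  # assumed input
B = [[0.0, 2.0], [4.0, 6.0]]  # assumed input
result = simulate_single_block(A, B)
print(result)  # [[4.0, 6.0], [12.0, 22.0]]
```

Because `k` only ranges over `size`, the third row and column of each tile are never read, so leaving them unwritten is harmless.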

Thread safety and synchronization

  1. Guard conditions:

    • Input loading: if row < size and col < size
    • Computation: the same guard keeps out-of-range threads out of the dot product
    • Output writes: protected by the same condition
    • Prevents invalid memory accesses; the race between loading and reading shared memory is handled by barrier(), not by the guard
  2. Memory access safety:

    • Shared memory: accessed only within the TPB bounds
    • Global memory: protected by the size check
    • Output: guarded writes prevent data corruption
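To see why the load and compute steps need a barrier() between them, the plain-Python sketch below models an unlucky schedule in which each thread runs its load and its dot product back to back, one thread after another. This is a simplified CPU model with assumed inputs and zero-filled tiles. Real GPU scheduling is not sequential and real shared memory is uninitialized, so actual results without a barrier are unpredictable rather than these exact values.

```python
SIZE, TPB = 2, 3
A = [[0.0, 1.0], [2.0, 3.0]]  # assumed input
B = [[0.0, 2.0], [4.0, 6.0]]  # assumed input


def run(with_barrier):
    """Run the modeled kernel with or without a barrier between load and compute."""
    a_s = [[0.0] * TPB for _ in range(TPB)]
    b_s = [[0.0] * TPB for _ in range(TPB)]
    out = [[0.0] * SIZE for _ in range(SIZE)]
    threads = [(r, c) for r in range(TPB) for c in range(TPB) if r < SIZE and c < SIZE]

    def load(r, c):
        a_s[r][c] = A[r][c]
        b_s[r][c] = B[r][c]

    def compute(r, c):
        out[r][c] = sum(a_s[r][k] * b_s[k][c] for k in range(SIZE))

    if with_barrier:
        # All loads finish before any thread starts computing.
        for r, c in threads:
            load(r, c)
        for r, c in threads:
            compute(r, c)
    else:
        # Each thread computes right after its own load, before its neighbours have loaded.
        for r, c in threads:
            load(r, c)
            compute(r, c)
    return out


with_barrier = run(True)
no_barrier = run(False)
print(with_barrier)  # [[4.0, 6.0], [12.0, 22.0]]
print(no_barrier)    # [[0.0, 0.0], [0.0, 22.0]]
```

In this model only the last thread sees fully loaded tiles. The earlier threads read elements their neighbours have not written yet, which is the hazard barrier() removes. Note that the solution calls barrier() outside the guard, so all nine threads reach it.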

Key language features

  1. LayoutTensor benefits:

    • Direct 2D indexing simplifies the code
    • Type safety through element_type
    • Efficient handling of memory layouts
  2. Shared memory allocation:

    • Structured allocation using LayoutTensor with an address_space
    • Same row-major layout as the input tensors
    • Proper memory alignment for efficient access
  3. Synchronization:

    • barrier() ensures shared memory consistency
    • Proper synchronization between loading and computation
    • Cooperation among the threads of a block

Performance optimizations

  1. Memory access efficiency:

    • One global memory load per element
    • Repeated reuse through shared memory
    • Coalesced memory access pattern
  2. Thread cooperation:

    • Cooperative data loading
    • Reuse of shared data
    • Efficient thread synchronization
  3. Computational benefits:

    • Reduced global memory traffic
    • Better cache utilization
    • Improved instruction throughput
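The saving in global memory traffic can be estimated by simple counting. The sketch below is plain Python and assumes a naive kernel in which each output element reads a full row of A and a full column of B directly from global memory. The function names are hypothetical.

```python
def global_loads_naive(size):
    """Each of the size*size outputs reads `size` values of A and `size` values of B."""
    return size * size * 2 * size


def global_loads_shared(size):
    """Each element of A and of B is copied into shared memory exactly once."""
    return 2 * size * size


for n in (2, 3):
    print(n, global_loads_naive(n), global_loads_shared(n))
# 2 16 8
# 3 54 18
```

For the 2 × 2 case the count drops from 16 global loads to 8. The ratio equals the matrix size, as long as the whole matrix fits in one block's tile (size ≤ TPB).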

This implementation significantly improves performance over the naive version by:

  • Reducing the number of global memory accesses
  • Reusing data through shared memory
  • Using LayoutTensor's efficient 2D indexing
  • Maintaining proper thread synchronization