🏁 Debugging Race Conditions

Overview

Debug a failing GPU program with NVIDIA's compute-sanitizer by identifying the race condition that causes incorrect results. You will learn how to use the racecheck tool to find concurrency bugs in shared memory operations.

You have a GPU kernel that is supposed to accumulate values from multiple threads in shared memory. The test fails even though the logic looks correct. Your task is to find and fix the race condition that causes the failure.

Configuration

comptime SIZE = 2
comptime BLOCKS_PER_GRID = 1
comptime THREADS_PER_BLOCK = (3, 3)  # only 4 of the 9 threads are active
comptime dtype = DType.float32

The failing kernel


comptime SIZE = 2
comptime BLOCKS_PER_GRID = 1
comptime THREADS_PER_BLOCK = (3, 3)
comptime dtype = DType.float32
comptime layout = Layout.row_major(SIZE, SIZE)


fn shared_memory_race(
    output: LayoutTensor[dtype, layout, MutAnyOrigin],
    a: LayoutTensor[dtype, layout, ImmutAnyOrigin],
    size: UInt,
):
    row = thread_idx.y
    col = thread_idx.x

    shared_sum = LayoutTensor[
        dtype,
        Layout.row_major(1),
        MutAnyOrigin,
        address_space = AddressSpace.SHARED,
    ].stack_allocation()

    if row < size and col < size:
        shared_sum[0] += a[row, col]

    barrier()

    if row < size and col < size:
        output[row, col] = shared_sum[0]


View full file: problems/p10/p10.mojo

Running the code

pixi run p10 --race-condition

The output looks like this:

out shape: 2 x 2
Running race condition example...
out: HostBuffer([0.0, 0.0, 0.0, 0.0])
expected: HostBuffer([6.0, 6.0, 6.0, 6.0])
stack trace was not collected. Enable stack trace collection with environment variable `MOJO_ENABLE_STACK_TRACE_ON_ERROR`
Unhandled exception caught during execution: At /home/ubuntu/workspace/mojo-gpu-puzzles/problems/p10/p10.mojo:122:33: AssertionError: `left == right` comparison failed:
   left: 0.0
  right: 6.0

Let's see how compute-sanitizer finds problems in GPU code.

Debugging with compute-sanitizer

Step 1: Identify the race condition with racecheck

Run compute-sanitizer with the racecheck tool to identify the race condition:

pixi run compute-sanitizer --tool racecheck mojo problems/p10/p10.mojo --race-condition

The output looks like this:

========= COMPUTE-SANITIZER
out shape: 2 x 2
Running race condition example...
========= Error: Race reported between Write access at p10_shared_memory_race_...+0x140
=========     and Read access at p10_shared_memory_race_...+0xe0 [4 hazards]
=========     and Write access at p10_shared_memory_race_...+0x140 [5 hazards]
=========
out: HostBuffer([0.0, 0.0, 0.0, 0.0])
expected: HostBuffer([6.0, 6.0, 6.0, 6.0])
AssertionError: `left == right` comparison failed:
  left: 0.0
  right: 6.0
========= RACECHECK SUMMARY: 1 hazard displayed (1 error, 0 warnings)

Analysis: the program has 1 race condition made up of 9 individual hazards:

  • 4 read-after-write hazards (a thread reads while another thread writes)
  • 5 write-after-write hazards (multiple threads write at the same time)

Step 2: Compare with synccheck

Confirm that this is a race condition rather than a synchronization problem:

pixi run compute-sanitizer --tool synccheck mojo problems/p10/p10.mojo --race-condition

The output looks like this:

========= COMPUTE-SANITIZER
out shape: 2 x 2
Running race condition example...
out: HostBuffer([0.0, 0.0, 0.0, 0.0])
expected: HostBuffer([6.0, 6.0, 6.0, 6.0])
AssertionError: `left == right` comparison failed:
  left: 0.0
  right: 6.0
========= ERROR SUMMARY: 0 errors

Key insight: synccheck found 0 errors, so there are no synchronization problems such as deadlocks. The bug is a race condition, not a synchronization bug.

Deadlock vs. race condition: understanding the difference

Aspect          | Deadlock                                 | Race condition
Symptom         | Program hangs forever                    | Program produces wrong results
Execution       | Never completes                          | Completes successfully
Timing          | Deterministic hang                       | Non-deterministic results
Root cause      | Synchronization logic error              | Unsynchronized data access
Detection tool  | synccheck                                | racecheck
Example         | Puzzle 09: third case, barrier deadlock  | Shared memory += operation

In our case:

  • Program completes → no deadlock (threads do not get stuck)
  • Wrong results → race condition (threads corrupt each other's data)
  • Tool confirmation → synccheck reports 0 errors, racecheck reports 9 hazards

Why this distinction matters for debugging:

  • Deadlock debugging: focus on barrier placement, conditional synchronization, and thread coordination
  • Race condition debugging: focus on shared memory access patterns, atomic operations (operations that either complete entirely or do not run at all, with no visible intermediate state), and data dependencies

Challenge

Use these tools to fix the failing kernel.

Tips

Hazard analysis

The shared_sum[0] += a[row, col] operation is hazardous because it is really three separate steps:

  1. Read shared_sum[0]
  2. Add a[row, col] to the value that was read
  3. Write the result back to shared_sum[0]

With 4 active threads (positions (0,0), (0,1), (1,0), (1,1)), these steps can interleave:

  • Overlapping thread timing → several threads read the same initial value (0.0)
  • Lost updates → each thread writes 0.0 + its_own_value, overwriting the work of the other threads
  • Non-atomic operation → the += compound assignment is not atomic in GPU shared memory (another thread can cut in partway through, so the intermediate state is exposed)

Why there are exactly 9 hazards:

  • Each thread attempts a read-modify-write
  • 4 threads × 2 to 3 hazards per thread = 9 hazards in total (the 4 read and 5 write hazards in the report)
  • compute-sanitizer tracks every pair of conflicting memory accesses

Race condition debugging tips

  1. Use racecheck for data races: it detects shared memory hazards and data corruption
  2. Use synccheck for deadlocks: it detects synchronization bugs (barrier problems, deadlocks)
  3. Focus on shared memory accesses: look for unsynchronized += and = operations on shared variables
  4. Identify the pattern: read-modify-write operations are a common source of race conditions
  5. Check barrier placement: a barrier must come before the conflicting operation, not after it

Common race condition patterns to avoid:

  • Multiple threads writing to the same shared memory location
  • Unsynchronized read-modify-write operations (+=, ++, and so on)
  • Barriers placed after the race instead of before it

Solution


comptime SIZE = 2
comptime BLOCKS_PER_GRID = 1
comptime THREADS_PER_BLOCK = (3, 3)
comptime dtype = DType.float32
comptime layout = Layout.row_major(SIZE, SIZE)


fn shared_memory_race(
    output: LayoutTensor[dtype, layout, MutAnyOrigin],
    a: LayoutTensor[dtype, layout, ImmutAnyOrigin],
    size: UInt,
):
    """Fixed: sequential access with barriers eliminates race conditions."""
    row = thread_idx.y
    col = thread_idx.x

    shared_sum = LayoutTensor[
        dtype,
        Layout.row_major(1),
        MutAnyOrigin,
        address_space = AddressSpace.SHARED,
    ].stack_allocation()

    # Only thread 0 does all the accumulation work to prevent races
    if row == 0 and col == 0:
        # Use local accumulation first, then single write to shared memory
        local_sum = Scalar[dtype](0.0)
        for r in range(size):
            for c in range(size):
                local_sum += rebind[Scalar[dtype]](a[r, c])

        shared_sum[0] = local_sum  # Single write operation

    barrier()  # Ensure thread 0 completes before others read

    # All threads read the safely accumulated result after synchronization
    if row < size and col < size:
        output[row, col] = shared_sum[0]


Understanding what went wrong

The race condition pattern

The original failing code contained this critical line:

shared_sum[0] += a[row, col]  # race condition!

This single line causes several hazards among the 4 valid threads:

  1. Thread (0,0) reads shared_sum[0] (value: 0.0)
  2. Thread (0,1) reads shared_sum[0] (value: 0.0) ← read-after-write hazard!
  3. Thread (0,0) writes 0.0 + 0
  4. Thread (1,0) writes 0.0 + 2 ← write-after-write hazard!

Why the test failed

  • Multiple threads corrupt each other's writes during the += operation
  • The += operation is interrupted partway through, so updates are lost
  • The expected sum is 6.0 (0+1+2+3), but the race condition produced 0.0
  • barrier() comes too late, after the race has already happened

What is a race condition?

A race condition occurs when multiple threads access shared data concurrently and the result depends on the unpredictable timing of thread execution.

Key characteristics:

  • Non-deterministic behavior: the same code can produce different results on different runs
  • Timing-dependent: the result depends on which thread "wins the race"
  • Hard to reproduce: it may appear only under specific conditions or on specific hardware

GPU-specific dangers

Impact of massive parallelism:

  • Warp-level corruption: a race condition can affect an entire warp (32 threads)
  • Memory coalescing problems: races can break efficient memory access patterns
  • Kernel-wide failure: shared memory corruption can affect the entire GPU kernel

Hardware variations:

  • Different GPU architectures: race conditions can show up differently on different GPU models
  • Memory hierarchy: L1 cache, L2 cache, and global memory can each exhibit different race behavior
  • Warp scheduling: different thread scheduling can expose different race scenarios

Strategy: single-writer pattern

The key is to eliminate concurrent writes to shared memory:

  1. Single writer: only one thread (at position (0,0)) does all of the accumulation
  2. Local accumulation: the thread at position (0,0) uses a local variable to avoid repeated shared memory accesses
  3. Single shared memory write: one write operation removes the write-write race
  4. Barrier synchronization: guarantees that other threads read only after the writer has finished
  5. Multiple readers: every thread safely reads the final result
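
The five points above can be modeled on the CPU with Python threads. This is a sketch under stated assumptions, not the puzzle's Mojo kernel: threading.Barrier stands in for barrier(), a one-element list stands in for the shared tensor, and the names kernel, shared_sum, and output simply mirror the puzzle for readability.

```python
import threading

SIZE = 2
BLOCK = (3, 3)                      # 9 threads; only 4 have row < SIZE and col < SIZE
A = [[0.0, 1.0], [2.0, 3.0]]        # input values, expected sum 6.0

shared_sum = [0.0]                  # stands in for the one-element shared tensor
output = [[None] * SIZE for _ in range(SIZE)]
block_barrier = threading.Barrier(BLOCK[0] * BLOCK[1])  # every thread must arrive


def kernel(row, col):
    # Single writer: only the thread at (0,0) accumulates, in a local variable.
    if row == 0 and col == 0:
        local_sum = 0.0
        for r in range(SIZE):
            for c in range(SIZE):
                local_sum += A[r][c]
        shared_sum[0] = local_sum   # the only write to the shared value

    # All 9 threads wait here, so no read can happen before the write above.
    block_barrier.wait()

    # Multiple readers: each in-bounds thread writes its own output element.
    if row < SIZE and col < SIZE:
        output[row][col] = shared_sum[0]


workers = [threading.Thread(target=kernel, args=(r, c))
           for r in range(BLOCK[1]) for c in range(BLOCK[0])]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(output)  # [[6.0, 6.0], [6.0, 6.0]]
```

Note that the barrier is reached by every thread, including the five that neither write nor read. Calling it only inside a conditional branch would leave the remaining threads waiting forever.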

Step-by-step solution breakdown

Step 1: Thread identification

if row == 0 and col == 0:

A direct coordinate check identifies the thread at position (0,0).

Step 2: Single-threaded accumulation

if row == 0 and col == 0:
    local_sum = Scalar[dtype](0.0)
    for r in range(size):
        for c in range(size):
            local_sum += rebind[Scalar[dtype]](a[r, c])
    shared_sum[0] = local_sum  # single write operation

Only the thread at position (0,0) performs the accumulation, which removes the race condition.

Step 3: Synchronization barrier

barrier()  # ensure thread (0,0) has finished before the other threads read

Every thread waits until the thread at position (0,0) has finished accumulating.

Step 4: Safe parallel reads

if row < size and col < size:
    output[row, col] = shared_sum[0]

After synchronization, every thread can safely read the result.

Important note on efficiency

This solution prioritizes correctness over efficiency. It removes the race condition, but using only the thread at position (0,0) for the accumulation is not optimal for GPU performance: it is effectively serial computation on a massively parallel device.

Coming up in Puzzle 11: Pooling: you will learn efficient parallel reduction algorithms that use every thread for high-performance summation while still avoiding race conditions. This puzzle teaches the correctness-first foundation. Once you understand how to avoid race conditions, Puzzle 11 shows how to achieve both correctness and performance.

Verification

pixi run compute-sanitizer --tool racecheck mojo solutions/p10/p10.mojo --race-condition

Expected output:

========= COMPUTE-SANITIZER
out shape: 2 x 2
Running race condition example...
out: HostBuffer([6.0, 6.0, 6.0, 6.0])
expected: HostBuffer([6.0, 6.0, 6.0, 6.0])
✅ Race condition test PASSED! (racecheck will find hazards)
========= RACECHECK SUMMARY: 0 hazards displayed (0 errors, 0 warnings)

✅ Success: the test passes and no race conditions are detected!