Overview

2D 정사각 ν–‰λ ¬ a의 각 μœ„μΉ˜μ— 10을 더해 2D 정사각 ν–‰λ ¬ output에 μ €μž₯ν•˜λŠ” 컀널을 κ΅¬ν˜„ν•΄ λ³΄μ„Έμš”.

Note: there are more threads than matrix positions.

Key concepts

이 νΌμ¦μ—μ„œ 배울 λ‚΄μš©:

  • 2D μŠ€λ ˆλ“œ 인덱슀 닀루기 (thread_idx.x, thread_idx.y)
  • 2D μ’Œν‘œλ₯Ό 1D λ©”λͺ¨λ¦¬ 인덱슀둜 λ³€ν™˜ν•˜κΈ°
  • 2μ°¨μ›μ—μ„œ 경계 검사 μ²˜λ¦¬ν•˜κΈ°

The key insight is understanding how to map a 2D thread coordinate \((i,j)\) to an element of a row-major matrix of size \(n \times n\), while also making sure the thread indices stay within bounds.

  • 2D 인덱싱: 각 μŠ€λ ˆλ“œκ°€ κ³ μœ ν•œ \((i,j)\) μœ„μΉ˜λ₯Ό 가짐
  • λ©”λͺ¨λ¦¬ λ ˆμ΄μ•„μ›ƒ: ν–‰ μš°μ„  μˆœμ„œλ‘œ 2Dλ₯Ό 1D λ©”λͺ¨λ¦¬μ— λ§€ν•‘
  • κ°€λ“œ 쑰건: 두 차원 λͺ¨λ‘ 경계 검사 ν•„μš”
  • μŠ€λ ˆλ“œ λ²”μœ„: μŠ€λ ˆλ“œ \((3 \times 3)\)κ°€ ν–‰λ ¬ μ›μ†Œ \((2 \times 2)\)보닀 많음
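
The mapping and guard listed above can be sketched on the CPU. The following is a plain Python simulation, not Mojo GPU code: the helper name `flat_index` and the nested loops over thread coordinates are illustrative assumptions that stand in for the hardware launching a 3 × 3 block of threads.

```python
SIZE = 2                    # matrix is SIZE x SIZE
THREADS_PER_BLOCK = (3, 3)  # (x, y): more threads than matrix elements


def flat_index(row, col, size):
    """Row-major offset of element (row, col) in a size x size matrix."""
    return row * size + col


# Collect the 1D offset that each in-bounds thread would touch.
mapping = {}
for row in range(THREADS_PER_BLOCK[1]):      # plays the role of thread_idx.y
    for col in range(THREADS_PER_BLOCK[0]):  # plays the role of thread_idx.x
        if row < SIZE and col < SIZE:        # guard in both dimensions
            mapping[(row, col)] = flat_index(row, col, SIZE)

for (row, col), idx in mapping.items():
    print(f"thread ({row}, {col}) -> element {idx}")
```

Only four of the nine simulated threads pass the guard; the remaining five do no work.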

Code to complete

comptime SIZE = 2
comptime BLOCKS_PER_GRID = 1
comptime THREADS_PER_BLOCK = (3, 3)
comptime dtype = DType.float32


fn add_10_2d(
    output: UnsafePointer[Scalar[dtype], MutAnyOrigin],
    a: UnsafePointer[Scalar[dtype], MutAnyOrigin],
    size: UInt,
):
    row = thread_idx.y
    col = thread_idx.x
    # FILL ME IN (roughly 2 lines)


전체 μ½”λ“œ 보기: problems/p04/p04.mojo

Tips
  1. Get the 2D indices: row = thread_idx.y, col = thread_idx.x
  2. Add a guard: if row < size and col < size
  3. Inside the guard, add 10 using row-major indexing!

Running the code

To test your solution, run one of the following commands in your terminal:

pixi run p04
pixi run -e amd p04
pixi run -e apple p04
uv run poe p04

퍼즐을 아직 ν’€μ§€ μ•Šμ•˜λ‹€λ©΄ 좜λ ₯이 λ‹€μŒκ³Ό 같이 λ‚˜νƒ€λ‚©λ‹ˆλ‹€:

out: HostBuffer([0.0, 0.0, 0.0, 0.0])
expected: HostBuffer([10.0, 11.0, 12.0, 13.0])
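
As a rough sanity check, the input can be recovered from the expected values shown above. This Python sketch assumes the kernel's only effect is the "+ 10" described in the overview; the input values are inferred from the expected output rather than taken from the test harness, and the variable names are illustrative.

```python
expected = [10.0, 11.0, 12.0, 13.0]  # from the test output above
out_unsolved = [0.0, 0.0, 0.0, 0.0]  # buffer left untouched by the empty kernel

# Undo the "+ 10" to infer the input the test would have used.
a = [value - 10.0 for value in expected]
print(a)  # [0.0, 1.0, 2.0, 3.0]

# View the flat input as a 2 x 2 row-major matrix.
size = 2
rows = [a[r * size:(r + 1) * size] for r in range(size)]
print(rows)  # [[0.0, 1.0], [2.0, 3.0]]

# The unsolved kernel writes nothing, so the buffers differ.
print(out_unsolved == expected)  # False
```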

Solution

fn add_10_2d(
    output: UnsafePointer[Scalar[dtype], MutAnyOrigin],
    a: UnsafePointer[Scalar[dtype], MutAnyOrigin],
    size: UInt,
):
    row = thread_idx.y
    col = thread_idx.x
    if row < size and col < size:
        output[row * size + col] = a[row * size + col] + 10.0


이 μ†”λ£¨μ…˜μ€:

  1. 2D 인덱슀 κ°€μ Έμ˜€κΈ°: row = thread_idx.y, col = thread_idx.x
  2. κ°€λ“œ μΆ”κ°€: if row < size and col < size
  3. κ°€λ“œ λ‚΄λΆ€: output[row * size + col] = a[row * size + col] + 10.0
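
To see why the guard has to check both dimensions, the sketch below simulates on the CPU which flat indices the threads would touch with the full guard and with a row-only guard. It is plain Python, not Mojo; the function `accesses` and its `check_row` / `check_col` flags are hypothetical helpers introduced only for this illustration.

```python
def accesses(size, block_dim, check_row=True, check_col=True):
    """List (row, col, idx) for every thread that passes the chosen guard."""
    result = []
    for row in range(block_dim[1]):      # plays the role of thread_idx.y
        for col in range(block_dim[0]):  # plays the role of thread_idx.x
            if check_row and row >= size:
                continue
            if check_col and col >= size:
                continue
            result.append((row, col, row * size + col))
    return result


SIZE = 2
BLOCK = (3, 3)
n = SIZE * SIZE  # number of elements in the buffer

full_guard = accesses(SIZE, BLOCK)
row_only = accesses(SIZE, BLOCK, check_col=False)

# Threads whose column is out of range but that still pass a row-only guard.
stray = [t for t in row_only if t not in full_guard]
aliased = [t for t in stray if t[2] < n]         # lands on another element
out_of_bounds = [t for t in stray if t[2] >= n]  # past the end of the buffer

print(aliased)        # [(0, 2, 2)]
print(out_of_bounds)  # [(1, 2, 4)]
```

With only the row check, thread (0, 2) computes index 2, which is inside the buffer but belongs to element (1, 0), and thread (1, 2) computes index 4, which is past the end of the 4-element buffer. Checking `row * size + col` against the buffer length alone would miss the first case, so the guard tests row and col separately.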