Overview
Implement a kernel that adds 10 to each position of the 2D square matrix a and stores the result in the 2D square matrix output.
Note: You have more threads than matrix positions.
Key concepts
In this puzzle, you'll learn about:
- Working with 2D thread indices (thread_idx.x, thread_idx.y)
- Converting 2D coordinates to 1D memory indices
- Handling boundary checks in two dimensions
The key insight is understanding how to map 2D thread coordinates \((i,j)\) to elements of a row-major matrix of size \(n \times n\): element \((i,j)\) lives at 1D index \(i \cdot n + j\). At the same time, you must check that the thread indices stay within bounds.
- 2D indexing: Each thread has a unique \((i,j)\) position
- Memory layout: Row-major ordering maps 2D coordinates to 1D memory
- Guard condition: Bounds checks are needed in both dimensions
- Thread bounds: There are more threads \((3 \times 3)\) than matrix elements \((2 \times 2)\)
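As a minimal host-side sketch of the mapping described above (plain Python, not GPU code; the helper names `flat_index` and `active_threads` are hypothetical and not part of the puzzle), the following enumerates the 3 x 3 thread block and shows which threads land on an element of the 2 x 2 matrix and where that element sits in 1D memory:

```python
SIZE = 2                      # matrix is SIZE x SIZE, as in the puzzle
THREADS_PER_BLOCK = (3, 3)    # more threads than matrix elements


def flat_index(row, col, size):
    """Row-major offset of element (row, col) in a size x size matrix."""
    return row * size + col


def active_threads(size, threads):
    """Map each in-bounds (row, col) thread to its 1D memory index."""
    mapping = {}
    for row in range(threads[1]):        # plays the role of thread_idx.y
        for col in range(threads[0]):    # plays the role of thread_idx.x
            if row < size and col < size:    # guard in both dimensions
                mapping[(row, col)] = flat_index(row, col, size)
    return mapping


if __name__ == "__main__":
    for (row, col), idx in active_threads(SIZE, THREADS_PER_BLOCK).items():
        print(f"thread (row={row}, col={col}) -> index {idx}")
```

Of the nine threads, only four pass the guard; the remaining five have no matrix element to work on and do nothing.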
Code to complete
comptime SIZE = 2
comptime BLOCKS_PER_GRID = 1
comptime THREADS_PER_BLOCK = (3, 3)
comptime dtype = DType.float32
fn add_10_2d(
    output: UnsafePointer[Scalar[dtype], MutAnyOrigin],
    a: UnsafePointer[Scalar[dtype], MutAnyOrigin],
    size: UInt,
):
    row = thread_idx.y
    col = thread_idx.x
    # FILL ME IN (roughly 2 lines)
View full file: problems/p04/p04.mojo
Tips
- Get the 2D indices: row = thread_idx.y, col = thread_idx.x
- Add a guard: if row < size and col < size
- Inside the guard, add 10 using row-major indexing
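To see why the guard has to test both dimensions, here is a small host-side Python sketch (an illustration only; `flat_only_guard` is a hypothetical, deliberately incomplete check, not something the puzzle suggests). It counts how many threads would write to each 1D index under a guard on the flattened index alone versus the two-dimensional guard from the tips:

```python
SIZE = 2
THREADS = 3  # threads per dimension, as in the 3 x 3 block


def flat_only_guard(row, col, size):
    """Incomplete check: only asks whether the 1D index fits in the buffer."""
    return row * size + col < size * size


def guard_2d(row, col, size):
    """Check from the tips: both coordinates must be inside the matrix."""
    return row < size and col < size


def writes_per_index(guard, size=SIZE, threads=THREADS):
    """Count how many threads would write to each 1D index under a guard."""
    counts = [0] * (size * size)
    for row in range(threads):
        for col in range(threads):
            if guard(row, col, size):
                counts[row * size + col] += 1
    return counts


if __name__ == "__main__":
    print("flat-only guard:", writes_per_index(flat_only_guard))
    print("2D guard:       ", writes_per_index(guard_2d))
```

With the flat-only check, the thread at (row=0, col=2) computes index 2 and so aliases the element that belongs to (row=1, col=0). In this particular kernel both threads would store the same value, so the mistake could go unnoticed, but the thread is still touching an element it does not own.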
Running the code
To test your solution, run the following command for your environment in your terminal:
pixi run p04
pixi run -e amd p04
pixi run -e apple p04
uv run poe p04
If you haven't solved the puzzle yet, the output looks like this:
out: HostBuffer([0.0, 0.0, 0.0, 0.0])
expected: HostBuffer([10.0, 11.0, 12.0, 13.0])
Solution
fn add_10_2d(
    output: UnsafePointer[Scalar[dtype], MutAnyOrigin],
    a: UnsafePointer[Scalar[dtype], MutAnyOrigin],
    size: UInt,
):
    row = thread_idx.y
    col = thread_idx.x
    if row < size and col < size:
        output[row * size + col] = a[row * size + col] + 10.0
This solution:
- Gets the 2D indices: row = thread_idx.y, col = thread_idx.x
- Adds a guard: if row < size and col < size
- Inside the guard: output[row * size + col] = a[row * size + col] + 10.0
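The solution can be checked on the host with a short Python simulation of the thread block. This is a sketch, not the puzzle's test harness: the function `simulate_add_10_2d` is hypothetical, and the input values 0.0 to 3.0 are an assumption inferred from the expected output shown earlier.

```python
SIZE = 2
THREADS_PER_BLOCK = (3, 3)


def simulate_add_10_2d(a, size, threads=THREADS_PER_BLOCK):
    """Run the kernel body once per (thread_idx.x, thread_idx.y) pair."""
    output = [0.0] * (size * size)
    for thread_y in range(threads[1]):
        for thread_x in range(threads[0]):
            row, col = thread_y, thread_x
            # Same guard and row-major index as the solution kernel.
            if row < size and col < size:
                output[row * size + col] = a[row * size + col] + 10.0
    return output


if __name__ == "__main__":
    a = [0.0, 1.0, 2.0, 3.0]  # assumed input, inferred from the expected output
    print(simulate_add_10_2d(a, SIZE))
```

On a GPU the threads run in parallel rather than in a loop, but because each in-bounds thread writes to a distinct element, the sequential simulation gives the same result.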