Puzzle 3: Guards
Overview
Implement a kernel that adds 10 to each position of vector a and stores the result in output.
Note: There are more threads than data elements, so some threads have no data to process. These threads must be prevented from accessing out-of-bounds memory.
Key concepts
This puzzle covers:
- Handling mismatches between thread count and data size
- Preventing out-of-bounds memory access
- Using conditional execution in GPU kernels
- Safe memory access patterns
Mathematical representation
For each thread \(i\): \[\Large \text{if}\ i < \text{size}: output[i] = a[i] + 10\]
Memory safety pattern
Thread 0 (i=0): if 0 < size: output[0] = a[0] + 10 ✅ Valid
Thread 1 (i=1): if 1 < size: output[1] = a[1] + 10 ✅ Valid
Thread 2 (i=2): if 2 < size: output[2] = a[2] + 10 ✅ Valid
Thread 3 (i=3): if 3 < size: output[3] = a[3] + 10 ✅ Valid
Thread 4 (i=4): if 4 < size: ❌ Skip (out of bounds)
Thread 5 (i=5): if 5 < size: ❌ Skip (out of bounds)
💡 Note: Boundary checking becomes increasingly complex in the following situations:
- Multi-dimensional arrays
- Different array shapes
- Complex access patterns
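The per-thread pattern above can be reproduced with a small CPU-side model. The following is a minimal Python sketch, not Mojo GPU code: the function name `add_10_guard_model` is hypothetical, the loop stands in for threads that would run in parallel on a GPU, and the input values `[0, 1, 2, 3]` are an assumption inferred from the expected output this puzzle prints.

```python
SIZE = 4               # number of data elements (matches the puzzle's SIZE)
THREADS_PER_BLOCK = 8  # number of launched threads (matches the puzzle)


def add_10_guard_model(a, size, num_threads):
    """Model each GPU thread as one loop iteration (hypothetical helper)."""
    output = [0.0] * size
    skipped = []
    for i in range(num_threads):  # i plays the role of thread_idx.x
        if i < size:              # the guard: only in-range threads write
            output[i] = a[i] + 10.0
        else:
            skipped.append(i)     # out-of-range threads do nothing
    return output, skipped


# Assumed input: a = [0, 1, 2, 3]
output, skipped = add_10_guard_model([0.0, 1.0, 2.0, 3.0], SIZE, THREADS_PER_BLOCK)
print(output)   # [10.0, 11.0, 12.0, 13.0]
print(skipped)  # [4, 5, 6, 7]
```

With 8 threads and 4 elements, threads 4 through 7 fall through the guard and touch no memory.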
Code to complete
comptime SIZE = 4
comptime BLOCKS_PER_GRID = 1
comptime THREADS_PER_BLOCK = 8
comptime dtype = DType.float32
fn add_10_guard(
    output: UnsafePointer[Scalar[dtype], MutAnyOrigin],
    a: UnsafePointer[Scalar[dtype], MutAnyOrigin],
    size: UInt,
):
    i = thread_idx.x
    # FILL ME IN (roughly 2 lines)
View full file: problems/p03/p03.mojo
Tips
- Store thread_idx.x in i
- Add the guard: if i < size
- Inside the guard: output[i] = a[i] + 10.0
Running the code
To test your solution, run the following command in your terminal:
pixi run p03
pixi run -e amd p03
pixi run -e apple p03
uv run poe p03
If you have not solved the puzzle yet, the output looks like this:
out: HostBuffer([0.0, 0.0, 0.0, 0.0])
expected: HostBuffer([10.0, 11.0, 12.0, 13.0])
Solution
fn add_10_guard(
    output: UnsafePointer[Scalar[dtype], MutAnyOrigin],
    a: UnsafePointer[Scalar[dtype], MutAnyOrigin],
    size: UInt,
):
    i = thread_idx.x
    if i < size:
        output[i] = a[i] + 10.0
This solution:
- Gets the thread index with i = thread_idx.x
- Prevents out-of-bounds access with if i < size
- Inside the guard: adds 10 to the input value
You might wonder why the test passes even without the bounds check! Always remember that a passing test does not guarantee that the code is sound or free of undefined behavior. Puzzle 10 examines such cases and uses tools that catch soundness bugs.
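One way to picture why a test can pass without the guard is a toy memory model. The sketch below is plain Python under loud assumptions: `a`, `output`, and an unrelated `neighbor` buffer are assumed to sit back to back in one flat list, and threads are assumed to run one after another. Real GPU allocations and scheduling give no such guarantees, and the real outcome of an out-of-bounds access is undefined. The names `run_unguarded`, `neighbor`, and the layout constants are hypothetical.

```python
SIZE = 4
THREADS_PER_BLOCK = 8

# Assumed flat layout: [ a (4 slots) | output (4 slots) | neighbor (4 slots) ]
A_BASE, OUT_BASE, NEIGHBOR_BASE = 0, SIZE, 2 * SIZE


def run_unguarded(memory, num_threads):
    """Run the kernel body WITHOUT the `if i < size` guard (toy model)."""
    for i in range(num_threads):
        # Threads 4..7 read past the end of `a` and write past the end of `output`.
        memory[OUT_BASE + i] = memory[A_BASE + i] + 10.0


# a = [0, 1, 2, 3], output zero-initialized, neighbor filled with sentinel -1.0
memory = [0.0, 1.0, 2.0, 3.0] + [0.0] * SIZE + [-1.0] * SIZE
run_unguarded(memory, THREADS_PER_BLOCK)

output = memory[OUT_BASE:OUT_BASE + SIZE]
neighbor = memory[NEIGHBOR_BASE:NEIGHBOR_BASE + SIZE]
print(output)    # [10.0, 11.0, 12.0, 13.0] -> comparison with expected still passes
print(neighbor)  # [20.0, 21.0, 22.0, 23.0] -> unrelated data was overwritten
```

In this model the comparison against the expected values succeeds, yet the neighboring buffer no longer holds its sentinel value `-1.0`. A test that only inspects `output` cannot see that damage.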
Looking ahead
A simple bounds check works well here, but consider the following situations:
- How should 2D/3D array boundaries be handled?
- How can different shapes be handled efficiently?
- What if padding or edge handling is needed?
Example of growing complexity:
# Current: 1D bounds check
if i < size: ...
# Coming soon: 2D bounds check
if i < height and j < width: ...
# Later: 3D with padding
if (i < height and j < width and k < depth and
    i >= padding and j >= padding): ...
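The 2D check in the example above can be modeled the same way on the CPU. This is a Python sketch under assumptions, not Mojo: `add_10_guard_2d_model` is a hypothetical name, the nested loops stand in for a 2D thread block, and the data is assumed to be stored row-major in a flat list.

```python
def add_10_guard_2d_model(a, height, width, threads_y, threads_x):
    """Model a 2D thread block; (i, j) play the role of a 2D thread index."""
    output = [0.0] * (height * width)
    for i in range(threads_y):
        for j in range(threads_x):
            # Both coordinates must be in range before touching memory.
            if i < height and j < width:
                idx = i * width + j  # assumed row-major flattening
                output[idx] = a[idx] + 10.0
    return output


a = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # 2 rows x 3 columns
result = add_10_guard_2d_model(a, height=2, width=3, threads_y=4, threads_x=4)
print(result)  # [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
```

A single flat check such as `idx < height * width` would not be equivalent here: thread (i=0, j=3) maps to index 3, which lies inside the buffer but belongs to row 1, so each dimension needs its own comparison.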
These boundary-handling patterns become much cleaner once you learn about LayoutTensor in Puzzle 4, which provides built-in shape management.