Puzzle 2: Zip
Overview
Implement a kernel that adds vector a and vector b position by position and stores the result in output.
Note: You have one thread per position.
Key concepts
In this puzzle, you'll learn about:
- Processing multiple input arrays in parallel
- Element-wise operations with multiple inputs
- Thread-to-data mapping across arrays
- Memory access patterns with multiple arrays
For each thread \(i\): \[\Large output[i] = a[i] + b[i]\]
Memory access pattern
Thread 0: a[0] + b[0] → output[0]
Thread 1: a[1] + b[1] → output[1]
Thread 2: a[2] + b[2] → output[2]
...
💡 Note: The kernel now works with three arrays (a, b, output). As operations become more complex, managing access to multiple arrays becomes increasingly difficult.
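The mapping above can be emulated on the CPU to make the one-thread-per-position idea concrete. The following is a minimal Python sketch, not the puzzle's Mojo code: `launch` and `add_kernel` are hypothetical names, the input values are chosen for illustration, and the loop stands in for threads that a GPU would run in parallel.

```python
def add_kernel(i, output, a, b):
    # Body executed by "thread" i: read one element from each input,
    # write one element of the output.
    output[i] = a[i] + b[i]


def launch(kernel, num_threads, *buffers):
    # Hypothetical stand-in for a GPU launch: a real device runs these
    # iterations concurrently, here they simply run one after another.
    for thread_id in range(num_threads):
        kernel(thread_id, *buffers)


a = [0.0, 1.0, 2.0, 3.0]  # sample inputs chosen for illustration
b = [0.0, 1.0, 2.0, 3.0]
output = [0.0] * len(a)
launch(add_kernel, len(a), output, a, b)
print(output)  # [0.0, 2.0, 4.0, 6.0]
```

Because every thread touches index i in all three arrays and nothing else, no two threads write to the same location, so the order in which they run does not matter.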
Code to complete
comptime SIZE = 4
comptime BLOCKS_PER_GRID = 1
comptime THREADS_PER_BLOCK = SIZE
comptime dtype = DType.float32

fn add(
    output: UnsafePointer[Scalar[dtype], MutAnyOrigin],
    a: UnsafePointer[Scalar[dtype], MutAnyOrigin],
    b: UnsafePointer[Scalar[dtype], MutAnyOrigin],
):
    i = thread_idx.x
    # FILL ME IN (roughly 1 line)
View full file: problems/p02/p02.mojo
Tips
- Store thread_idx.x in i
- Add a[i] and b[i]
- Store the result in output[i]
Running the code
To test your solution, run one of the following commands in your terminal, depending on your environment:
pixi run p02
pixi run -e amd p02
pixi run -e apple p02
uv run poe p02
If the puzzle is not solved yet, the output looks like this:
out: HostBuffer([0.0, 0.0, 0.0, 0.0])
expected: HostBuffer([0.0, 2.0, 4.0, 6.0])
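One plausible reading of this output, assuming the output buffer starts zero-filled and both inputs hold 0, 1, 2, 3 (neither is stated in this section): an empty kernel body writes nothing, so `out` stays at zero. The Python sketch below mimics that check on the CPU; `run`, `unsolved`, and `solved` are hypothetical names, not part of the puzzle's test harness.

```python
SIZE = 4


def unsolved(i, output, a, b):
    pass  # the "FILL ME IN" line is still missing, so nothing is written


def solved(i, output, a, b):
    output[i] = a[i] + b[i]


def run(kernel):
    # Assumed setup: inputs 0..SIZE-1 and a zero-filled output buffer.
    a = [float(i) for i in range(SIZE)]
    b = [float(i) for i in range(SIZE)]
    output = [0.0] * SIZE
    for i in range(SIZE):  # one simulated thread per position
        kernel(i, output, a, b)
    return output


expected = [0.0, 2.0, 4.0, 6.0]
print("unsolved:", run(unsolved), "matches:", run(unsolved) == expected)
print("solved:  ", run(solved), "matches:", run(solved) == expected)
```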
Solution
fn add(
    output: UnsafePointer[Scalar[dtype], MutAnyOrigin],
    a: UnsafePointer[Scalar[dtype], MutAnyOrigin],
    b: UnsafePointer[Scalar[dtype], MutAnyOrigin],
):
    i = thread_idx.x
    output[i] = a[i] + b[i]
This solution:
- Gets the thread index with i = thread_idx.x
- Adds the values from both arrays: output[i] = a[i] + b[i]
Looking ahead
Direct indexing works well for simple element-wise operations, but consider the following situations:
- What if the arrays have different layouts?
- What if one array has to be broadcast to another?
- How can coalesced access be ensured across multiple arrays?
These questions are addressed in the introduction to LayoutTensor in Puzzle 4.
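To see why the broadcast question complicates indexing, here is a minimal Python sketch under assumed shapes: `a` is a row-major 2x3 matrix stored flat, and `b` is a 3-element row added to every row of `a`. The names, shapes, and values are illustrative only and do not come from the puzzle.

```python
ROWS, COLS = 2, 3


def add_broadcast_kernel(i, output, a, b):
    # Thread i no longer reads the same index from every array:
    # a and output use the flat index, b uses only the column.
    col = i % COLS
    output[i] = a[i] + b[col]


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # rows [1, 2, 3] and [4, 5, 6], row-major
b = [10.0, 20.0, 30.0]
output = [0.0] * (ROWS * COLS)
for i in range(ROWS * COLS):  # one simulated thread per output element
    add_broadcast_kernel(i, output, a, b)
print(output)  # [11.0, 22.0, 33.0, 14.0, 25.0, 36.0]
```

Even in this small case the kernel has to know how `a` is laid out in memory to derive the column, which is the kind of bookkeeping a layout abstraction is meant to take over.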