Key concepts

In this puzzle, you'll learn about:

  • Basic GPU kernel structure

  • Thread indexing with thread_idx.x

  • Simple parallel operations

  • Parallelism: each thread executes independently

  • Thread indexing: each thread accesses the element at position i = thread_idx.x

  • Memory access: each thread reads from a[i] and writes to output[i]

  • Data independence: each output depends only on its corresponding input
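
The four properties above can be modeled on the CPU. The following is a minimal sketch in plain Python, not Mojo and not GPU code. The names run_kernel and double_kernel, the doubling operation, and the input values are hypothetical stand-ins chosen for illustration. It calls a per-thread function once per thread index, in two different orders, to show that the result does not depend on scheduling when each thread touches only its own element.

```python
# CPU model of a one-block GPU launch (illustrative only, not Mojo).
SIZE = 4  # matches the puzzle's SIZE


def double_kernel(thread_idx_x, output, a):
    """Hypothetical per-thread body: one thread handles one element."""
    i = thread_idx_x          # thread indexing
    output[i] = a[i] * 2.0    # memory access: read a[i], write output[i]


def run_kernel(kernel, order, a):
    """Hypothetical launcher: call the kernel once per thread index in `order`."""
    output = [0.0] * len(a)
    for thread_idx_x in order:
        kernel(thread_idx_x, output, a)
    return output


a = [1.0, 2.0, 3.0, 4.0]  # made-up input values
forward = run_kernel(double_kernel, range(SIZE), a)
backward = run_kernel(double_kernel, reversed(range(SIZE)), a)

# Data independence: any execution order gives the same result.
assert forward == backward == [2.0, 4.0, 6.0, 8.0]
```

On a real GPU the threads run concurrently rather than in a loop, but the same reasoning applies: no thread reads a value that another thread writes, so no synchronization is needed.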

Code to complete

comptime SIZE = 4
comptime BLOCKS_PER_GRID = 1
comptime THREADS_PER_BLOCK = SIZE
comptime dtype = DType.float32


fn add_10(
    output: UnsafePointer[Scalar[dtype], MutAnyOrigin],
    a: UnsafePointer[Scalar[dtype], MutAnyOrigin],
):
    i = thread_idx.x
    # FILL ME IN (roughly 1 line)


View full file: problems/p01/p01.mojo

Tips
  1. Store thread_idx.x in i
  2. Add 10 to a[i]
  3. Store the result in output[i]

Running the code

To test your solution, run the command that matches your environment in a terminal (the default pixi environment, the amd or apple pixi environment, or uv):

pixi run p01
pixi run -e amd p01
pixi run -e apple p01
uv run poe p01

If you haven't solved the puzzle yet, the output will look like this:

out: HostBuffer([0.0, 0.0, 0.0, 0.0])
expected: HostBuffer([10.0, 11.0, 12.0, 13.0])

Solution

fn add_10(
    output: UnsafePointer[Scalar[dtype], MutAnyOrigin],
    a: UnsafePointer[Scalar[dtype], MutAnyOrigin],
):
    i = thread_idx.x
    output[i] = a[i] + 10.0


This solution:

  • Gets the thread index with i = thread_idx.x
  • Adds 10 to the input value: output[i] = a[i] + 10.0
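
To see why this produces the expected output, the launch can be traced by hand. The sketch below is plain Python on the CPU, not Mojo: the loops stand in for the four GPU threads of the single block (THREADS_PER_BLOCK = SIZE = 4), thread_idx.x is passed in as an explicit argument, and the input values 0.0 to 3.0 are an assumption inferred from the expected output shown earlier.

```python
SIZE = 4
BLOCKS_PER_GRID = 1
THREADS_PER_BLOCK = SIZE


def add_10(thread_idx_x, output, a):
    """Python stand-in for the kernel body; the thread index is passed explicitly."""
    i = thread_idx_x
    output[i] = a[i] + 10.0


a = [float(i) for i in range(SIZE)]  # assumed input: [0.0, 1.0, 2.0, 3.0]
output = [0.0] * SIZE                # zero-initialized, like the unsolved "out" buffer

# One block of THREADS_PER_BLOCK threads: thread_idx.x takes the values 0..3,
# so every element is written exactly once.
for block in range(BLOCKS_PER_GRID):
    for thread_idx_x in range(THREADS_PER_BLOCK):
        add_10(thread_idx_x, output, a)

print(output)  # [10.0, 11.0, 12.0, 13.0]
```

Because the thread count equals the array length in this configuration, every value of thread_idx.x is a valid index into both a and output.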