Multi-Stage Pipeline Coordination

Overview

Implement a kernel that processes an image through a coordinated 3-stage pipeline. Different thread groups handle specialized processing stages and are synchronized with explicit barriers.

Note: Thread roles are specialized: Stage 1 (threads 0-127) loads and preprocesses data, Stage 2 (threads 128-255) applies the blur operation, and Stage 3 (all threads) performs the final smoothing.

Algorithm architecture: This puzzle implements a producer-consumer pipeline in which different thread groups execute completely different algorithms within a single GPU block. Unlike traditional GPU programming, where all threads run the same algorithm on different data, this approach partitions threads by functional specialization.

Pipeline concept: The algorithm processes data through three distinct stages, each with a specialized thread group running its own algorithm. Each stage produces data that the next stage consumes, creating explicit producer-consumer relationships that must be carefully synchronized with barriers.

Data dependencies and synchronization: The stages form a chain of dependencies:

  • Stage 1 → Stage 2: The first stage produces preprocessed data for blur processing
  • Stage 2 → Stage 3: The second stage produces blur results for the final smoothing
  • Barriers prevent race conditions: They ensure a stage has fully completed before any dependent stage begins

Concretely, the multi-stage pipeline implements a coordinated image processing algorithm built from three mathematical operations:

Stage 1 - Preprocessing enhancement:

\[P[i] = I[i] \times 1.1\]

where \(P[i]\) is the preprocessed data and \(I[i]\) is the input data.

Stage 2 - Horizontal blur filter:

\[B[i] = \frac{1}{N_i} \sum_{k=-2}^{2} P[i+k] \quad \text{where } i+k \in [0, 255]\]

where \(B[i]\) is the blur result and \(N_i\) is the number of in-bounds samples in the window (the element itself plus its valid neighbors within the tile boundary).
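
As a small worked illustration derived from the formula above (not part of the original puzzle text), the sample count shrinks near the two tile edges:

\[N_0 = N_{255} = 3, \qquad N_1 = N_{254} = 4, \qquad N_i = 5 \text{ for } 2 \le i \le 253\]

so that, for example, \(B[0] = (P[0] + P[1] + P[2]) / 3\) and \(B[1] = (P[0] + P[1] + P[2] + P[3]) / 4\).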

Stage 3 - Cascading neighbor smoothing:

\[F[i] = \begin{cases} (B[i] + B[i+1]) \times 0.6 & \text{if } i = 0 \\ ((B[i] + B[i-1]) \times 0.6 + B[i+1]) \times 0.6 & \text{if } 0 < i < 255 \\ (B[i] + B[i-1]) \times 0.6 & \text{if } i = 255 \end{cases}\]

where \(F[i]\) is the final output after cascading smoothing.
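
The three formulas can be cross-checked on the CPU. The following is a minimal plain-Python sketch, not the Mojo kernel: it applies the stages one after another to a single 256-element tile. The function names `preprocess`, `blur`, `smooth`, and `pipeline_tile` are hypothetical helpers introduced here for illustration.

```python
def preprocess(tile):
    """Stage 1: P[i] = I[i] * 1.1."""
    return [x * 1.1 for x in tile]


def blur(p, radius=2):
    """Stage 2: average of the in-bounds samples within +/- radius."""
    n = len(p)
    out = []
    for i in range(n):
        window = [p[j] for j in range(i - radius, i + radius + 1) if 0 <= j < n]
        out.append(sum(window) / len(window))
    return out


def smooth(b, scale=0.6):
    """Stage 3: fold in the left neighbor first, then the right neighbor."""
    n = len(b)
    out = []
    for i in range(n):
        value = b[i]
        if i > 0:
            value = (value + b[i - 1]) * scale
        if i < n - 1:
            value = (value + b[i + 1]) * scale
        out.append(value)
    return out


def pipeline_tile(tile):
    """Run all three stages sequentially on one tile."""
    return smooth(blur(preprocess(tile)))


# Constant input makes the edge and interior cases easy to tell apart.
result = pipeline_tile([1.0] * 256)
print(round(result[0], 3), round(result[100], 3), round(result[255], 3))
```

For a constant tile of ones, both edge elements come out at \((1.1 + 1.1) \times 0.6 = 1.32\), while interior elements come out at \((1.32 + 1.1) \times 0.6 = 1.452\), which makes the difference between the three cases of the Stage 3 formula visible.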

Thread specialization:

  • Threads 0-127: Compute \(P[i]\) for \(i \in \{0, 1, 2, \ldots, 255\}\) (2 elements per thread)
  • Threads 128-255: Compute \(B[i]\) for \(i \in \{0, 1, 2, \ldots, 255\}\) (2 elements per thread)
  • All 256 threads: Compute \(F[i]\) for \(i \in \{0, 1, 2, \ldots, 255\}\) (1 element per thread)
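
One assignment of elements to threads that satisfies these counts, and the one the solution later in this document uses, is for each thread of a half-block to take one element from each half of the tile. The plain-Python sketch below checks that this mapping covers every tile index exactly once per stage; the helper names `stage1_elements` and `stage2_elements` are hypothetical and introduced only for this illustration.

```python
TPB = 256
STAGE1_THREADS = TPB // 2


def stage1_elements(local_i):
    """Tile indices written by Stage 1 thread local_i (0-127)."""
    return (local_i, local_i + STAGE1_THREADS)


def stage2_elements(local_i):
    """Tile indices written by Stage 2 thread local_i (128-255)."""
    blur_idx = local_i - STAGE1_THREADS
    return (blur_idx, blur_idx + STAGE1_THREADS)


stage1_cover = sorted(
    i for t in range(STAGE1_THREADS) for i in stage1_elements(t)
)
stage2_cover = sorted(
    i for t in range(STAGE1_THREADS, TPB) for i in stage2_elements(t)
)

# Both stages should touch each of the 256 tile indices exactly once.
print(stage1_cover == list(range(TPB)), stage2_cover == list(range(TPB)))
```

Because thread t and thread t + 1 touch adjacent indices in both of their accesses, this layout is also the kind of pattern that GPUs can typically coalesce.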

Synchronization points:

\[\text{barrier}_1 \Rightarrow P[i] \text{ complete} \Rightarrow \text{barrier}_2 \Rightarrow B[i] \text{ complete} \Rightarrow \text{barrier}_3 \Rightarrow F[i] \text{ complete}\]

Key concepts

In this puzzle, you will learn about:

  • Implementing thread role specialization within a single GPU block
  • Coordinating producer-consumer relationships between processing stages
  • Using barriers to synchronize between different algorithms (not just within a single algorithm)

The key insight is understanding how to design a multi-stage pipeline in which different thread groups execute completely different algorithms and are coordinated through strategic barrier placement.

Why this matters: Most GPU tutorials teach barrier usage within a single algorithm - synchronizing threads during a reduction or a shared memory operation. Real-world GPU algorithms, however, often involve several distinct processing stages that must be carefully orchestrated. This puzzle shows how to turn a monolithic algorithm into a specialized, coordinated processing pipeline.

Previous vs. current barrier usage:

  • Previous puzzles (P8, P12, P15): All threads execute the same algorithm, and barriers synchronize within the steps of that algorithm
  • This puzzle: Different thread groups execute different algorithms, and barriers coordinate between those algorithms

Thread specialization architecture: Unlike data parallelism, where threads differ only in their data indices, this puzzle implements algorithmic parallelism, where threads execute fundamentally different code paths depending on their role in the pipeline.

Configuration

System parameters:

  • Image size: SIZE = 1024 elements (1D for simplicity)
  • Threads per block: TPB = 256 threads, organized as a (256, 1) block dimension
  • Grid configuration: (4, 1) blocks to process the entire image in tiles (4 blocks total)
  • Data type: DType.float32 for all computations

Thread specialization architecture:

  • Stage 1 threads: STAGE1_THREADS = 128 (threads 0-127, first half of the block)

    • Role: Load input data from global memory and apply preprocessing
    • Work distribution: Each thread processes 2 elements for efficient load balancing
    • Output: Populates input_shared[256] with preprocessed data
  • Stage 2 threads: STAGE2_THREADS = 128 (threads 128-255, second half of the block)

    • Role: Apply the horizontal blur filter to the preprocessed data
    • Work distribution: Each thread processes 2 blur operations
    • Output: Populates blur_shared[256] with blur results
  • Stage 3 threads: All 256 threads collaborate

    • Role: Final smoothing and output to global memory
    • Work distribution: One-to-one mapping (thread i processes element i)
    • Output: Writes final results to the global output array
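
The numbers in this configuration fit together as follows. This is a small Python check rather than Mojo; `BYTES_PER_FLOAT32` and the derived variable names are introduced here for illustration only.

```python
SIZE = 1024
TPB = 256
STAGE1_THREADS = TPB // 2
STAGE2_THREADS = TPB // 2
BYTES_PER_FLOAT32 = 4

# Ceil-divide: number of TPB-sized tiles needed to cover the image.
blocks = (SIZE + TPB - 1) // TPB

# Each half-block has to cover the full tile, hence 2 elements per thread.
elements_per_stage1_thread = TPB // STAGE1_THREADS
elements_per_stage2_thread = TPB // STAGE2_THREADS

# Two shared buffers per block: input_shared and blur_shared.
shared_bytes_per_block = 2 * TPB * BYTES_PER_FLOAT32

print(blocks, elements_per_stage1_thread, elements_per_stage2_thread,
      shared_bytes_per_block)  # prints: 4 2 2 2048
```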

Code to complete


comptime TPB = 256  # Threads per block for pipeline stages
comptime SIZE = 1024  # Image size (1D for simplicity)
comptime BLOCKS_PER_GRID = (4, 1)
comptime THREADS_PER_BLOCK = (TPB, 1)
comptime dtype = DType.float32
comptime layout = Layout.row_major(SIZE)

# Multi-stage processing configuration
comptime STAGE1_THREADS = TPB // 2
comptime STAGE2_THREADS = TPB // 2
comptime BLUR_RADIUS = 2


fn multi_stage_image_blur_pipeline[
    layout: Layout
](
    output: LayoutTensor[dtype, layout, MutAnyOrigin],
    input: LayoutTensor[dtype, layout, ImmutAnyOrigin],
    size: Int,
):
    """Multi-stage image blur pipeline with barrier coordination.

    Stage 1 (threads 0-127): Load input data and apply 1.1x preprocessing
    Stage 2 (threads 128-255): Apply 5-point blur with BLUR_RADIUS=2
    Stage 3 (all threads): Final neighbor smoothing and output
    """

    # Shared memory buffers for pipeline stages
    input_shared = LayoutTensor[
        dtype,
        Layout.row_major(TPB),
        MutAnyOrigin,
        address_space = AddressSpace.SHARED,
    ].stack_allocation()
    blur_shared = LayoutTensor[
        dtype,
        Layout.row_major(TPB),
        MutAnyOrigin,
        address_space = AddressSpace.SHARED,
    ].stack_allocation()

    global_i = Int(block_dim.x * block_idx.x + thread_idx.x)
    local_i = Int(thread_idx.x)

    # Stage 1: Load and preprocess (threads 0-127)

    # FILL ME IN (roughly 10 lines)

    barrier()  # Wait for Stage 1 completion

    # Stage 2: Apply blur (threads 128-255)

    # FILL ME IN (roughly 25 lines)

    barrier()  # Wait for Stage 2 completion

    # Stage 3: Final smoothing (all threads)

    # FILL ME IN (roughly 7 lines)

    barrier()  # Ensure all writes complete


View full file: problems/p29/p29.mojo

Tips

Thread role identification

  • Use thread index comparisons to determine which stage each thread should execute
  • Stage 1: First half of the threads (threads 0-127)
  • Stage 2: Second half of the threads (threads 128-255)
  • Stage 3: All threads participate

Stage 1 approach

  • Identify Stage 1 threads with the appropriate index comparison
  • Have each thread handle multiple elements for load balancing
  • Apply the preprocessing enhancement factor
  • Implement proper boundary handling with zero-padding

Stage 2 approach

  • Identify Stage 2 threads and map their indices onto the processing range
  • Implement the blur kernel by averaging neighboring elements
  • Handle boundary conditions by including only valid neighbors
  • Process multiple elements per thread for efficiency

Stage 3 approach

  • All threads participate in the final processing
  • Apply neighbor smoothing with the specified scaling factor
  • Handle edge cases where a neighbor does not exist
  • Write results to global output with bounds checking

Synchronization strategy

  • Place barriers between stages to prevent race conditions
  • Ensure each stage completes before dependent stages begin
  • Use a final barrier to guarantee completion before block exit

Running the code

To test your solution, run one of the following commands in your terminal, depending on your environment:

pixi run p29 --multi-stage
pixi run -e amd p29 --multi-stage
uv run poe p29 --multi-stage

After completing the puzzle successfully, you should see output similar to:

Puzzle 29: GPU Synchronization Primitives
==================================================
TPB: 256
SIZE: 1024
STAGE1_THREADS: 128
STAGE2_THREADS: 128
BLUR_RADIUS: 2

Testing Puzzle 29A: Multi-Stage Pipeline Coordination
============================================================
Multi-stage pipeline blur completed
Input sample: 0.0 1.01 2.02
Output sample: 1.6665002 2.3331003 3.3996604
✅ Multi-stage pipeline coordination test PASSED!
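
The first two output values can be checked by hand. The derivation below assumes the test input is the ramp \(I[i] = 1.01\,i\), which is inferred from the printed input sample rather than stated in the text:

\[P[i] = 1.111\,i, \qquad B[0] = \frac{P[0] + P[1] + P[2]}{3} = 1.111, \qquad B[1] = \frac{P[0] + P[1] + P[2] + P[3]}{4} = 1.6665, \qquad B[2] = \frac{1}{5}\sum_{k=0}^{4} P[k] = 2.222\]

\[F[0] = (B[0] + B[1]) \times 0.6 = 2.7775 \times 0.6 = 1.6665\]

\[F[1] = ((B[1] + B[0]) \times 0.6 + B[2]) \times 0.6 = (1.6665 + 2.222) \times 0.6 = 2.3331\]

Both values agree with the printed output sample up to float32 rounding.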

Solution

fn multi_stage_image_blur_pipeline[
    layout: Layout
](
    output: LayoutTensor[dtype, layout, MutAnyOrigin],
    input: LayoutTensor[dtype, layout, ImmutAnyOrigin],
    size: Int,
):
    """Multi-stage image blur pipeline with barrier coordination.

    Stage 1 (threads 0-127): Load input data and apply 1.1x preprocessing
    Stage 2 (threads 128-255): Apply 5-point blur with BLUR_RADIUS=2
    Stage 3 (all threads): Final neighbor smoothing and output
    """

    # Shared memory buffers for pipeline stages
    input_shared = LayoutTensor[
        dtype,
        Layout.row_major(TPB),
        MutAnyOrigin,
        address_space = AddressSpace.SHARED,
    ].stack_allocation()
    blur_shared = LayoutTensor[
        dtype,
        Layout.row_major(TPB),
        MutAnyOrigin,
        address_space = AddressSpace.SHARED,
    ].stack_allocation()

    global_i = Int(block_dim.x * block_idx.x + thread_idx.x)
    local_i = Int(thread_idx.x)

    # Stage 1: Load and preprocess (threads 0-127)
    if local_i < STAGE1_THREADS:
        if global_i < size:
            input_shared[local_i] = input[global_i] * 1.1
            # Each thread loads 2 elements; bounds-check the second global read
            if global_i + STAGE1_THREADS < size:
                input_shared[local_i + STAGE1_THREADS] = (
                    input[global_i + STAGE1_THREADS] * 1.1
                )
        else:
            # Zero-padding for out-of-bounds
            input_shared[local_i] = 0.0
            if local_i + STAGE1_THREADS < TPB:
                input_shared[local_i + STAGE1_THREADS] = 0.0

    barrier()  # Wait for Stage 1 completion

    # Stage 2: Apply blur (threads 128-255)
    if local_i >= STAGE1_THREADS:
        blur_idx = local_i - STAGE1_THREADS
        var blur_sum: Scalar[dtype] = 0.0
        blur_count = 0

        # 5-point blur kernel
        for offset in range(-BLUR_RADIUS, BLUR_RADIUS + 1):
            sample_idx = blur_idx + offset
            if sample_idx >= 0 and sample_idx < TPB:
                blur_sum += rebind[Scalar[dtype]](input_shared[sample_idx])
                blur_count += 1

        if blur_count > 0:
            blur_shared[blur_idx] = blur_sum / blur_count
        else:
            blur_shared[blur_idx] = 0.0

        # Process second element
        second_idx = blur_idx + STAGE1_THREADS
        if second_idx < TPB:
            blur_sum = 0.0
            blur_count = 0
            for offset in range(-BLUR_RADIUS, BLUR_RADIUS + 1):
                sample_idx = second_idx + offset
                if sample_idx >= 0 and sample_idx < TPB:
                    blur_sum += rebind[Scalar[dtype]](input_shared[sample_idx])
                    blur_count += 1

            if blur_count > 0:
                blur_shared[second_idx] = blur_sum / blur_count
            else:
                blur_shared[second_idx] = 0.0

    barrier()  # Wait for Stage 2 completion

    # Stage 3: Final smoothing (all threads)
    if global_i < size:
        final_value = blur_shared[local_i]

        # Neighbor smoothing with 0.6 scaling
        if local_i > 0:
            final_value = (final_value + blur_shared[local_i - 1]) * 0.6
        if local_i < TPB - 1:
            final_value = (final_value + blur_shared[local_i + 1]) * 0.6

        output[global_i] = final_value

    barrier()  # Ensure all writes complete


The key insight is recognizing this as a pipeline architecture problem with thread role specialization:

  1. Design stage-specific thread groups: Divide threads by function, not just by data
  2. Implement producer-consumer chains: Stage 1 produces for Stage 2, and Stage 2 produces for Stage 3
  3. Use strategic barrier placement: Synchronize between different algorithms, not within a single algorithm
  4. Optimize memory access patterns: Ensure coalesced reads and efficient shared memory usage

Complete solution with detailed explanation

The multi-stage pipeline solution demonstrates thread specialization combined with barrier coordination. The approach turns a traditional monolithic GPU algorithm into a specialized, coordinated processing pipeline.

Pipeline architecture design

The fundamental breakthrough in this puzzle is specializing threads by role rather than by data:

Traditional approach: All threads execute the same algorithm on different data

  • Every thread performs the same operation (such as a reduction or a matrix operation)
  • Barriers synchronize threads within the steps of the same algorithm
  • Thread roles differ only in the data indices they process

This puzzle's innovation: Different thread groups execute completely different algorithms

  • Threads 0-127 execute the loading and preprocessing algorithm
  • Threads 128-255 execute the blur algorithm
  • All threads collaborate on the final smoothing algorithm
  • Barriers coordinate between different algorithms, not within a single algorithm

Producer-consumer coordination

Unlike previous puzzles, where threads were peers within the same algorithm, this puzzle establishes explicit producer-consumer relationships:

  • Stage 1: Producer (creates preprocessed data for Stage 2)
  • Stage 2: Consumer (uses Stage 1 data) + Producer (creates blur data for Stage 3)
  • Stage 3: Consumer (uses Stage 2 data)

Strategic barrier placement

Understanding when barriers are necessary and when they are wasteful:

  • Necessary: Between dependent stages, to prevent race conditions
  • Wasteful: Within independent operations of the same stage
  • Performance insight: Every barrier has a cost, so use them strategically

Critical synchronization points:

  1. After Stage 1: Prevents Stage 2 from reading incomplete preprocessed data
  2. After Stage 2: Prevents Stage 3 from reading incomplete blur results
  3. After Stage 3: Ensures all output writes complete before the block terminates
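
The role of the first two barriers can be reproduced on the CPU. The sketch below is an analogy in plain Python, not GPU code: it uses `threading.Barrier` from the standard library with a toy block of 8 threads. `TPB = 8`, `HALF`, and the function name `run_block` are assumptions made for this illustration. Every thread calls `wait()` at the same two points, regardless of which stage it did work in.

```python
import threading

TPB = 8            # toy block size; the puzzle uses 256
HALF = TPB // 2    # plays the role of STAGE1_THREADS
RADIUS = 2         # plays the role of BLUR_RADIUS


def run_block(tile):
    p = [0.0] * TPB      # stands in for input_shared
    b = [0.0] * TPB      # stands in for blur_shared
    out = [0.0] * TPB
    sync = threading.Barrier(TPB)

    def worker(t):
        # Stage 1: first half of the threads, two elements each
        if t < HALF:
            for i in (t, t + HALF):
                p[i] = tile[i] * 1.1
        sync.wait()      # all of p is written before any thread reads it

        # Stage 2: second half of the threads, two elements each
        if t >= HALF:
            for i in (t - HALF, t):
                lo, hi = max(0, i - RADIUS), min(TPB - 1, i + RADIUS)
                b[i] = sum(p[lo:hi + 1]) / (hi - lo + 1)
        sync.wait()      # all of b is written before any thread reads it

        # Stage 3: every thread, one element each
        value = b[t]
        if t > 0:
            value = (value + b[t - 1]) * 0.6
        if t < TPB - 1:
            value = (value + b[t + 1]) * 0.6
        out[t] = value

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(TPB)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()        # stands in for the final barrier
    return out


result = run_block([1.0] * TPB)
print([round(v, 3) for v in result])
```

Removing either `sync.wait()` call would allow a Stage 2 or Stage 3 thread to read a buffer slot before its producer has written it, which is the race the kernel's barriers prevent. Note that the waits sit outside the `if` branches, so idle threads still reach them.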

Thread utilization patterns

  • Stage 1: 50% utilization (128 of 256 threads active, 128 idle)
  • Stage 2: 50% utilization (128 active, 128 idle)
  • Stage 3: 100% utilization (all 256 threads active)

This demonstrates algorithmic parallelism, in which different thread groups specialize in different computational tasks within a coordinated pipeline. It moves beyond simple data parallelism toward the architectural thinking that real-world GPU algorithms require.

Memory hierarchy optimization

Shared memory architecture:

  • Two dedicated buffers carry the data flow between stages
  • Global memory access is limited to the boundary operations (the initial load and the final store)
  • All intermediate processing uses fast shared memory

Access pattern benefits:

  • Stage 1: Coalesced global memory reads for input loading
  • Stage 2: Fast shared memory reads for blur processing
  • Stage 3: Coalesced global memory writes for output

Real-world applications

This pipeline architecture pattern is fundamental to:

Image processing pipelines:

  • Multi-stage filters (blur, sharpen, and edge detection in sequence)
  • Color space conversions (RGB → HSV → processing → RGB)
  • Noise reduction with multiple algorithm passes

Scientific computing:

  • Stencil computations with multi-stage finite difference methods
  • Signal processing with filtering, transformation, and analysis pipelines
  • Computational fluid dynamics with multi-stage solver iterations

Machine learning:

  • Neural network layers with thread groups specialized for different operations
  • Data preprocessing pipelines (load, normalize, and augment in coordinated stages)
  • Batch processing in which different thread groups handle different operations

Key technical insights

Algorithmic vs. data parallelism:

  • Data parallelism: Threads execute identical code on different data elements
  • Algorithmic parallelism: Threads execute fundamentally different algorithms according to their specialized roles

Barrier usage philosophy:

  • Strategic placement: Place barriers only where they are needed to prevent race conditions between dependent stages
  • Performance consideration: Each barrier incurs synchronization overhead, so use them sparingly but correctly
  • Correctness guarantee: Proper barrier placement ensures deterministic results regardless of thread execution timing

Thread specialization benefits:

  • Algorithmic optimization: Each stage can be optimized for its own computational pattern
  • Memory access optimization: Different stages can use different memory access strategies
  • Resource utilization: Complex algorithms can be decomposed into specialized, efficient components

This solution shows how to design GPU algorithms that use thread specialization and strategic synchronization for complex multi-stage computations, moving beyond simple parallel loops toward the architectural approaches used in real GPU software.