Puzzle 13: 1D Convolution

Moving to LayoutTensor

So far in this GPU puzzle journey, we have explored two approaches to GPU memory management side by side:

  1. Raw memory management through direct pointer manipulation with UnsafePointer
  2. The more structured LayoutTensor, which allocates memory through its powerful address_space parameter

From this puzzle onward, we switch entirely to LayoutTensor. This abstraction offers the following benefits:

  • Type-safe memory access patterns
  • A clear representation of data layouts
  • Better code maintainability
  • Fewer opportunities for memory-related bugs
  • More expressive code that better conveys the intent of the underlying computation
  • And more that we will uncover gradually!

This transition is in line with best practices for modern GPU programming in Mojo 🔥: higher-level abstractions help manage complexity while preserving performance.

Overview

In signal processing and image analysis, convolution is a fundamental operation that combines two sequences to produce a new one. In this puzzle you implement a 1D convolution on the GPU, where each output element is computed by sliding a kernel over an input array.

Using the LayoutTensor abstraction, implement a kernel that computes the 1D convolution of vector a with vector b and stores the result in output.

Note: You need to handle the general case. Each thread needs only 2 global reads and 1 global write.

Figure: 1D convolution visualization

If you are new to convolution, think of it as a weighted sliding-window operation. At each position, the kernel values are multiplied by the corresponding input values and the products are summed. In mathematical notation:

\[\Large output[i] = \sum_{j=0}^{\text{CONV}-1} a[i+j] \cdot b[j] \]

1D convolution expressed in pseudocode (terms where i + j falls past the end of the input are skipped):

for i in range(SIZE):
    for j in range(CONV):
        if i + j < SIZE:
            ret[i] += a_host[i + j] * b_host[j]
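
The pseudocode above can be turned into a small host-side reference for checking a GPU result. This is a minimal sketch in plain Python rather than Mojo. The function name conv1d_reference and the sample values for a_host and b_host are assumptions chosen for illustration, not values taken from the puzzle's test harness.

```python
def conv1d_reference(a, b):
    """Sequential 1D convolution following the pseudocode above.

    Terms where i + j runs past the end of `a` are skipped, which is
    equivalent to padding the input with zeros on the right.
    """
    size, conv = len(a), len(b)
    output = [0] * size
    for i in range(size):
        for j in range(conv):
            if i + j < size:
                output[i] += a[i + j] * b[j]
    return output


# Hypothetical sample data (illustrative only).
a_host = [0, 1, 2, 3, 4, 5]
b_host = [0, 1, 2]

result = conv1d_reference(a_host, b_host)
print(result)  # [5, 8, 11, 14, 5, 0]
```

The last CONV - 1 outputs use fewer terms than the others because the window hangs over the end of the input. That is the edge case the bounds check guards against.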

This puzzle is split into two parts so that you can build understanding step by step:

  • 🔰 Basic version: Start here. Learn the fundamentals of implementing convolution in a single block using LayoutTensor and shared memory.

  • ⭐ Block boundary version: Next, take on the harder case where data must be shared across block boundaries, making fuller use of LayoutTensor's capabilities.
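
To picture how a single block can stay within the budget of two global reads and one global write per thread, the sketch below simulates the threads sequentially in plain Python. It is an illustration under stated assumptions, not the puzzle's solution: it assumes one thread per output element, assumes the input is at least as long as the kernel, and stands in for the barrier by finishing the load phase before the compute phase starts. The name simulate_single_block and the read/write counters are hypothetical.

```python
def simulate_single_block(a, b):
    """Simulate one thread block that stages `a` and `b` in shared memory."""
    size, conv = len(a), len(b)
    assert size >= conv, "sketch assumes the input is at least as long as the kernel"

    shared_a = [0] * size   # stands in for the block's shared-memory copy of a
    shared_b = [0] * conv   # stands in for the block's shared-memory copy of b
    output = [0] * size
    global_reads = [0] * size
    global_writes = [0] * size

    # Phase 1: each thread copies at most one element of a and one of b.
    for tid in range(size):
        shared_a[tid] = a[tid]
        global_reads[tid] += 1
        if tid < conv:
            shared_b[tid] = b[tid]
            global_reads[tid] += 1

    # A real kernel needs a barrier here so every load is visible to all threads.

    # Phase 2: each thread computes its output from shared memory only.
    for tid in range(size):
        acc = 0
        for j in range(conv):
            if tid + j < size:
                acc += shared_a[tid + j] * shared_b[j]
        output[tid] = acc
        global_writes[tid] += 1

    return output, global_reads, global_writes


out, reads, writes = simulate_single_block([0, 1, 2, 3, 4, 5], [0, 1, 2])
print(out, max(reads), max(writes))
```

All of the repeated reads in the inner loop hit the shared copies, so each thread's global traffic stays fixed no matter how long the kernel is.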

Each version poses different challenges in memory access patterns and coordination between threads. The basic version teaches the mechanics of the convolution operation; the block boundary version then tests your ability to handle the more complex situations that arise in real GPU programming.
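
The block boundary difficulty becomes concrete once you list which input indices each block needs. The sketch below is plain Python. The helper name block_input_ranges and the sizes passed to it are assumptions for illustration. It shows that a block computing outputs start..end-1 also needs up to CONV - 1 inputs that belong to the next block.

```python
def block_input_ranges(size, conv, threads_per_block):
    """Return, per block, (inputs it owns, extra inputs it needs from beyond its range)."""
    num_blocks = (size + threads_per_block - 1) // threads_per_block
    ranges = []
    for block in range(num_blocks):
        start = block * threads_per_block
        end = min(start + threads_per_block, size)
        # The window of the last thread reaches conv - 1 elements past `end`,
        # clipped at the end of the input.
        extra_end = min(end + conv - 1, size)
        ranges.append((list(range(start, end)), list(range(end, extra_end))))
    return ranges


# Hypothetical sizes (illustrative only).
ranges = block_input_ranges(size=15, conv=4, threads_per_block=8)
for block, (own, extra) in enumerate(ranges):
    print(f"block {block}: own inputs {own}, extra inputs {extra}")
```

With these sizes the first block needs three inputs owned by the second block, while the last block needs none because its window is clipped at the end of the array.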