Bonus Challenges

Challenge I: Advanced Softmax Implementations

This challenge extends Puzzle 18: Softmax Op.

The following advanced challenges extend the softmax implementation:

1. Large-scale softmax: handling TPB < SIZE

When the input size exceeds the number of threads per block (TPB < SIZE), a single block cannot process the entire array, so the current implementation does not work. There are two approaches to solving this:

1.1 Buffer reduction

  • Store the per-block results (maximum and sum) in device memory
  • Use a second kernel to reduce these intermediate results
  • Implement a final normalization step that uses the global maximum and sum
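
The three steps above can be sketched on the CPU before writing any kernels. The following is only a reference sketch in plain Python, not the puzzle's GPU code: `block_partials`, `reduce_partials`, and `softmax_buffer_reduction` are hypothetical names, and each list slice stands in for one thread block. It assumes each block's sum is taken relative to that block's own maximum, which is why the second stage rescales every partial sum by \(e^{m_{block} - m_{global}}\) before adding them.

```python
import math

def block_partials(x, tpb):
    """Stage 1 (stand-in for kernel 1): one (local_max, local_sum) pair per block."""
    partials = []
    for start in range(0, len(x), tpb):
        block = x[start:start + tpb]  # the last block may be shorter than tpb
        m = max(block)
        s = sum(math.exp(v - m) for v in block)
        partials.append((m, s))  # would live in a device-memory buffer
    return partials

def reduce_partials(partials):
    """Stage 2 (stand-in for kernel 2): merge the per-block pairs."""
    g_max = max(m for m, _ in partials)
    # Each local sum is relative to its own block maximum, so rescale it
    # to the global maximum before accumulating.
    g_sum = sum(s * math.exp(m - g_max) for m, s in partials)
    return g_max, g_sum

def softmax_buffer_reduction(x, tpb):
    """Stage 3: normalize every element with the global maximum and sum."""
    g_max, g_sum = reduce_partials(block_partials(x, tpb))
    return [math.exp(v - g_max) / g_sum for v in x]
```

Comparing the output against a single-block softmax for a few values of `tpb` is a cheap way to check the rescaling step before porting it to the GPU.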

1.2 Two-pass softmax

  • Pass 1: each block computes its local maximum
  • Synchronize, then compute the global maximum
  • Pass 2: compute \(e^{x-max}\) and the local sums
  • Synchronize, then compute the global sum
  • Final: normalize using the global sum
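
As a rough illustration of the pass structure listed above, here is a CPU-only Python sketch. The names `chunks` and `softmax_two_pass` are hypothetical, each chunk plays the role of one block, and the comments marked "sync" indicate where a grid-wide synchronization (or a kernel boundary) would be needed on a GPU. Unlike the buffer-reduction variant, the exponentials here are already relative to the global maximum, so the local sums can be added without rescaling.

```python
import math

def chunks(x, tpb):
    """Split x into block-sized pieces; the last piece may be shorter."""
    return [x[i:i + tpb] for i in range(0, len(x), tpb)]

def softmax_two_pass(x, tpb):
    blocks = chunks(x, tpb)

    # Pass 1: every block finds its local maximum.
    local_maxes = [max(b) for b in blocks]
    # sync: all local maxima must be visible before this reduction.
    g_max = max(local_maxes)

    # Pass 2: every block computes e^(x - max) and its local sum.
    exps = [[math.exp(v - g_max) for v in b] for b in blocks]
    local_sums = [sum(e) for e in exps]
    # sync: all local sums must be visible before this reduction.
    g_sum = sum(local_sums)

    # Final: normalize with the global sum, keeping the original order.
    return [e / g_sum for block in exps for e in block]
```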

2. Batched softmax

Implement softmax over a batch of vectors (a 2D input tensor) in the following variants:

  • Row-wise softmax: apply softmax to each row independently
  • Column-wise softmax: apply softmax to each column independently
  • Compare the performance of the two implementations
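
A small reference implementation can serve as the expected output for both variants. The sketch below is plain Python with hypothetical helper names (`softmax`, `softmax_rows`, `softmax_cols`) and says nothing about GPU performance. One thing to keep in mind for the comparison: in a row-major layout the elements of a row are contiguous while the elements of a column are separated by the row width, so the two variants read memory in different patterns.

```python
import math

def softmax(vec):
    """Numerically stable softmax of a single vector."""
    m = max(vec)
    exps = [math.exp(v - m) for v in vec]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_rows(mat):
    """Apply softmax to each row of a list-of-lists matrix."""
    return [softmax(row) for row in mat]

def softmax_cols(mat):
    """Apply softmax to each column: transpose, reuse the row path, transpose back."""
    cols = [softmax(list(col)) for col in zip(*mat)]
    return [list(row) for row in zip(*cols)]
```

For example, `softmax_rows([[1.0, 1.0, 1.0]])` gives three equal weights, while `softmax_cols` on the same input gives `[[1.0, 1.0, 1.0]]` because every column has a single element.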

Challenge II: Advanced Attention Mechanisms

This challenge extends Puzzle 19: Attention Op.

Building on the vector attention implementation, the following advanced challenges push the limits of the attention mechanism:

1. Longer sequence lengths

Extend the attention mechanism to handle longer sequences using the existing kernels:

1.1 Sequence length scaling

  • Modify the attention implementation to handle SEQ_LEN = 32 and SEQ_LEN = 64
  • Update the TPB (threads per block) parameter accordingly
  • Verify that the transpose kernel handles the larger matrix sizes correctly

1.2 Dynamic sequence lengths

  • Implement attention that can handle variable sequence lengths at runtime
  • Add bounds checks to the kernels to handle sequences shorter than SEQ_LEN
  • Compare the performance of fixed and dynamic sequence length handling
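
One way to picture the bounds check is a CPU sketch in which the buffers keep a fixed capacity of SEQ_LEN rows and a runtime length selects how many of them are real. Everything here is an assumption made for illustration: the function name `attention_dynamic`, the `valid_len` parameter, the capacity of 8, and the use of unscaled dot-product scores (no \(1/\sqrt{d}\) factor). Each loop iteration stands in for one thread, and the `if i < valid_len` guard is the bounds check.

```python
import math

SEQ_LEN = 8  # fixed buffer capacity (hypothetical value for this sketch)

def attention_dynamic(q, K, V, valid_len):
    """Single-query attention over the first valid_len rows of padded K and V."""
    assert 0 < valid_len <= SEQ_LEN
    d = len(q)

    # One iteration per "thread"; padded rows are never read.
    scores = [0.0] * SEQ_LEN
    for i in range(SEQ_LEN):
        if i < valid_len:  # bounds check
            scores[i] = sum(q[j] * K[i][j] for j in range(d))

    # Softmax over the valid prefix only; padded slots get zero weight.
    m = max(scores[:valid_len])
    weights = [math.exp(scores[i] - m) if i < valid_len else 0.0
               for i in range(SEQ_LEN)]
    total = sum(weights)

    # Weighted sum of the valid value rows.
    return [sum(weights[i] * V[i][j] for i in range(valid_len)) / total
            for j in range(d)]
```

A useful check is to fill the padded rows with arbitrary values and confirm that the output does not change.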

2. Batched vector attention

Extend the implementation to process multiple attention operations at once:

2.1 Batch processing

  • Modify the attention operation to process multiple query vectors at once
  • Input shapes: Q(batch_size, d), K(seq_len, d), V(seq_len, d)
  • Output shape: (batch_size, d)
  • Reuse the existing kernels with appropriate indexing
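
The shapes above can be made concrete with a CPU reference that reuses a single-query routine and selects each query by offset. This is a sketch under assumptions, not the puzzle's kernel code: `attend` and `batched_attention` are hypothetical names, Q and the output are stored as flat row-major lists so that query b starts at offset b * d, and the scores are unscaled dot products.

```python
import math

def attend(q, K, V):
    """Single-query attention: softmax(K q) applied to the rows of V."""
    scores = [sum(qj * kj for qj, kj in zip(q, k)) for k in K]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    total = sum(w)
    d = len(q)
    return [sum(w[i] * V[i][j] for i in range(len(V))) / total
            for j in range(d)]

def batched_attention(Q_flat, K, V, batch_size, d):
    """Q_flat holds batch_size * d values; returns a flat (batch_size, d) output."""
    out = [0.0] * (batch_size * d)
    for b in range(batch_size):
        q = Q_flat[b * d:(b + 1) * d]            # row b of Q
        out[b * d:(b + 1) * d] = attend(q, K, V)  # row b of the output
    return out
```

K and V are shared by every query, so only the Q and output offsets depend on the batch index.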

2.2 ๋ฐฐ์น˜๋ฅผ ์œ„ํ•œ ๋ฉ”๋ชจ๋ฆฌ ์ตœ์ ํ™”

  • ๋ฐฐ์น˜ ์š”์†Œ ๊ฐ„ ๋ฒ„ํผ๋ฅผ ์žฌ์‚ฌ์šฉํ•˜์—ฌ ๋ฉ”๋ชจ๋ฆฌ ํ• ๋‹น์„ ์ตœ์†Œํ™”ํ•ฉ๋‹ˆ๋‹ค
  • ๋‹ค์–‘ํ•œ ๋ฐฐ์น˜ ํฌ๊ธฐ(2, 4, 8)์—์„œ ์„ฑ๋Šฅ์„ ๋น„๊ตํ•ฉ๋‹ˆ๋‹ค
  • ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ ํŒจํ„ด์„ ๋ถ„์„ํ•ฉ๋‹ˆ๋‹ค