
[Week 04 / Day 015] Study Log

๐Ÿ“ ์˜ค๋Š˜ ํ•œ ์ผ

  • ๊ฐ•์˜ ์ˆ˜๊ฐ• ๋ฐ ํ€ด์ฆˆ
    • [DL Basic] 7, 8๊ฐ• ์‹ค์Šต
    • [DL Basic] RNN ํ€ด์ฆˆ
  • ๊ธฐ๋ณธ ๊ณผ์ œ ์™„๋ฃŒ
  • ๋ฉ˜ํ† ๋ง (๋…ผ๋ฌธ ์Šคํ„ฐ๋””)
  • ์ฝ”๋”ฉํ…Œ์ŠคํŠธ ๋ฌธ์ œ ํ’€๊ธฐ


๐Ÿ‘ฅ Peer Session Summary

Keyword Sharing

  • attention
  • The three kinds of multi-head attention in the transformer (encoder self-attention, masked decoder self-attention, and encoder-decoder cross-attention)
  • Advantages of LSTM
    • Vanilla RNN
      • In the backward pass, the weight matrix W is multiplied in over and over, which can cause vanishing & exploding gradients.
      • Methods such as gradient clipping can mitigate this (see the sketches after this list), but they have limits.
    • LSTM
      • Both the equation that computes the cell state and the equation that computes the hidden state use element-wise multiplication.
      • Backprop through the cell state is simply (upstream gradient) * (forget gate).
        • The forget gate being multiplied in changes at every step, so this is better than Vanilla RNN, where the same weight matrix was multiplied in repeatedly.
        • Also, the forget gate is a sigmoid output, so each element-wise factor lies between 0 and 1, which gives better numerical behavior under repeated multiplication.
      • The gradient passes through tanh only once, not at every step.
        • In Vanilla RNN's backward pass, the gradient had to pass through tanh at every step.

      The gradient flows from the loss at the end of the model back to cell state $c_0$ with little interference.
      ⇒ ResNet's skip connection == the LSTM cell state's element-wise multiplication path

      Backprop through the cell state is a highway for gradients.

  • (๊ฐ•์˜ ์ถ”์ฒœ) AutoEncoder์˜ ๋ชจ๋“ ๊ฒƒ

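And a hedged PyTorch sketch of the gradient clipping mentioned above; the tiny RNN, the shapes, and the learning rate are placeholders of mine, not the course code:

```python
import torch
import torch.nn as nn

# Toy RNN and dummy batch; all shapes are illustrative placeholders.
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)
params = list(rnn.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.01)

x = torch.randn(4, 50, 8)   # (batch, seq_len, features)
y = torch.randn(4, 1)

optimizer.zero_grad()
out, _ = rnn(x)
loss = nn.functional.mse_loss(head(out[:, -1]), y)
loss.backward()

# Rescale the global gradient norm to at most 1.0 before the update;
# this caps exploding gradients but does nothing for vanishing ones.
torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
optimizer.step()
```
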
Coding Test Study

  • [BOJ] ์›์ˆญ์ด ๋งค๋‹ฌ๊ธฐ


๐Ÿซ ๋ฉ˜ํ† ๋ง ์š”์•ฝ

On the Lectures

  1. What is the principle by which each attention head in MHA attends to a different part of the input?
    • There is not really a separate principle; it is more that it simply turned out to work well in practice.
    • Quite a lot of things in AI are like this.
  2. Positional encoding
    • This, too, carries no deep meaning; think of it simply as a way to encode different positions differently.
    • It can itself be learned (trainable); see the sketch after this list.
    • A transformer's input length is a fixed value, but that value is very large.
  3. The principle behind learning only the residual via skip connections in ResNet
    • This is a case where researchers tried something because 'it seemed like it would work', and it actually did.
    • Follow-up papers sometimes investigate why it works.
    • ResNet-family models generalize well.
      • ResNet, ResNeXt, Res2Net, ResNeSt
  4. Data augmentation methods such as MixUp and CutMix
    • It is enough to know roughly which types of problems each method is effective for.
  5. SGD vs. Adam
    • SGD is very sensitive to the initial learning rate.
      • Finding the optimal learning rate is the key.
      • When it reaches a local minimum, it tends to reach a flatter one, so the train-test gap is small and test accuracy is better.
    • Adam does reasonably well even with a less carefully chosen learning rate.
      • Adaptive ⇒ faster convergence.
      • But finding the optimal learning rate still matters (hyperparameter tuning).
    • Each method suits different tasks. In practice, people try combinations of Adam, SGD, etc. across several learning rates; see the sketch after this list.
    • The scheduler is also very important.
      • Using a large learning rate early on reduces the risk of getting stuck in a (bad) local minimum.
    • This has to be done empirically (knowing how to make use of what is already well established).
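
A minimal sketch of the trainable positional encoding from item 2, assuming a fixed maximum input length (`max_len`, `d_model`, and all shapes are illustrative placeholders):

```python
import torch
import torch.nn as nn

class LearnedPositionalEncoding(nn.Module):
    """Adds a trainable embedding per position, up to a fixed max length."""
    def __init__(self, max_len: int = 512, d_model: int = 64):
        super().__init__()
        # One learnable vector per position; updated by backprop like any weight.
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); seq_len must not exceed max_len.
        return x + self.pos[:, : x.size(1)]

tokens = torch.randn(2, 10, 64)          # dummy token embeddings
encoded = LearnedPositionalEncoding()(tokens)
print(encoded.shape)                     # torch.Size([2, 10, 64])
```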

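And a hedged sketch of the 'try several learning rates with both optimizers, plus a scheduler' idea from item 5; the stand-in model, the learning-rate grid, and the cosine schedule are arbitrary placeholder choices:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for a real model

# Try both optimizer families across a few learning rates.
for lr in (1e-1, 1e-2, 1e-3):
    for opt_cls in (torch.optim.SGD, torch.optim.Adam):
        optimizer = opt_cls(model.parameters(), lr=lr)
        # Decay the lr over training: large steps early help escape
        # bad local minima, small steps later help converge.
        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)
        for epoch in range(10):
            optimizer.zero_grad()
            loss = model(torch.randn(8, 10)).pow(2).mean()  # dummy loss
            loss.backward()
            optimizer.step()
            scheduler.step()
```
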
On the Advanced Assignment

In the transfer learning advanced assignment, was there a big difference between the run that used the pretrained image dataset and the run that did not?
⇒ The difference was not large. (The effect of transfer learning was small.)

  • Why the effect of transfer learning was small
    1. The domains of the originally trained data (source data) and the additional data (target data) were too different.
    2. There was plenty of target data.
  • For transfer learning to be effective:
    1. The source and target data domains should be similar.
    2. The less target data there is, the more effective it is.
  • More on transfer learning
    • From the weights' perspective, the more you train, the more the original knowledge gets lost.
    • Knowing when to stop fine-tuning, i.e., when to freeze, is important (see the sketch after this list).
  • In industry, transfer learning is actually used heavily, rather than building models from scratch.
    • An empirical approach seems best.
    • For hyperparameter tuning, humans still seem to beat tools like Ray.
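
A minimal transfer-learning sketch along the lines above, assuming a torchvision ResNet as the pretrained source model; the 10-class head is a placeholder for the target task:

```python
import torch.nn as nn
from torchvision import models

# Load ImageNet-pretrained weights as the source knowledge.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the backbone so its weights (the original knowledge) stay intact.
for param in model.parameters():
    param.requires_grad = False

# Replace the classifier head for the target task; only this part trains.
model.fc = nn.Linear(model.fc.in_features, 10)  # 10 target classes (placeholder)

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['fc.weight', 'fc.bias']
```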


๐Ÿพ ์ผ์ผ ํšŒ๊ณ 

It was a whirlwind of a day. It was also a strange day in which information poured out of the lectures, poured out of the peer session, and poured out of the mentoring alike. I feel I learned a huge amount from the hands-on practice in particular. I am still not comfortable with dimension calculations and fumble around, but I plan to pick things apart line by line, one step at a time. Will I actually have the time, though..? ๐Ÿฅฒ


๐Ÿš€ ๋‚ด์ผ ํ•  ์ผ

  • ๊ฐ•์˜ ์ˆ˜๊ฐ•
    • [Data Viz] 5-1, 2, 3๊ฐ•
  • ์˜คํ”ผ์Šค์•„์›Œ