๋ชฉ๋กComputer Science/์ธ๊ณต์ง€๋Šฅ (2)

Hi-๋žŒ๐Ÿ‘‹ High-๋žŒโ˜€๏ธ

[Artificial Intelligence] SP09-Regularization

Regularization methods for improving generalization performance. Early stopping: measure validation performance at every iteration and, if there is no improvement for a set period, end training early, before the model overfits. Ensembling: train several models and combine their predictions to form the final prediction. When combining models, regression problems use the mean of the outputs, while classification problems use the mean of the values before the softmax activation. Alternatively, regression problems can use the median of the outputs and classification problems the mode. Each model can be given different initial values, or the data..
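The early-stopping rule described above can be captured in a few lines. The Python sketch below is not from the original post; it is a minimal illustration that tracks the best validation loss and stops once there has been no improvement for `patience` consecutive checks. The validation losses in the demo loop are made up for illustration only.

```python
# Minimal early-stopping sketch: stop when validation loss has not improved
# for `patience` consecutive checks. Not from the original post.

class EarlyStopping:
    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience    # checks to wait without improvement
        self.min_delta = min_delta  # minimum change that counts as improvement
        self.best = float("inf")
        self.counter = 0

    def step(self, val_loss):
        """Return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss    # improvement: remember it and reset the counter
            self.counter = 0
        else:
            self.counter += 1       # no improvement this check
        return self.counter >= self.patience


if __name__ == "__main__":
    stopper = EarlyStopping(patience=3)
    # simulated validation losses: improve at first, then plateau
    for epoch, val_loss in enumerate([0.9, 0.7, 0.6, 0.61, 0.62, 0.6, 0.63, 0.64]):
        if stopper.step(val_loss):
            print(f"stopping early at epoch {epoch}, best val loss {stopper.best:.2f}")
            break
```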

[Artificial Intelligence] SP06-Fitting Models

[1] Problems in non-convex functions and how to address them. With non-linear functions, the loss function has two kinds of traps. (1) Local minima: points where the gradient is 0 and the loss increases in whichever direction you move, yet which are not the smallest value over the whole function, i.e., not the global minimum. (2) Saddle points: the gradient is also 0 here, but the loss increases in some directions and decreases in others. These properties often cause problems during optimization, because as the gradient approaches 0 the parameters, and hence the loss, stop being updated. When a problem such as a saddle point arises, stochastic gradient descent (Stochastic Gradient Descent, SG..
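To make the saddle-point issue concrete, the Python sketch below (not from the original post) compares plain gradient descent with a noisy variant on the toy function f(x, y) = x^2 - y^2, whose saddle sits at the origin. The added Gaussian noise is only a stand-in for the minibatch noise of SGD, and the learning rate, step count, and noise scale are arbitrary choices for illustration.

```python
import numpy as np

def loss(p):
    x, y = p
    return x**2 - y**2                      # toy function with a saddle at (0, 0)

def grad(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])    # gradient of x^2 - y^2

rng = np.random.default_rng(0)
lr, steps = 0.1, 50

# 1) Plain gradient descent started on the y = 0 axis: the y-gradient is
#    exactly 0 there, so the iterate slides into the saddle and stalls.
p = np.array([1.0, 0.0])
for _ in range(steps):
    p = p - lr * grad(p)
print(f"plain GD : p = {np.round(p, 4)}, loss = {loss(p):.4f}")

# 2) Noisy updates (a stand-in for SGD's minibatch noise): the noise pushes
#    y off the axis, the negative-curvature direction takes over, and the
#    loss keeps decreasing past the saddle.
p = np.array([1.0, 0.0])
for _ in range(steps):
    p = p - lr * (grad(p) + rng.normal(scale=0.1, size=2))
print(f"noisy GD : p = {np.round(p, 4)}, loss = {loss(p):.4f}")
```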