Week 3.b CS7670 09/21 2022
https://naizhengtan.github.io/22fall/

1. intro to DL
2. back-propagation
3. building an autograd engine

----

1. intro to DL

-- AI? ML? DL?
   -- playing chess?
   -- telling a cat from a dog?

-- one way to see DL
   f(x, \theta) -> y, where f is some arbitrarily complex function:

     x -------> +---+
                | f | --> y
     \theta --> +---+

   Now, we want to "make" f(x0, \theta) -> y0.
   Q: how do we find such a \theta?

-- an optimization problem
   min (y0 - f(x0, \theta))^2
   where x0, y0 are constants, and the entries of \theta are the variables

-- Q: how to solve this optimization problem?
   * if f is convex, life is good.
   * let's say f is non-convex...
     ...but differentiable...
     ...in practice. (hand-waving here)
   * gradient descent (GD)
     main idea: find a minimum by repeatedly taking steps in the
     direction opposite to the gradient.
     [are there any other approaches? yes, like coordinate descent]
   * function to minimize: g(\theta) = (y0 - f(x0, \theta))^2
   * again, goal: min g(\theta)
     what we can do is apply GD:
       repeat:
         \theta -= 0.01 * dg/d\theta
     [what is this 0.01? why do we need it?]

-- DL term mapping
   f: neural network architecture
   x: network input
   y: network output
   \theta: network parameters
   (x0, y0): training data
   g: loss function
   0.01: learning rate
   doing GD: training

2. learning representations by back-propagating errors

-- The problem? [ask a student to read the abstract]
-- Neural network? [ask a student to read the third paragraph]
-- Q: what does equation 1 mean?
-- Understand Fig. 1 [draw a better version on the board]

3. building an autograd engine from scratch

-- background & notions
   (borrowed from d2l: https://d2l.ai/chapter_preliminaries/calculus.html)
   [see handout]

   * derivative
     A derivative is the rate of change in a function with respect to
     changes in its arguments.
   * partial derivative
     The derivative of a multivariate function with respect to one of its
     variables, holding all the other variables fixed.
   * gradient
     We can concatenate the partial derivatives of a multivariate function
     with respect to all its variables to obtain a vector that is called
     the gradient of the function.
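With the gradient defined, the GD loop from section 1 can be tried end to end. The sketch below is illustrative only: it assumes a hypothetical toy model f(x, theta) = theta[0]*x + theta[1] in place of a real network, and it estimates dg/d\theta with central finite differences, a numerical technique swapped in here (it is not back-propagation, and not necessarily what the jupyter notebook does). Each partial derivative is computed by nudging one entry of theta while holding the others fixed; the list of partials is the gradient.

```python
# Sketch: gradient descent on g(theta) = (y0 - f(x0, theta))^2.
# Assumption: f is a hypothetical toy "network" (a line), not a real model.
# Assumption: gradients come from central finite differences, not autograd.


def f(x, theta):
    # hypothetical toy model: slope theta[0], intercept theta[1]
    return theta[0] * x + theta[1]


def g(theta, x0, y0):
    # the loss from section 1
    return (y0 - f(x0, theta)) ** 2


def numerical_gradient(fn, theta, eps=1e-6):
    # partial derivative w.r.t. theta[i], others held fixed:
    #   (fn(theta + eps*e_i) - fn(theta - eps*e_i)) / (2*eps)
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += eps
        minus[i] -= eps
        grad.append((fn(plus) - fn(minus)) / (2 * eps))
    return grad  # the gradient: all partials stacked in a list


def gradient_descent(x0, y0, theta, lr=0.01, steps=1000):
    # repeat: theta -= lr * dg/dtheta
    for _ in range(steps):
        grad = numerical_gradient(lambda t: g(t, x0, y0), theta)
        theta = [t - lr * d for t, d in zip(theta, grad)]
    return theta


theta = gradient_descent(x0=2.0, y0=5.0, theta=[0.0, 0.0])
print(theta, f(2.0, theta))
```

For this particular toy loss one can check by hand that each GD step multiplies the residual y0 - f(x0, theta) by roughly (1 - 10*lr): lr=0.01 shrinks it by about 10% per step, while lr=0.25 multiplies it by -1.5 and the loop diverges. That is one concrete answer to "what is this 0.01? why do we need it?".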
   * chain rule
     y = f(g(x))  =>  y = f(u) and u = g(x)
     dy/dx = dy/du * du/dx

[see jupyter notebook]
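The chain rule is the piece an autograd engine automates: every operation records its inputs together with its local derivative (the du/dx factor), and a backward pass multiplies these local derivatives along the computation graph. Below is a minimal scalar, reverse-mode sketch; the class name Value and its methods are hypothetical (in the spirit of small teaching engines), not the course notebook's code, and it supports only +, -, *, a numeric power, and sin.

```python
import math


class Value:
    """A scalar that records how it was computed (hypothetical minimal design)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0                  # dy/d(self), filled in by backward()
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent), one per parent

    @staticmethod
    def _wrap(x):
        # let plain numbers mix with Values
        return x if isinstance(x, Value) else Value(x)

    def __add__(self, other):
        other = Value._wrap(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = Value._wrap(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def __neg__(self):
        return self * -1.0

    def __sub__(self, other):
        return self + (-Value._wrap(other))

    def __rsub__(self, other):
        return Value._wrap(other) + (-self)

    __radd__ = __add__
    __rmul__ = __mul__

    def __pow__(self, k):
        # k is a plain number: d(u**k)/du = k * u**(k - 1)
        return Value(self.data ** k, (self,), (k * self.data ** (k - 1),))

    def sin(self):
        return Value(math.sin(self.data), (self,), (math.cos(self.data),))

    def backward(self):
        # order nodes so every node comes after the nodes it was computed from
        order, seen = [], set()

        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0  # dy/dy = 1
        for v in reversed(order):
            for p, local in zip(v._parents, v._local_grads):
                # chain rule: dy/dp += dy/dv * dv/dp
                p.grad += v.grad * local


# y = sin(x^2): u = x**2, y = sin(u), so dy/dx = cos(u) * 2x
x = Value(1.5)
y = (x ** 2).sin()
y.backward()
print(x.grad, math.cos(1.5 ** 2) * 2 * 1.5)

# the section-1 loss with a hypothetical f(x0, theta) = theta * x0
theta = Value(0.5)
loss = (5.0 - theta * 2.0) ** 2
loss.backward()
print(theta.grad)  # -2 * (5.0 - 0.5 * 2.0) * 2.0 = -16.0
```

backward() processes nodes in reverse topological order, so a node's gradient is complete before it is passed on to its inputs. Gradients are accumulated with += because a value used more than once (as in x * x) receives one contribution per path.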
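Applying the chain rule to the loss from section 1 gives exactly the gradient the GD update needs. A short worked derivation, writing u = f(x0, \theta) so that g = (y0 - u)^2 and assuming, as hand-waved in section 1, that f is differentiable in \theta:

```latex
\begin{aligned}
g(\theta) &= \bigl(y_0 - u\bigr)^2, \qquad u = f(x_0, \theta) \\
\frac{dg}{d\theta} &= \frac{dg}{du}\,\frac{du}{d\theta}
  = -2\,\bigl(y_0 - f(x_0,\theta)\bigr)\,\frac{\partial f(x_0,\theta)}{\partial \theta} \\
\theta &\leftarrow \theta - 0.01\,\frac{dg}{d\theta}
  = \theta + 0.02\,\bigl(y_0 - f(x_0,\theta)\bigr)\,\frac{\partial f(x_0,\theta)}{\partial \theta}
\end{aligned}
```

The remaining factor \partial f/\partial\theta is where the rest of the work lies: when f is a neural network, f is itself a long composition of functions, and the chain rule is applied once per link.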