Gradient descent

Section 14.5 Gradient descent

Objectives

Describe gradient descent as an iterative method based on moving in the direction of \(-\Grad f\text{.}\)
🔗

🔗
Extend gradient descent to constrained problems using the Lagrangian \(\cl{L}=f+\lambda(c-g)\) and a descent-ascent update rule.
🔗

🔗
Interpret the multiplier \(\lambda\) as measuring how the optimal value responds to changes in the constraint.
🔗

🔗

🔗

Introduction will go here.

🔗

If \(f(x,y)\) is a differentiable function, the direction of steepest decrease at a point is \(-\,\Grad f\text{.}\) This suggests the iterative scheme

\begin{equation*} \bv{x}_{k+1} = \bv{x}_k - \alpha \Grad f(\bv{x}_k), \end{equation*}

where \(\alpha > 0\) is a step size.

🔗

For suitable choices of \(\alpha\text{,}\) this process moves downhill and can converge to a local minimum.
🔗

🔗
Now suppose we want to optimize \(f(x,y)\) subject to the constraint \(g(x,y)=c\text{.}\) Previously, we saw that a constrained extremum satisfies

\begin{equation*} \Grad f = \lambda \Grad g. \end{equation*}

Rather than solving this system algebraically, we introduce the Lagrangian:

\begin{equation*} \cl{L}(x,y,\lambda)=f(x,y)+\lambda(c-g(x,y)). \end{equation*}

🔗

🔗
The stationary conditions \(\Grad_{x,y,\lambda}\cl{L}=0\) reproduce the Lagrange multiplier equations:
🔗

\begin{align*} \pdv{f}{x} - \lambda\pdv{g}{x} \amp = 0\\ \pdv{f}{y} - \lambda\pdv{g}{y} \amp = 0\\ c - g(x,y) \amp = 0 \end{align*}

🔗
However, instead of minimizing \(\cl{L}\) in all variables (which would fail, since the critical point is always a saddle point), we use a combined descent-ascent strategy:
🔗

\begin{equation*} \bv{x}_{k+1} = \bv{x}_k - \alpha \Grad_{x,y}\cl{L}, \end{equation*}

\begin{equation*} \lambda_{k+1} = \lambda_k + \beta \pdv{\cl{L}}{\lambda}. \end{equation*}

The variables \((x,y)\) move in the direction that decreases \(f\text{,}\) while \(\lambda\) adjusts to enforce the constraint. When this system settles, we recover the constrained optimum.
🔗

🔗
What does \(\lambda\) represent? Since

\begin{equation*} \pdv{\cl{L}}{c} = \lambda, \end{equation*}

the multiplier measures how the Lagrangian changes when the constraint level \(c\) changes.

🔗

🔗
Let \((x^{*}(c),y^{*}(c))\) denote the constrained optimizer and consider the optimal value

\begin{equation*} f(x^{*}(c),y^{*}(c)). \end{equation*}

Differentiating with respect to \(c\) and using the chain rule together with

\begin{equation*} \Grad f = \lambda \Grad g, \end{equation*}

one finds

\begin{equation*} \dv{f}{c} = \lambda. \end{equation*}

🔗

Thus the Lagrange multiplier measures the sensitivity of the optimal value to changes in the constraint.
🔗

🔗

🔗

Prev Top Next