Skip to main content
Contents
Embed
Dark Mode Prev Up Next
\(\require{physics}\require{upgreek}\everymath{\displaystyle}
\newcommand{\N}{\mathbb N}
\newcommand{\Z}{\mathbb Z}
\newcommand{\Q}{\mathbb Q}
\newcommand{\R}{\mathbb R}
\newcommand{\inv}{^{-1}}
\newcommand{\DS}{\displaystyle}
\newcommand{\eps}{\varepsilon}
\newcommand{\tsub}[1]{_{\mathrm{#1}}}
\newcommand{\ee}{\mathrm{e}}
\newcommand{\ii}{\mathrm{i}}
\newcommand{\limit}[1]{\lim\limits_{#1}}
\newcommand{\resid}[1]{\underset{#1}{\Res}}
\DeclareMathOperator{\sinc}{sinc}
\DeclareMathOperator{\sgn}{sgn}
\newcommand{\pii}{\pi}
\DeclareMathOperator{\Prob}{P}
\DeclareMathOperator{\EV}{E}
\DeclareMathOperator{\Var}{Var}
\newcommand{\bv}[1]{\boldsymbol{#1}}
\newcommand{\uv}[1]{\hat{\bv{#1}}}
\newcommand{\cl}[1]{\mathcal{#1}}
\newcommand{\bb}[1]{\mathbb{#1}}
\DeclareMathOperator{\Cis}{cis}
\DeclareMathOperator{\RE}{Re}
\DeclareMathOperator{\IM}{Im}
\newcommand{\xd}{\mathbf{d}}
\newcommand{\seq}[3]{{#1}_{#2},\cdots,{#1}_{#3}}
\newcommand{\psup}[1]{^{(#1)}}
\newcommand{\hypext}{{}^*}
\DeclareMathOperator{\st}{st}
\newcommand{\set}[1]{\left\{#1\right\}}
\DeclareMathOperator{\Sin}{Sin}
\DeclareMathOperator{\Cos}{Cos}
\DeclareMathOperator{\Tan}{Tan}
\DeclareMathOperator{\Sec}{Sec}
\DeclareMathOperator{\Csc}{Csc}
\DeclareMathOperator{\Cot}{Cot}
\DeclareMathOperator{\Log}{Log}
\DeclareMathOperator{\Arg}{Arg}
\DeclareMathOperator{\Ln}{Ln}
\DeclareMathOperator{\Grad}{grad}
\DeclareMathOperator{\Div}{div}
\DeclareMathOperator{\Curl}{curl}
\newcommand{\rd}{\textstyle\mathop{}\!\mathrm{d}^{\!\!\!-}\hspace{-0.0555 em}}
\newcommand{\rpd}{\textstyle\mathop{}\!\partial^{\hspace{-0.5 em}-}\hspace{-0.1666 em}}
\newcommand{\rdv}[2]{\frac{\rd{#1}}{\rd{#2}}}
\newcommand{\rpdv}[2]{\frac{\rpd{#1}}{\rpd{#2}}}
\newcommand{\lt}{<}
\newcommand{\gt}{>}
\newcommand{\amp}{&}
\definecolor{fillinmathshade}{gray}{0.9}
\newcommand{\fillinmath}[1]{\mathchoice{\colorbox{fillinmathshade}{$\displaystyle \phantom{\,#1\,}$}}{\colorbox{fillinmathshade}{$\textstyle \phantom{\,#1\,}$}}{\colorbox{fillinmathshade}{$\scriptstyle \phantom{\,#1\,}$}}{\colorbox{fillinmathshade}{$\scriptscriptstyle\phantom{\,#1\,}$}}}
\)
Section 14.5 Gradient descent
Objectives
Describe gradient descent as an iterative method based on moving in the direction of
\(-\Grad f\text{.}\)
Extend gradient descent to constrained problems using the Lagrangian
\(\cl{L}=f+\lambda(c-g)\) and a descent-ascent update rule.
Interpret the multiplier
\(\lambda\) as measuring how the optimal value responds to changes in the constraint.
Introduction will go here.
If \(f(x,y)\) is a differentiable function, the direction of steepest decrease at a point is \(-\,\Grad f\text{.}\) This suggests the iterative scheme
\begin{equation*}
\bv{x}_{k+1} = \bv{x}_k - \alpha \Grad f(\bv{x}_k),
\end{equation*}
where \(\alpha > 0\) is a step size.
For suitable choices of
\(\alpha\text{,}\) this process moves downhill and can converge to a local minimum.
Now suppose we want to optimize \(f(x,y)\) subject to the constraint \(g(x,y)=c\text{.}\) Previously, we saw that a constrained extremum satisfies
\begin{equation*}
\Grad f = \lambda \Grad g.
\end{equation*}
Rather than solving this system algebraically, we introduce the Lagrangian:
\begin{equation*}
\cl{L}(x,y,\lambda)=f(x,y)+\lambda(c-g(x,y)).
\end{equation*}
The stationary conditions
\(\Grad_{x,y,\lambda}\cl{L}=0\) reproduce the Lagrange multiplier equations:
\begin{align*}
\pdv{f}{x} - \lambda\pdv{g}{x} \amp = 0\\
\pdv{f}{y} - \lambda\pdv{g}{y} \amp = 0\\
c - g(x,y) \amp = 0
\end{align*}
However, instead of minimizing
\(\cl{L}\) in all variables (which would fail, since the critical point is always a saddle point), we use a combined descent-ascent strategy:
\begin{equation*}
\bv{x}_{k+1} = \bv{x}_k - \alpha \Grad_{x,y}\cl{L},
\end{equation*}
\begin{equation*}
\lambda_{k+1} = \lambda_k + \beta \pdv{\cl{L}}{\lambda}.
\end{equation*}
The variables
\((x,y)\) move in the direction that decreases
\(f\text{,}\) while
\(\lambda\) adjusts to enforce the constraint. When this system settles, we recover the constrained optimum.
What does \(\lambda\) represent? Since
\begin{equation*}
\pdv{\cl{L}}{c} = \lambda,
\end{equation*}
the multiplier measures how the Lagrangian changes when the constraint level \(c\) changes.
Let \((x^{*}(c),y^{*}(c))\) denote the constrained optimizer and consider the optimal value
\begin{equation*}
f(x^{*}(c),y^{*}(c)).
\end{equation*}
Differentiating with respect to \(c\) and using the chain rule together with
\begin{equation*}
\Grad f = \lambda \Grad g,
\end{equation*}
one finds
\begin{equation*}
\dv{f}{c} = \lambda.
\end{equation*}
Thus the Lagrange multiplier measures the sensitivity of the optimal value to changes in the constraint.