Skip to content

Guest. Optimal Control

Given by: Asst. Prof. Gokhan Alcan

  1. Optimal Control Motivation

  2. Problem Formulation

  3. Dynamic Programming & Bellman’s Principle

  4. Iterative LQR

  5. Demo: Cart-Pole Swing-Up

  6. Constrained Optimal Control


Optimization cost(time or frequency-domain). In the time domain, judge a closed-loop step response:

  • Lower rise time     \implies bigger overshoot.
  • Lower overshoot     \implies longer rise time.

method:

  1. Imposing additional constraints.
  2. Integral criteria let e=rye = r - y(reference - output), following common scalar measures how good a response is:

IAE: Integral of e|e| from 0 to \infty. ITAE: Integral of tet|e| from 0 to \infty. ISE: Integral of e2e^2 from 0 to \infty.

Balance tracking vs effort with a weighted sum: J=0(qe2tracking+ru2effort)dt(q,r): designer-chosen weightsJ = \int_{0}^{\infty} \left( \underbrace{q e^2}_{\text{tracking}} + \underbrace{r u^2}_{\text{effort}} \right) dt \qquad (q, r)\text{: designer-chosen \textbf{weights}}

What Optimization variables to choose

  • Controllers: Pick a class(e.g. PID) and optimize its parameters. JJ is a function of these parameters(evaluated by simulation); closed-loop stability conditions make the problem typically nonconvex.
Because the control system must also satisfy the strict condition of ‘closed-loop stability’, the graph of the objective function J as a function of the parameters resembles a series of undulating peaks and valleys, riddled with ‘local optima’ (traps). Optimisation algorithms can easily get stuck in a ‘false optimum’ and fail to find the truly optimal set of parameters.
  • Control signals: Optimize u(t)u(t) directly. Awkward in continout time(function-space problem); So we just solve in discrete (sequence) time — back to finite-dimensional optimization.
Skip the 'controller' altogether and ask directly: at every point in time t, how much u(t) should be given. Function-space problem: In a continuous physical world, time is infinitely dense. This means that within any given period of time, there are an infinite number of points in time.
min{uk}k=0N1J=k=0N1(xk,uk)+N(xN)s.t.xk+1=f(xk,uk),k=0,,N1hk(xk,uk)=0,k=0,,N1gk(xk,uk)0,k=0,,N1hN(xN)=0gN(xN)0x0=xinit\begin{aligned} \min_{\{u_k\}_{k=0}^{N-1}} \quad & J = \sum_{k=0}^{N-1} \ell(x_k, u_k) + \ell_N(x_N) \\ \text{s.t.} \quad & x_{k+1} = f(x_k, u_k), \quad k = 0, \dots, N-1 \\ & h_k(x_k, u_k) = 0, \quad k = 0, \dots, N-1 \\ & g_k(x_k, u_k) \le 0, \quad k = 0, \dots, N-1 \\ & h_N(x_N) = 0 \\ & g_N(x_N) \le 0 \\ & x_0 = x_{\text{init}} \end{aligned}
This approach is widely used in model predictive control (MPC), robotic path planning and economic system optimisation.
  • Optimization variable {uk}k=0N1\{u_k\}_{k=0}^{N-1} refers to the control inputs (such as the accelerator, brakes and steering angle of an autonomous vehicle) that we need to apply at each of the N time steps, ranging from 0 to N−1. Our task is to find the optimal sequence of control signals.

  • Cost function J: we want the lowest cost:

k=0N1(xk,uk)\sum_{k=0}^{N-1} \ell(x_k, u_k) — Stage Cost: represents the immediate cost incurred at each intermediate time step k due to the state deviating from the target (for example, the car veering off course) or the control action being too forceful (for example, slamming the accelerator).

N(xN)\ell_N(x_N) — Terminal Cost: indicates how far the system’s final state is from our true endpoint at the conclusion of the Nth step. This is typically used to ensure that the system converges stably.

s.t. is an abbreviation for ‘subject to’, meaning ‘provided that the following conditions are met’.

  1. System dynamics constraints (Physics/Dynamics): xk+1=f(xk,uk)x_{k+1} = f(x_k, u_k).
  2. Path Constraints hk(xk,uk)=0,gk(xk,uk)0h_k(x_k, u_k) = 0, \quad g_k(x_k, u_k) \le 0 This is a strict requirement that must be met at every intermediate point throughout the entire process (from k=0 to N−1).

hk=0h_k=0 Equational constraints (e.g. dynamics, terminal goal, grasp constrains) gk0g_k \le 0 Inequality constraints (e.g. bounds, obstacles, frictions, cones ).

3. Dynamic Programming & Bellman’s Principle

Section titled “3. Dynamic Programming & Bellman’s Principle”
  • Main idea of Dynamic Programming: always remember the answers to the sub-problems you have already solved.
  • If a problem can be broken down into smaller sub-problems, those can be broken into smaller ones still, and some sub-problems overlap — then you have a DP problem.

“An optimal policy has the property that no matter what the previous decisions (i.e., controls) have been, the remaining decisions must constitute an optimal policy with regard to the state resulting from those previous decisions.” — Bellman, 1957

"Optimal path"

Applying this principle reduces the number of candidates for the optimal solution: once we know the optimal sub-path from b to e, any a→e trajectory through b must reuse it.

We are given three things:

  1. An initial state x0x_0 (start position)
  2. A guessed control sequence Uˉ=(uˉ0,...,uˉN1)\bar{U} = (\bar{u}_0, . . . , \bar{u}_{N-1}) (initial “plan”)
  3. Known dynamics xk+1=f(xk,uk)x_{k+1} = f(x_k,u_k) (physics model)

Without optimization, nominal trajectory Xˉ=(xˉ0,...,xˉN)\bar{X} = (\bar{x}_0, ... ,\bar{x}_N) (usually bad)

min{uk}k=0N1J=k=0N1(xk,uk)+N(xN)\min_{\{u_k\}_{k=0}^{N-1}} J = \sum_{k=0}^{N-1} \ell(x_k, u_k) + \ell_N(x_N) \\

Running cost (xk,uk)\ell(x_k, u_k)

  • Penalizes deviation from goal along the way.
  • Penalizes large control effort (energy).
  • Typically: =xkxrefQ2+ukR2\ell = ||x_k - x_{ref}||^2_Q + ||u_k||^2_R

Terminal cost (xN)\ell(x_N) = Boundary Condition

  • Penalizes where we end up.
  • Often lare weight     \implies “reach to the goal!”.
  • Typically: N=xNxQf2\ell_N = ||x_N - x^*||^2_{Q_f}
  • At the last step k=Nk = N, there are no more decisions.
  • So the “cost-to-go” from xNx_N is simply N(XN)\ell_N(X_N)
  • This is VN(XN)=N(xN)V_N(X_N) = \ell_N(x_N) — the value function seed.
  • The backward pass starts here and walks back to k=0k=0.

The Big idea: One big optimization can be solved as many small ones, one step at a time.

From the Bellman recursion:

Qk(x,u)=(x,u)+Vk+1(f(x,u))Q_k(x, u) = \ell(x, u) + V_{k+1}(f(x, u)) Vk(x)=minuQk(x,u)V_k(x) = \min_{u} Q_k(x, u)

Why we can’t solve this directly:

  • VkV_k and QkQ_k are functions over the whole continuous state–action space.
  • A grid over Rnx\mathbb{R}^{n_x} scales as O(Nnx)\mathcal{O}(N^{n_x})curse of dimensionality.
  • No closed form for general nonlinear \ell and ff.

Start from a nominal rollout (Step 0):

(xˉ0,uˉ0),(xˉ1,uˉ1),,xˉN(\bar{x}_0, \bar{u}_0), (\bar{x}_1, \bar{u}_1), \dots, \bar{x}_N

Define perturbations around it:

δxk=xkxˉk,δuk=ukuˉk\delta x_k = x_k - \bar{x}_k, \quad \delta u_k = u_k - \bar{u}_k

Approximate QkQ_k as a quadratic in (δxk,δuk)(\delta x_k, \delta u_k) centered at (xˉk,uˉk)(\bar{x}_k, \bar{u}_k).

Valid only near the nominal — we will rebuild the model after every trajectory update.

Second-order Taylor at (xˉk,uˉk)(\bar{x}_k, \bar{u}_k)

Section titled “Second-order Taylor at (xˉk,uˉk)(\bar{x}_k, \bar{u}_k)(xˉk​,uˉk​)”

Drop the constant Qk(xˉk,uˉk)Q_k(\bar{x}_k, \bar{u}_k) (it does not affect argminδu\arg\min_{\delta u}). The remaining gradient + Hessian terms:

Q(δx,δu)[QxQu][δxδu]linear (gradient)+12[δxδu][QxxQxuQuxQuu][δxδu]quadratic (Hessian)Q(\delta x, \delta u) \approx \underbrace{\begin{bmatrix} Q_x \\ Q_u \end{bmatrix}^\top \begin{bmatrix} \delta x \\ \delta u \end{bmatrix}}_{\text{linear (gradient)}} + \frac{1}{2} \underbrace{\begin{bmatrix} \delta x \\ \delta u \end{bmatrix}^\top \begin{bmatrix} Q_{xx} & Q_{xu} \\ Q_{ux} & Q_{uu} \end{bmatrix} \begin{bmatrix} \delta x \\ \delta u \end{bmatrix}}_{\text{quadratic (Hessian)}}

All blocks Qx,Qu,Qxx,Qxu,QuuQ_x, Q_u, Q_{xx}, Q_{xu}, Q_{uu} are partial derivatives of QkQ_k, evaluated at the nominal (xˉk,uˉk)(\bar{x}_k, \bar{u}_k) — so they are just numbers/matrices, not functions.


Recall Qk(x,u)=(x,u)+Vk+1(f(x,u))Q_k(x, u) = \ell(x, u) + V_{k+1}(f(x, u)). Differentiating and writing VVk+1V' \equiv V_{k+1}:

Qx=x+fxVxQu=u+fuVxQxx=xx+fxVxxfx+VxfxxQuu=uu+fuVxxfu+VxfuuQux=ux+fuVxxfx+Vxfux\begin{aligned} Q_x &= \ell_x + f_x^\top V_x' \\ Q_u &= \ell_u + f_u^\top V_x' \\ Q_{xx} &= \ell_{xx} + f_x^\top V_{xx}' f_x + V_x' \cdot f_{xx} \\ Q_{uu} &=\ell_{uu} + f_u^\top V_{xx}' f_u + V_x' \cdot f_{uu} \\ Q_{ux} &= \ell_{ux} + f_u^\top V_{xx}' f_x + V_x' \cdot f_{ux} \end{aligned}

where x,xx,\ell_x, \ell_{xx}, \dots are partials of the stage cost; fx,fuf_x, f_u are dynamics Jacobians at (xˉk,uˉk)(\bar{x}_k, \bar{u}_k); and Vx,VxxV_x', V_{xx}' are inherited from the next step’s value (backward pass).

xx\ell_{xx}:

The degree of curvature of the cost function at this stage (for example, the penalty increase sharply the further it is from the target)

fxVxxfxf_x^\top V_{xx}' f_x:

The value curvature is propagated back in the next stage. This means that, due to the existence of the system dynamics f(x), a change in the current state will cause a change in the future state, thereby causing a change in the future total value V_xx.

Vxfxx\color{red}{V_x' \cdot f_{xx}}:

The non-linearity inherent in system dynamics. If the robot’s dynamics, f, are non-linear (for example, due to air resistance or joint rotation), this final term will arise. 在实际应用中,如果保留最后一项红色的 $V_x' \cdot f_{xx}$,这个算法叫做 DDP(Differential Dynamic Programming,微分动态规划); 如果因为 $f_{xx}$(张量计算)太难算而直接把它丢弃(当成 0),这个算法就是 iLQR。iLQR 虽然忽略了动力学的二阶项,但极大地减少了计算量,而且在绝大多数机器人场景下依然收敛得很好!

For each δx\delta x, find the best δu\delta u Since QQ is quadratic in δu\delta u, the minimum is at the stationary point. Take the gradient w.r.t. δu\delta u:

Qδu=Qu+Quxδx+Quuδu\frac{\partial Q}{\partial \delta u} = Q_u + Q_{ux} \delta x + Q_{uu} \delta u

Set it to zero (assuming Quu0Q_{uu} \succ 0):

Qu+Quxδx+Quuδu=!0Q_u + Q_{ux} \delta x + Q_{uu} \delta u \overset{!}{=} 0

Solve for δu\delta u:

δu=Quu1(Qu+Quxδx)\delta u^\star = -Q_{uu}^{-1} (Q_u + Q_{ux} \delta x)

Split the constant from the δx\delta x-dependent part:

δu=Quu1Qukk (feedforward)+(Quu1Qux)Kk (feedback)δx\delta u^\star = \underbrace{-Q_{uu}^{-1} Q_u}_{\mathbf{k}_k \text{ (feedforward)}} + \underbrace{(-Q_{uu}^{-1} Q_{ux})}_{\mathbf{K}_k \text{ (feedback)}} \delta x


Regularize QuuQ_{uu} before inverting. Far from the optimum, QuuQ_{uu} can be ill-conditioned or even indefinite. A direct inverse then produces huge or completely wrong steps. Levenberg–Marquardt fix:

Q~uu=Quu+λI,λ0\tilde{Q}_{uu} = Q_{uu} + \lambda I, \quad \lambda \ge 0

  • λ0\lambda \to 0: full Newton step (fast).
  • λ\lambda \to \infty: gradient descent (safe).
  • Trust-region schedule: λ\lambda \uparrow on rejected step, λ\lambda \downarrow on accepted step.

Plug δu\delta u^\star back into the quadratic expansion to update the Value function approximation (Vx,VxxV_x, V_{xx}) for step kk:

Vx=Qx+KQu+KQuuk+QxukV_x = Q_x + K^\top Q_u + K^\top Q_{uu} k + Q_{xu} k Vxx=Qxx+QxuK+KQux+KQuuKV_{xx} = Q_{xx} + Q_{xu} K + K^\top Q_{ux} + K^\top Q_{uu} K

  • These Vx,VxxV_x, V_{xx} are passed to step k1k - 1