Optimization cost(time or frequency-domain). In the time domain, judge a closed-loop step response:
Lower rise time ⟹ bigger overshoot.
Lower overshoot ⟹ longer rise time.
method:
Imposing additional constraints.
Integral criteria
let e=r−y(reference - output), following common scalar measures how good a response is:
IAE: Integral of ∣e∣ from 0 to ∞.
ITAE: Integral of t∣e∣ from 0 to ∞.
ISE: Integral of e2 from 0 to ∞.
Balance tracking vs effort with a weighted sum:
J=∫0∞trackingqe2+effortru2dt(q,r): designer-chosen weights
What Optimization variables to choose
Controllers: Pick a class(e.g. PID) and optimize its parameters. J is a function of these parameters(evaluated by simulation); closed-loop stability conditions make the problem typically nonconvex.
Because the control system must also satisfy the strict condition of ‘closed-loop stability’, the graph of the objective function J as a function of the parameters resembles a series of undulating peaks and valleys, riddled with ‘local optima’ (traps). Optimisation algorithms can easily get stuck in a ‘false optimum’ and fail to find the truly optimal set of parameters.
Control signals: Optimize u(t) directly. Awkward in continout time(function-space problem); So we just solve in discrete (sequence) time — back to finite-dimensional optimization.
Skip the 'controller' altogether and ask directly: at every point in time t, how much u(t) should be given. Function-space problem: In a continuous physical world, time is infinitely dense. This means that within any given period of time, there are an infinite number of points in time.
This approach is widely used in model predictive control (MPC), robotic path planning and economic system optimisation.
Optimization variable {uk}k=0N−1 refers to the control inputs (such as the accelerator, brakes and steering angle of an autonomous vehicle) that we need to apply at each of the N time steps, ranging from 0 to N−1. Our task is to find the optimal sequence of control signals.
Cost function J: we want the lowest cost:
∑k=0N−1ℓ(xk,uk) — Stage Cost: represents the immediate cost incurred at each intermediate time step k due to the state deviating from the target (for example, the car veering off course) or the control action being too forceful (for example, slamming the accelerator).
ℓN(xN) — Terminal Cost: indicates how far the system’s final state is from our true endpoint at the conclusion of the Nth step. This is typically used to ensure that the system converges stably.
s.t. is an abbreviation for ‘subject to’, meaning ‘provided that the following conditions are met’.
System dynamics constraints (Physics/Dynamics): xk+1=f(xk,uk).
Path Constraints hk(xk,uk)=0,gk(xk,uk)≤0
This is a strict requirement that must be met at every intermediate point throughout the entire process (from k=0 to N−1).
Main idea of Dynamic Programming: always remember the answers to the sub-problems you have already solved.
If a problem can be broken down into smaller sub-problems, those can be broken into smaller ones still, and some sub-problems overlap — then you have a DP problem.
“An optimal policy has the property that no matter what the previous decisions (i.e., controls) have been, the remaining decisions must constitute an optimal policy with regard to the state resulting from those previous decisions.” — Bellman, 1957
Applying this principle reduces the number of candidates for the optimal solution: once we know the
optimal sub-path from b to e, any a→e trajectory through b must reuse it.
All blocks Qx,Qu,Qxx,Qxu,Quu are partial derivatives of Qk, evaluated at the nominal(xˉk,uˉk) — so they are just numbers/matrices, not functions.
where ℓx,ℓxx,… are partials of the stage cost;
fx,fu are dynamics Jacobians at (xˉk,uˉk);
and Vx′,Vxx′ are inherited from the next step’s value (backward pass).
ℓxx:
The degree of curvature of the cost function at this stage (for example, the penalty increase sharply the further it is from the target)
fx⊤Vxx′fx:
The value curvature is propagated back in the next stage. This means that, due to the existence of the system dynamics f(x), a change in the current state will cause a change in the future state, thereby causing a change in the future total value V_xx.
Vx′⋅fxx:
The non-linearity inherent in system dynamics. If the robot’s dynamics, f, are non-linear (for example, due to air resistance or joint rotation), this final term will arise.
在实际应用中,如果保留最后一项红色的 $V_x' \cdot f_{xx}$,这个算法叫做 DDP(Differential Dynamic Programming,微分动态规划);
如果因为 $f_{xx}$(张量计算)太难算而直接把它丢弃(当成 0),这个算法就是 iLQR。iLQR 虽然忽略了动力学的二阶项,但极大地减少了计算量,而且在绝大多数机器人场景下依然收敛得很好!
Regularize Quu before inverting.
Far from the optimum, Quu can be ill-conditioned or even indefinite.
A direct inverse then produces huge or completely wrong steps.
Levenberg–Marquardt fix:
Q~uu=Quu+λI,λ≥0
λ→0: full Newton step (fast).
λ→∞: gradient descent (safe).
Trust-region schedule:λ↑ on rejected step, λ↓ on accepted step.