Function Repository Resource:

QLearningLQRegulator (1.0.0; current version: 1.0.1)

Compute the LQ Regulator using Q-learning

Contributed by: Suba Thomas

ResourceFunction["QLearningLQRegulator"][sys,rspsec,tspec]

uses Q-learning to compute the optimal LQ regulator for sys based on regulator specification rspec and time specification tspec.

ResourceFunction["QLearningLQRegulator"][,prop]

gives the value of the property prop.

Details

ResourceFunction["QLearningLQRegulator"] computes the optimal linear quadratic (LQ) regulating gain using Q-learning without knowledge of the system's dynamics.
The gain is computed iteratively by an actor implementing a policy and a critic evaluating the system's response and updating the policy.
The input u(k) is computed to minimize the Q function Q(x(k),u(k)) = x(k)ᵀ.q.x(k) + u(k)ᵀ.r.u(k) + Q(x(k+1),u(k+1)), where u(k+1) is given by the policy g.
x(k) – state vector
u(k) – input vector
q – state weight
r – input weight
g – optimal policy
The Q function can be expressed using a kernel matrix s as Q(x(k),u(k)) = z(k)ᵀ.s.z(k), where z(k) is the vector formed by stacking x(k) and u(k).
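As a minimal sketch, assuming the kernel matrix is ordered with the states first and the inputs last and the feedback convention u(k) = -g.x(k), the gain can be read off the blocks of s; the helper gainFromKernel is purely illustrative:

(* illustrative only: extract the gain from an (n+m)×(n+m) kernel matrix s *)
gainFromKernel[s_, n_, m_] := Module[{suu, sux},
  suu = s[[n + 1 ;; n + m, n + 1 ;; n + m]]; (* input-input block *)
  sux = s[[n + 1 ;; n + m, 1 ;; n]];         (* input-state block *)
  Inverse[suu] . sux                         (* minimizer of the quadratic Q function *)
 ]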
The system sys is of the form {sim,x0}.
The simulation sim can take the following forms:
Function[…] – pure function
LibraryFunction[…] – library function
StateSpaceModel[…] – state-space model
AffineStateSpaceModel[…] – affine state-space model
NonlinearStateSpaceModel[…] – nonlinear state-space model
The specification x0 causes the simulation to start at value x0.
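For example, the discrete-time system x(k+1) = 0.2 x(k) + u(k), started at -1, can be specified equivalently with a pure function or with a sampled state-space model (a brief illustrative sketch, assuming a sampling period of 1):

(* two equivalent system specifications of the form {sim, x0} *)
sysFun = {Function[{x, u, k}, 0.2 x + u], {-1}};
sysSSM = {StateSpaceModel[{{{0.2}}, {{1}}}, SamplingPeriod -> 1], {-1}};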
The regulator specification rspec is of the form {q,r,g0}.
q – state weight matrix
r – input weight matrix
g0 – initial policy gain
The time specification tspec can take the following forms:
nsim – total number of simulations
{nsim,blen} – total number of simulations and batch length
ResourceFunction["QLearningLQRegulator"][,"Data"] returns a SystemsModelControllerData object cd that can be used to extract additional properties using the form cd["prop"].
ResourceFunction["QLearningLQRegulator"][,"prop"] can be used to directly give the value of cd["prop"].
Possible values for properties "prop" include:
"BatchLength"least squares batch length
"ConvergedQ"if the gains converged to a value
"Gain"final gain
"GainValues"list of gain values
"InputCount"number of inputs
"InputValues"list of input values
"KernelMatrix"kernel matrix s
"SimulationRange"simulation range
"StateCount"number of states
"StateValues"list of state values

Examples

Basic Examples (8) 

A system that starts at -1:

In[1]:=
sys = {Function[{x, u, k}, 0.2 x + u], {-1}};

The state and input weights:

In[2]:=
{q, r} = {{{1}}, {{1}}};

The initial value of the gain:

In[3]:=
g0 = {{0}};

The total number of simulations is 30, with each batch consisting of 10 simulations:

In[4]:=
tspec = {30, 10};

Compute the controller:

In[5]:=
cd = ResourceFunction["QLearningLQRegulator"][sys, {q, r, g0}, tspec]
Out[5]=

The gain converges to ~0.1:

In[6]:=
ListStepPlot[Flatten[cd["GainValues"], 1], PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[6]=

The state is regulated to the origin:

In[7]:=
ListStepPlot[cd["StateValues"], PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[7]=

The input sequence that was used during the simulation:

In[8]:=
ListStepPlot[cd["InputValues"], PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[8]=

Scope (6) 

A system with one state and one input:

In[9]:=
sys = {Function[{x, u, k}, -0.9 x + 2 u], {1}};

Compute a controller:

In[10]:=
cd = ResourceFunction["QLearningLQRegulator"][sys, {({
     {1}
    }), ({
     {1}
    }), ({
     {0.1}
    })}, {30, 5}]
Out[10]=

The gain values:

In[11]:=
ListStepPlot[Flatten[cd["GainValues"], 1], PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[11]=

The state is regulated to the origin:

In[12]:=
ListStepPlot[cd["StateValues"][[1]], PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[12]=

The input sequence that was used during the simulation:

In[13]:=
ListStepPlot[cd["InputValues"][[1]]  , PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[13]=

A multi-state system:

In[14]:=
sys = {Function[{x, u, k}, Chop@{x[[2]], - 0.12 x[[1]] - 0.7 x[[2]] + u[[1]]}], RandomReal[{-10, 10}, 2]};

Compute a controller:

In[15]:=
cd = ResourceFunction["QLearningLQRegulator"][sys, {( {
     {1, 0},
     {0, 1}
    } ), \!\(\*
TagBox[
RowBox[{"(", "", GridBox[{
{"1"}
},
GridBoxAlignment->{"Columns" -> {{Center}}, "Rows" -> {{Baseline}}},
GridBoxSpacings->{"Columns" -> {
Offset[0.27999999999999997`], {
Offset[0.7]}, 
Offset[0.27999999999999997`]}, "Rows" -> {
Offset[0.2], {
Offset[0.4]}, 
Offset[0.2]}}], "", ")"}],
Function[BoxForm`e$, 
MatrixForm[BoxForm`e$]]]\), ( {
     {0, 0}
    } )}, {50, 10}]
Out[15]=

The gain values:

In[16]:=
ListStepPlot[Flatten[cd["GainValues"], 1], PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[16]=

The states are regulated to the origin:

In[17]:=
ListStepPlot[cd["StateValues"], PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[17]=

The input sequence that was used during the simulation:

In[18]:=
ListStepPlot[cd["InputValues"]  , PlotRange -> All, DataRange -> cd["SimulationRange"], PlotStyle -> ColorData[97, 3]]
Out[18]=

A multi-state, multi-input system:

In[19]:=
sys = {Function[{x, u, k}, {0.7 u[[1]] + 0.07 u[[2]] - 0.07 x[[2]], u[[1]] + x[[1]] - 0.8 x[[2]]}], RandomReal[{-10, 10}, 2]};

Compute a controller:

In[20]:=
cd = ResourceFunction["QLearningLQRegulator"][sys, {( {
     {1, 0},
     {0, 1}
    } ), ( {
     {1, 0},
     {0, 1}
    } ), ( {
     {0, 0},
     {0, 0}
    } )}, 90]
Out[20]=

The gain values:

In[21]:=
ListStepPlot[Flatten[cd["GainValues"], 1], PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[21]=

The states are regulated to the origin:

In[22]:=
ListStepPlot[cd["StateValues"], PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[22]=

The input sequence that was used during the simulation:

In[23]:=
ListStepPlot[cd["InputValues"]  , PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[23]=

A state-space model:

In[24]:=
sys = {StateSpaceModel[{{{0, 1.}, {-0.26, -1.}}, {{0}, {1.}}, {{0, 0.26}}, {{0}}}, SamplingPeriod -> 1, SystemsModelLabels -> None], RandomReal[{-1, 1}, 2]};

Compute a controller:

In[25]:=
cd = ResourceFunction["QLearningLQRegulator"][sys, {( {
     {1, 0},
     {0, 1}
    } ), \!\(\*
TagBox[
RowBox[{"(", "", GridBox[{
{"1"}
},
GridBoxAlignment->{"Columns" -> {{Center}}, "Rows" -> {{Baseline}}},
GridBoxSpacings->{"Columns" -> {
Offset[0.27999999999999997`], {
Offset[0.7]}, 
Offset[0.27999999999999997`]}, "Rows" -> {
Offset[0.2], {
Offset[0.4]}, 
Offset[0.2]}}], "", ")"}],
Function[BoxForm`e$, 
MatrixForm[BoxForm`e$]]]\), ( {
     {0, 0}
    } )}, 40]
Out[25]=

The gain values:

In[26]:=
ListStepPlot[Flatten[cd["GainValues"], 1], PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[26]=

The states are regulated to the origin:

In[27]:=
ListStepPlot[cd["StateValues"], PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[27]=

The input sequence that was used during the simulation:

In[28]:=
ListStepPlot[cd["InputValues"]  , PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[28]=

By default, a SystemsModelControllerData object is returned:

In[29]:=
cd = ResourceFunction["QLearningLQRegulator", ResourceVersion->"1.0.0"][{Function[{x, u, k}, -0.5 x + u], {3}}, {{{1}}, {{1}}, {{0}}}, 20]
Out[29]=

It can be used to obtain various properties:

In[30]:=
cd["Properties"]
Out[30]=

The value of a specific property:

In[31]:=
cd["BatchLength"]
Out[31]=

A list of property values:

In[32]:=
cd[{"BatchLength", "ConvergedQ"}]
Out[32]=

The values of all properties as an Association:

In[33]:=
cd["PropertyAssociation"]
Out[33]=

As a Dataset:

In[34]:=
cd["PropertyDataset"]
Out[34]=

Get a property directly:

In[35]:=
ResourceFunction["QLearningLQRegulator", ResourceVersion->"1.0.0"][{Function[{x, u, k}, -0.5 x + u], {3}}, {{{1}}, {{1}}, {{0}}}, 15, "Gain"]
Out[35]=

Applications (7) 

A model of the U.S. Coast Guard cutter Tampa, based on sea-trials data, giving the heading in response to rudder-angle inputs:

In[36]:=
ssm = StateSpaceModel[
  TransferFunctionModel[{{{(-0.0184) (0.0068 + s)}}, s (0.0063 + s) (0.2647 + s)}, s], StateSpaceRealization -> "ObservableCompanion"]
Out[36]=

Discretize the model:

In[37]:=
ssmd = ToDiscreteTimeModel[ssm, 0.05, Method -> "ZeroOrderHold"]
Out[37]=

The model is marginally stable:

In[38]:=
Eigenvalues[Normal[ssmd][[1]]]
Out[38]=

A Q-learning LQ regulator starting with an initial heading of 5°:

In[39]:=
cd = ResourceFunction["QLearningLQRegulator", ResourceVersion->"1.0.0"][{ssmd, {0, 0, 5 °}}, {DiagonalMatrix[{1, 1, 10^5}], {{10^3}}, {{-50, 0, 0}}}, 400]
Out[39]=

The computed gain values:

In[40]:=
ListStepPlot[Flatten[cd["GainValues"], 1], PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[40]=

The heading is regulated back to the origin in about 20 seconds:

In[41]:=
ListStepPlot[cd["StateValues"][[3]]/Degree, PlotRange -> All, AxesOrigin -> {0, 0}, DataRange -> cd["SimulationRange"] 0.05]
Out[41]=

The rudder input values:

In[42]:=
ListStepPlot[cd["InputValues"]/Degree, PlotRange -> All, DataRange -> cd["SimulationRange"] 0.05]
Out[42]=

Properties and Relations (2) 

The gain computed by simulation converges to the optimal solution:

In[43]:=
ssm = StateSpaceModel[{a = {{0.8}}, b = {{1}}}, SamplingPeriod -> 1];
In[44]:=
{q, r} = {{{2}}, {{1}}};
In[45]:=
ResourceFunction["QLearningLQRegulator", ResourceVersion->"1.0.0"][{ssm, {-1}}, {q, r, {{-0.5}}}, {35, 5}, "GainValues"][[All, All, -3 ;; -1]]
Out[45]=

The optimal solution computed with knowledge of the system's dynamics:

In[46]:=
LQRegulatorGains[ssm, {q, r}]
Out[46]=

The above solution is computed using the discrete algebraic Riccati equation:

In[47]:=
With[{x = DiscreteRiccatiSolve[{a, b}, {q, r}]}, Inverse[b\[ConjugateTranspose] . x . b + r] . (b\[ConjugateTranspose] . x . a)]
Out[47]=
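As a quick sanity check, the closed-loop pole a - b.k obtained with this optimal gain can be verified to lie inside the unit circle, which is what makes the regulated state decay:

(* closed-loop pole of x(k+1) = (a - b.k) x(k); magnitude below 1 indicates stability *)
k = LQRegulatorGains[ssm, {q, r}];
Eigenvalues[a - b . k]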

The gain computed by simulation converges to the optimal solution for a nonlinear system:

In[48]:=
nssm = NonlinearStateSpaceModel[{{-0.7 Sin[x] + x^2 + u Cos[x]}, {x}}, {x}, {u}, SamplingPeriod -> 1];
In[49]:=
{q, r} = {{{2}}, {{1}}};
In[50]:=
ResourceFunction["QLearningLQRegulator", ResourceVersion->"1.0.0"][{nssm, {-1}}, {q, r, {{-0.5}}}, {50, 5}, "GainValues"][[All, All, -3 ;; -1]]
Out[50]=

The optimal solution computed using the linearized model:

In[51]:=
LQRegulatorGains[nssm, {q, r}]
Out[51]=

The above solution is computed using the discrete algebraic Riccati equation:

In[52]:=
{a, b} = Normal[StateSpaceModel[nssm]][[1 ;; 2]];
In[53]:=
With[{x = DiscreteRiccatiSolve[{a, b}, {q, r}]}, Inverse[b\[ConjugateTranspose] . x . b + r] . (b\[ConjugateTranspose] . x . a)]
Out[53]=

Possible Issues (4) 

An unstable system may not converge to the optimal solution:

In[54]:=
sys = {Function[{x, u, k}, 3 x + u], {RandomReal[{-2, 2}]}};
{q, r} = {{{1}}, {{1}}};
tspec = 100;
In[55]:=
ResourceFunction["QLearningLQRegulator", ResourceVersion->"1.0.0"][sys, {q, r, {{-3}}}, tspec, {"Gain", "ConvergedQ"}]
Out[55]=

Adjusting the initial gain causes it to converge to the optimal solution:

In[56]:=
ResourceFunction["QLearningLQRegulator", ResourceVersion->"1.0.0"][sys, {q, r, {{3.5}}}, tspec, {"Gain", "ConvergedQ"}]
Out[56]=

The optimal solution:

In[57]:=
LQRegulatorGains[
  StateSpaceModel[{{{3}}, {{1}}}, SamplingPeriod -> 1], {q, r}] // N
Out[57]=

A system with a disturbance may not converge to the optimal solution:

In[58]:=
sys = {Function[{x, u, k}, -0.9 x + 2 u + 0.01 RandomReal[]], {1}};
{q, r} = {{{1}}, {{1}}};
tspec = 100;
In[59]:=
ResourceFunction["QLearningLQRegulator", ResourceVersion->"1.0.0"][sys, {q, r, {{0}}}, 500, {"Gain", "ConvergedQ"}]
Out[59]=

Adjusting the initial gain may cause it to come close to the optimal solution:

In[60]:=
ResourceFunction["QLearningLQRegulator", ResourceVersion->"1.0.0"][sys, {q, r, {{0.9}}}, 500, {"Gain", "ConvergedQ"}]
Out[60]=

The optimal solution:

In[61]:=
LQRegulatorGains[
 StateSpaceModel[{{{-0.9}}, {{2}}}, SamplingPeriod -> 1], {q, r}]
Out[61]=

The initial gain must be stabilizing:

In[62]:=
sys = {Function[{x, u, k}, -0.9 x + 2 u ], {1}};
{q, r} = {{{1}}, {{1}}};
tspec = 100;

Otherwise, the state values blow up:

In[63]:=
ResourceFunction["QLearningLQRegulator", ResourceVersion->"1.0.0"][sys, {q, r, {{2}}}, 30, "StateValues"]
Out[63]=

The state values with a stabilizing gain:

In[64]:=
ResourceFunction["QLearningLQRegulator"][sys, {q, r, {{0}}}, 50, "StateValues"] // Chop
Out[64]=
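For this scalar system, the closed-loop map under the feedback convention u(k) = -g x(k) is x(k+1) = (-0.9 - 2 g) x(k), so a quick check of the pole magnitude (an illustrative aside, assuming that convention) explains the two behaviors:

(* closed-loop pole magnitudes for the destabilizing (g = 2) and stabilizing (g = 0) initial gains *)
Abs[-0.9 - 2 #] & /@ {2, 0}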

The system must be a discrete-time system:

In[65]:=
ResourceFunction["QLearningLQRegulator"][
 StateSpaceModel[{{{0}}, {{1}}}, SamplingPeriod -> None], {{{1}}, {{1}}, {{0}}}, 50]
Out[65]=
In[66]:=
ResourceFunction["QLearningLQRegulator"][
 StateSpaceModel[{{{0}}, {{1}}}, SamplingPeriod -> 1], {{{1}}, {{1}}, {{0}}}, 50]
Out[66]=
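A continuous-time model can first be discretized, for instance with ToDiscreteTimeModel and a chosen sampling period (here 1, as an illustrative choice), and the resulting discrete-time model supplied in the same way as above:

(* discretize the continuous-time model before computing the regulator *)
ssmd = ToDiscreteTimeModel[StateSpaceModel[{{{0}}, {{1}}}], 1]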

Publisher

Suba Thomas

Requirements

Wolfram Language 13.0 (December 2021) or above

Version History

  • 1.0.1 – 15 January 2025
  • 1.0.0 – 10 January 2025
