Function Repository Resource:

QLearningLQRegulator (1.0.0; current version: 1.0.1)

Compute the LQ Regulator using Q-learning

Contributed by: Suba Thomas

ResourceFunction["QLearningLQRegulator"][sys,rspsec,tspec]

uses Q-learning to compute the optimal LQ regulator for sys based on regulator specification rspec and time specification tspec.

ResourceFunction["QLearningLQRegulator"][,prop]

gives the value of the property prop.

Details

ResourceFunction["QLearningLQRegulator"] computes the optimal linear quadratic (LQ) regulating gain using Q-learning without knowledge of the system's dynamics.
The gain is computed iteratively by an actor implementing a policy and a critic evaluating the system's response and updating the policy.
The input u(k) is computed to minimize the Q function Q(x(k),u(k)) = x(k)ᵀ.q.x(k) + u(k)ᵀ.r.u(k) + Q(x(k+1),u(k+1)), where u(k+1) is given by the policy g.
x(k) – state vector
u(k) – input vector
q – state weight
r – input weight
g – optimal policy
The Q function can be expressed using a kernel matrix s as Q(x(k),u(k)) = z(k)ᵀ.s.z(k), where z(k) is the vector formed by stacking x(k) and u(k).
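As a minimal sketch, assuming the kernel matrix is ordered with the states first and the inputs last and the feedback convention u(k) = -g.x(k), the gain can be read off the blocks of s; the helper gainFromKernel is purely illustrative:

(* illustrative only: extract the gain from an (n+m)×(n+m) kernel matrix s *)
gainFromKernel[s_, n_, m_] := Module[{suu, sux},
  suu = s[[n + 1 ;; n + m, n + 1 ;; n + m]]; (* input-input block *)
  sux = s[[n + 1 ;; n + m, 1 ;; n]];         (* input-state block *)
  Inverse[suu] . sux                         (* minimizer of the quadratic Q function *)
 ]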
The system sys is of the form {sim,x0}.
The simulation sim can take the following forms:
Function[…] – pure function
LibraryFunction[…] – library function
StateSpaceModel[…] – state-space model
AffineStateSpaceModel[…] – affine state-space model
NonlinearStateSpaceModel[…] – nonlinear state-space model
The specification x0 causes the simulation to start at value x0.
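For example, the discrete-time system x(k+1) = 0.2 x(k) + u(k), started at -1, can be specified equivalently with a pure function or with a sampled state-space model (a brief illustrative sketch, assuming a sampling period of 1):

(* two equivalent system specifications of the form {sim, x0} *)
sysFun = {Function[{x, u, k}, 0.2 x + u], {-1}};
sysSSM = {StateSpaceModel[{{{0.2}}, {{1}}}, SamplingPeriod -> 1], {-1}};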
The regulator specification rspec is of the form {q,r,g0}.
q – state weight matrix
r – input weight matrix
g0 – initial policy gain
The time specification tspec can take the following forms:
nsim – total number of simulations
{nsim,blen} – total number of simulations and batch length
ResourceFunction["QLearningLQRegulator"][,"Data"] returns a SystemsModelControllerData object cd that can be used to extract additional properties using the form cd["prop"].
ResourceFunction["QLearningLQRegulator"][,"prop"] can be used to directly give the value of cd["prop"].
Possible values for properties "prop" include:
"BatchLength"least squares batch length
"ConvergedQ"if the gains converged to a value
"Gain"final gain
"GainValues"list of gain values
"InputCount"number of inputs
"InputValues"list of input values
"KernelMatrix"kernel matrix s
"SimulationRange"simulation range
"StateCount"number of states
"StateValues"list of state values

Examples

Basic Examples (8) 

A system that starts at -1:

In[1]:=
sys = {Function[{x, u, k}, 0.2 x + u], {-1}};

The state and input weights:

In[2]:=
{q, r} = {{{1}}, {{1}}};

The initial value of the gain:

In[3]:=
g0 = {{0}};

The total number of simulations is 30, with each batch consisting of 10 simulations:

In[4]:=
tspec = {30, 10};

Compute the controller:

In[5]:=
cd = ResourceFunction["QLearningLQRegulator"][sys, {q, r, g0}, tspec]
Out[5]=

The gain converges to ~0.1:

In[6]:=
ListStepPlot[Flatten[cd["GainValues"], 1], PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[6]=

The state is regulated to the origin:

In[7]:=
ListStepPlot[cd["StateValues"], PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[7]=

The input sequence that was used during the simulation:

In[8]:=
ListStepPlot[cd["InputValues"], PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[8]=

Scope (6) 

A system with one state and one input:

In[9]:=
sys = {Function[{x, u, k}, -0.9 x + 2 u], {1}};

Compute a controller:

In[10]:=
cd = ResourceFunction["QLearningLQRegulator"][sys, {({
     {1}
    }), ({
     {1}
    }), ({
     {0.1}
    })}, {30, 5}]
Out[10]=

The gain values:

In[11]:=
ListStepPlot[Flatten[cd["GainValues"], 1], PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[11]=

The state is regulated to the origin:

In[12]:=
ListStepPlot[cd["StateValues"][[1]], PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[12]=

The input sequence that was used during the simulation:

In[13]:=
ListStepPlot[cd["InputValues"][[1]]  , PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[13]=

A multi-state system:

In[14]:=
sys = {Function[{x, u, k}, Chop@{x[[2]], - 0.12 x[[1]] - 0.7 x[[2]] + u[[1]]}], RandomReal[{-10, 10}, 2]};

Compute a controller:

In[15]:=
cd = ResourceFunction["QLearningLQRegulator"][sys, {( {
     {1, 0},
     {0, 1}
    } ), \!\(\*
TagBox[
RowBox[{"(", "", GridBox[{
{"1"}
},
GridBoxAlignment->{"Columns" -> {{Center}}, "Rows" -> {{Baseline}}},
GridBoxSpacings->{"Columns" -> {
Offset[0.27999999999999997`], {
Offset[0.7]}, 
Offset[0.27999999999999997`]}, "Rows" -> {
Offset[0.2], {
Offset[0.4]}, 
Offset[0.2]}}], "", ")"}],
Function[BoxForm`e$, 
MatrixForm[BoxForm`e$]]]\), ( {
     {0, 0}
    } )}, {50, 10}]
Out[15]=

The gain values:

In[16]:=
ListStepPlot[Flatten[cd["GainValues"], 1], PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[16]=

The states are regulated to the origin:

In[17]:=
ListStepPlot[cd["StateValues"], PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[17]=

The input sequence that was used during the simulation:

In[18]:=
ListStepPlot[cd["InputValues"]  , PlotRange -> All, DataRange -> cd["SimulationRange"], PlotStyle -> ColorData[97, 3]]
Out[18]=

A multi-state, multi-input system:

In[19]:=
sys = {Function[{x, u, k}, {0.7 u[[1]] + 0.07 u[[2]] - 0.07 x[[2]], u[[1]] + x[[1]] - 0.8 x[[2]]}], RandomReal[{-10, 10}, 2]};

Compute a controller:

In[20]:=
cd = ResourceFunction["QLearningLQRegulator"][sys, {( {
     {1, 0},
     {0, 1}
    } ), ( {
     {1, 0},
     {0, 1}
    } ), ( {
     {0, 0},
     {0, 0}
    } )}, 90]
Out[20]=

The gain values:

In[21]:=
ListStepPlot[Flatten[cd["GainValues"], 1], PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[21]=

The states are regulated to the origin:

In[22]:=
ListStepPlot[cd["StateValues"], PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[22]=

The input sequence that was used during the simulation:

In[23]:=
ListStepPlot[cd["InputValues"]  , PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[23]=

A state-space model:

In[24]:=
sys = {StateSpaceModel[{{{0, 1.}, {-0.26, -1.}}, {{0}, {1.}}, {{0, 0.26}}, {{0}}}, SamplingPeriod -> 1, SystemsModelLabels -> None], RandomReal[{-1, 1}, 2]};

Compute a controller:

In[25]:=
cd = ResourceFunction["QLearningLQRegulator"][sys, {( {
     {1, 0},
     {0, 1}
    } ), \!\(\*
TagBox[
RowBox[{"(", "", GridBox[{
{"1"}
},
GridBoxAlignment->{"Columns" -> {{Center}}, "Rows" -> {{Baseline}}},
GridBoxSpacings->{"Columns" -> {
Offset[0.27999999999999997`], {
Offset[0.7]}, 
Offset[0.27999999999999997`]}, "Rows" -> {
Offset[0.2], {
Offset[0.4]}, 
Offset[0.2]}}], "", ")"}],
Function[BoxForm`e$, 
MatrixForm[BoxForm`e$]]]\), ( {
     {0, 0}
    } )}, 40]
Out[25]=

The gain values:

In[26]:=
ListStepPlot[Flatten[cd["GainValues"], 1], PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[26]=

The states are regulated to the origin:

In[27]:=
ListStepPlot[cd["StateValues"], PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[27]=

The input sequence that was used during the simulation:

In[28]:=
ListStepPlot[cd["InputValues"]  , PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[28]=

By default, a SystemsModelControllerData object is returned:

In[29]:=
cd = ResourceFunction["QLearningLQRegulator", ResourceVersion->"1.0.0"][{Function[{x, u, k}, -0.5 x + u], {3}}, {{{1}}, {{1}}, {{0}}}, 20]
Out[29]=

It can be used to obtain various properties:

In[30]:=
cd["Properties"]
Out[30]=

The value of a specific property:

In[31]:=
cd["BatchLength"]
Out[31]=

A list of property values:

In[32]:=
cd[{"BatchLength", "ConvergedQ"}]
Out[32]=

The values of all properties as an Association:

In[33]:=
cd["PropertyAssociation"]
Out[33]=

As a Dataset:

In[34]:=
cd["PropertyDataset"]
Out[34]=

Get a property directly:

In[35]:=
ResourceFunction["QLearningLQRegulator", ResourceVersion->"1.0.0"][{Function[{x, u, k}, -0.5 x + u], {3}}, {{{1}}, {{1}}, {{0}}}, 15, "Gain"]
Out[35]=

Applications (7) 

A model of the U.S. Coast Guard cutter Tampa, based on sea-trials data, giving the heading in response to rudder-angle inputs:

In[36]:=
ssm = StateSpaceModel[
  TransferFunctionModel[{{{(-0.0184) (0.0068 + s)}}, s (0.0063 + s) (0.2647 + s)}, s], StateSpaceRealization -> "ObservableCompanion"]
Out[36]=

Discretize the model:

In[37]:=
ssmd = ToDiscreteTimeModel[ssm, 0.05, Method -> "ZeroOrderHold"]
Out[37]=

The model is marginally stable:

In[38]:=
Eigenvalues[Normal[ssmd][[1]]]
Out[38]=

A Q-learning LQ regulator starting with an initial heading of 5°:

In[39]:=
cd = ResourceFunction["QLearningLQRegulator", ResourceVersion->"1.0.0"][{ssmd, {0, 0, 5 °}}, {DiagonalMatrix[{1, 1, 10^5}], {{10^3}}, {{-50, 0, 0}}}, 400]
Out[39]=

The computed gain values:

In[40]:=
ListStepPlot[Flatten[cd["GainValues"], 1], PlotRange -> All, DataRange -> cd["SimulationRange"]]
Out[40]=

The heading is regulated back to the origin in about 20 seconds:

In[41]:=
ListStepPlot[cd["StateValues"][[3]]/Degree, PlotRange -> All, AxesOrigin -> {0, 0}, DataRange -> cd["SimulationRange"] 0.05]
Out[41]=

The rudder input values:

In[42]:=
ListStepPlot[cd["InputValues"]/Degree, PlotRange -> All, DataRange -> cd["SimulationRange"] 0.05]
Out[42]=

Properties and Relations (2) 

The gain computed by simulation converges to the optimal solution:

In[43]:=
ssm = StateSpaceModel[{a = {{0.8}}, b = {{1}}}, SamplingPeriod -> 1];
In[44]:=
{q, r} = {{{2}}, {{1}}};
In[45]:=
ResourceFunction["QLearningLQRegulator", ResourceVersion->"1.0.0"][{ssm, {-1}}, {q, r, {{-0.5}}}, {35, 5}, "GainValues"][[All, All, -3 ;; -1]]
Out[45]=

The optimal solution computed with knowledge of the system's dynamics:

In[46]:=
LQRegulatorGains[ssm, {q, r}]
Out[46]=

The above solution is computed using the discrete algebraic Riccati equation:

In[47]:=
With[{x = DiscreteRiccatiSolve[{a, b}, {q, r}]}, Inverse[b\[ConjugateTranspose] . x . b + r] . (b\[ConjugateTranspose] . x . a)]
Out[47]=
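As a quick sanity check, the closed-loop pole a - b.k obtained with this optimal gain can be verified to lie inside the unit circle, which is what makes the regulated state decay:

(* closed-loop pole of x(k+1) = (a - b.k) x(k); magnitude below 1 indicates stability *)
k = LQRegulatorGains[ssm, {q, r}];
Eigenvalues[a - b . k]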

The gain computed by simulation converges to the optimal solution for a nonlinear system:

In[48]:=
nssm = NonlinearStateSpaceModel[{{-0.7 Sin[x] + x^2 + u Cos[x]}, {x}}, {x}, {u}, SamplingPeriod -> 1];
In[49]:=
{q, r} = {{{2}}, {{1}}};
In[50]:=
ResourceFunction["QLearningLQRegulator", ResourceVersion->"1.0.0"][{nssm, {-1}}, {q, r, {{-0.5}}}, {50, 5}, "GainValues"][[All, All, -3 ;; -1]]
Out[50]=

The optimal solution computed using the linearized model:

In[51]:=
LQRegulatorGains[nssm, {q, r}]
Out[51]=

The above solution is computed using the discrete algebraic Riccati equation:

In[52]:=
{a, b} = Normal[StateSpaceModel[nssm]][[1 ;; 2]];
In[53]:=
With[{x = DiscreteRiccatiSolve[{a, b}, {q, r}]}, Inverse[b\[ConjugateTranspose] . x . b + r] . (b\[ConjugateTranspose] . x . a)]
Out[53]=

Possible Issues (4) 

An unstable system may not converge to the optimal solution:

In[54]:=
sys = {Function[{x, u, k}, 3 x + u], {RandomReal[{-2, 2}]}};
{q, r} = {{{1}}, {{1}}};
tspec = 100;
In[55]:=
ResourceFunction["QLearningLQRegulator", ResourceVersion->"1.0.0"][sys, {q, r, {{-3}}}, tspec, {"Gain", "ConvergedQ"}]
Out[55]=

Adjusting the initial gain causes it to converge to the optimal solution:

In[56]:=
ResourceFunction["QLearningLQRegulator", ResourceVersion->"1.0.0"][sys, {q, r, {{3.5}}}, tspec, {"Gain", "ConvergedQ"}]
Out[56]=

The optimal solution:

In[57]:=
LQRegulatorGains[
  StateSpaceModel[{{{3}}, {{1}}}, SamplingPeriod -> 1], {q, r}] // N
Out[57]=

A system with a disturbance may not converge to the optimal solution:

In[58]:=
sys = {Function[{x, u, k}, -0.9 x + 2 u + 0.01 RandomReal[]], {1}};
{q, r} = {{{1}}, {{1}}};
tspec = 100;
In[59]:=
ResourceFunction["QLearningLQRegulator", ResourceVersion->"1.0.0"][sys, {q, r, {{0}}}, 500, {"Gain", "ConvergedQ"}]
Out[59]=

Adjusting the initial gain may cause it to come close to the optimal solution:

In[60]:=
ResourceFunction["QLearningLQRegulator", ResourceVersion->"1.0.0"][sys, {q, r, {{0.9}}}, 500, {"Gain", "ConvergedQ"}]
Out[60]=

The optimal solution:

In[61]:=
LQRegulatorGains[
 StateSpaceModel[{{{-0.9}}, {{2}}}, SamplingPeriod -> 1], {q, r}]
Out[61]=

The initial gain must be stabilizing:

In[62]:=
sys = {Function[{x, u, k}, -0.9 x + 2 u ], {1}};
{q, r} = {{{1}}, {{1}}};
tspec = 100;

Otherwise, the state values blow up:

In[63]:=
ResourceFunction["QLearningLQRegulator", ResourceVersion->"1.0.0"][sys, {q, r, {{2}}}, 30, "StateValues"]
Out[63]=

The state values with a stabilizing gain:

In[64]:=
ResourceFunction["QLearningLQRegulator"][sys, {q, r, {{0}}}, 50, "StateValues"] // Chop
Out[64]=
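For this scalar system, the closed-loop map under the feedback convention u(k) = -g x(k) is x(k+1) = (-0.9 - 2 g) x(k), so a quick check of the pole magnitude (an illustrative aside, assuming that convention) explains the two behaviors:

(* closed-loop pole magnitudes for the destabilizing (g = 2) and stabilizing (g = 0) initial gains *)
Abs[-0.9 - 2 #] & /@ {2, 0}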

The system must be a discrete-time system:

In[65]:=
ResourceFunction["QLearningLQRegulator"][
 StateSpaceModel[{{{0}}, {{1}}}, SamplingPeriod -> None], {{{1}}, {{1}}, {{0}}}, 50]
Out[65]=
In[66]:=
ResourceFunction["QLearningLQRegulator"][
 StateSpaceModel[{{{0}}, {{1}}}, SamplingPeriod -> 1], {{{1}}, {{1}}, {{0}}}, 50]
Out[66]=
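A continuous-time model can first be discretized, for instance with ToDiscreteTimeModel and a chosen sampling period (here 1, as an illustrative choice), and the resulting discrete-time model supplied in the same way as above:

(* discretize the continuous-time model before computing the regulator *)
ssmd = ToDiscreteTimeModel[StateSpaceModel[{{{0}}, {{1}}}], 1]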

Publisher

Suba Thomas

Requirements

Wolfram Language 13.0 (December 2021) or above

Version History

  • 1.0.1 – 15 January 2025
  • 1.0.0 – 10 January 2025
