Lyapunov Actor-Twin Critic (LATC)
See also
This document assumes you are familiar with the Lyapunov Actor-Critic (LAC) algorithm. It is not a comprehensive guide but mainly describes the differences between the Lyapunov Actor-Twin Critic (LATC) and Lyapunov Actor-Critic (LAC) algorithms. It is therefore meant to complement the LAC algorithm documentation.
Important
Like the LAC algorithm, the LATC algorithm only guarantees stability in mean cost when trained on environments with a positive definite cost function (i.e. environments in which the cost is minimised). The opt_type argument can be set to maximise when training in environments where the reward is maximised. However, because the Lyapunov stability conditions are then no longer satisfied, the algorithm no longer guarantees stability in mean cost.
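As a minimal sketch, assuming the gymnasium Pendulum-v1 environment and that the opt_type value is spelled as shown (check the LAC documentation for the exact accepted strings), the argument is simply forwarded to lac():

```python
import gymnasium as gym

from stable_learning_control.algos.pytorch.latc import latc

# Hypothetical example: train LATC on a reward-maximising environment.
# The exact string accepted by `opt_type` is an assumption here; see the LAC
# documentation for the valid options. The argument is forwarded to `lac()`.
latc(lambda: gym.make("Pendulum-v1"), opt_type="maximise")
```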
Background
The Lyapunov Actor-Twin Critic (LATC) algorithm is a successor to the LAC algorithm. In contrast to its predecessor, the LATC algorithm employs a dual-critic approach, aligning it more closely with the SAC algorithm upon which LAC was originally built. In the SAC framework, these dual critics serve to counteract overestimation bias by selecting the minimum value from both critics for the actor updates. In our case, we employ the maximum to minimise the cost, thus addressing potential underestimation bias in Lyapunov values. For a deeper exploration of this concept, refer to the research paper by Haarnoja et al., 2019. For more information on the inner workings of the LAC algorithm, refer to the LAC algorithm documentation. Only the differences between the LAC and LATC algorithms are discussed below.
Differences with the LAC algorithm
Like its direct predecessor, the LATC algorithm uses entropy regularisation to increase exploration and a Gaussian actor and value critic to determine the best action. The main difference is that the LyapunovActorTwinCritic architecture contains two critics instead of one. These critics are identical to the critic used in the LAC algorithm but are trained separately; their maximum is then used to update the actor. Because of this, the policy is optimised according to

$$
\min_{\theta} \underset{(s, a, c, s') \sim \mathcal{D}}{\mathbb{E}} \left[ \lambda \big( L_c(s', f_{\theta}(\epsilon, s')) - L_c(s, a) + \alpha_3 c \big) + \alpha \log \pi_{\theta}(f_{\theta}(\epsilon, s) \mid s) \right] \tag{1}
$$

where $L_c$ now represents the maximum of the two critics. The rest of the algorithm remains the same.
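As an illustrative sketch (stand-in tensors, not the actual SLC implementation), the term that enters the actor update in (1) is formed from the two critics as follows:

```python
import torch

# Stand-in outputs of the two Lyapunov critics for a batch of next observations
# and the corresponding actions sampled from the current policy.
l_pi_1 = torch.rand(64)  # L_1(s', f_theta(epsilon, s'))
l_pi_2 = torch.rand(64)  # L_2(s', f_theta(epsilon, s'))

# SAC's twin Q-critics take the minimum to avoid overestimating the return.
# Because LATC minimises a cost, it takes the element-wise maximum instead,
# which guards against underestimating the Lyapunov value.
l_pi = torch.max(l_pi_1, l_pi_2)

# `l_pi` then plays the role of L_c(s', f_theta(epsilon, s')) in objective (1).
```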
Important
Because the LATC and LAC algorithms are so similar, the latc() function is implemented as a wrapper around the lac() function. This wrapper only changes the actor-critic architecture to LyapunovActorTwinCritic. To prevent code duplication, the stable_learning_control.algos.pytorch.policies.lyapunov_actor_critic.LyapunovActorCritic class was modified to use the maximum of the two critics when the LyapunovActorTwinCritic class is set as the actor-critic architecture.
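Conceptually, the wrapper boils down to something like the following sketch (simplified; the import paths for lac() and LyapunovActorTwinCritic are assumptions, and the real latc() handles defaults and validation differently):

```python
# Conceptual sketch only. The import paths below are assumptions; the actual
# latc() implementation in stable_learning_control may organise this differently.
from stable_learning_control.algos.pytorch.lac import lac
from stable_learning_control.algos.pytorch.policies import LyapunovActorTwinCritic


def latc(env_fn, actor_critic=None, *args, **kwargs):
    """Train LATC by calling lac() with the twin-critic architecture."""
    if actor_critic is None:
        actor_critic = LyapunovActorTwinCritic  # Only change w.r.t. lac().
    return lac(env_fn, *args, actor_critic=actor_critic, **kwargs)
```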
Quick Fact
LATC is an off-policy algorithm.
It is guaranteed to be stable in mean cost.
The version of LATC implemented here can only be used for environments with continuous action spaces.
An alternate version of LATC, which slightly changes the policy update rule, can be implemented to handle discrete action spaces.
The SLC implementation of LATC does not support parallelisation.
Further Reading
For more information on the LATC algorithm, please check out the LAC documentation and the original paper of Han et al., 2020.
Pseudocode
Implementation
You Should Know
In what follows, we give documentation for the PyTorch and TensorFlow implementations of LATC in SLC. They have nearly identical function calls and docstrings, except for details relating to model construction. However, we include both full docstrings for completeness.
Algorithm: PyTorch Version
- stable_learning_control.algos.pytorch.latc.latc(env_fn, actor_critic=None, *args, **kwargs)[source]
Trains the LATC algorithm in a given environment.
- Parameters:
env_fn – A function which creates a copy of the environment. The environment must satisfy the gymnasium API.
actor_critic (torch.nn.Module, optional) –

The constructor method for a Torch Module with an act method, a pi module and several Q or L modules. The act method and pi module should accept batches of observations as inputs, and the Q* and L modules should accept a batch of observations and a batch of actions as inputs. When called, these modules should return:

  Call     Output Shape        Description
  act      (batch, act_dim)    Numpy array of actions for each observation.
  Q*/L     (batch,)            Tensor containing one current estimate of Q*/L for the provided observations and actions. (Critical: make sure to flatten this!)

Calling pi should return:

  Symbol    Shape               Description
  a         (batch, act_dim)    Tensor containing actions from the policy given observations.
  logp_pi   (batch,)            Tensor containing log probabilities of actions in a. Importantly: gradients should be able to flow back into a.

Defaults to LyapunovActorTwinCritic.

*args – The positional arguments to pass to the lac() method.

**kwargs – The keyword arguments to pass to the lac() method.
Note
Wraps the lac() function so that the LyapunovActorTwinCritic architecture is used as the actor-critic.
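To make the expected call signatures concrete, here is a toy module that returns outputs with the shapes listed above. It is purely illustrative: the submodule names, the constructor signature, and the use of plain methods instead of submodules are simplifications, and the built-in LyapunovActorTwinCritic should normally be used instead.

```python
import torch
import torch.nn as nn


class TinyLyapunovActorTwinCritic(nn.Module):
    """Toy illustration of the actor_critic call signatures described above."""

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.a_net = nn.Linear(obs_dim, act_dim)          # toy policy network
        self.l1_net = nn.Linear(obs_dim + act_dim, 1)     # first Lyapunov critic
        self.l2_net = nn.Linear(obs_dim + act_dim, 1)     # second Lyapunov critic

    def pi(self, obs):
        # Returns (a, logp_pi); here a deterministic toy policy with a dummy
        # log-probability so that the output shapes match the documented table.
        a = torch.tanh(self.a_net(obs))
        logp_pi = torch.zeros(obs.shape[0])
        return a, logp_pi

    def L1(self, obs, act):
        # (batch,) tensor - note the flattening mentioned in the docstring.
        return self.l1_net(torch.cat([obs, act], dim=-1)).squeeze(-1)

    def L2(self, obs, act):
        return self.l2_net(torch.cat([obs, act], dim=-1)).squeeze(-1)

    def act(self, obs):
        # Numpy array of actions for each observation (no gradients attached).
        with torch.no_grad():
            a, _ = self.pi(obs)
        return a.numpy()
```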
Saved Model Contents: PyTorch Version
The PyTorch version of the LATC algorithm is implemented by subclassing the torch.nn.Module class. Because of this, and because the LATC algorithm is implemented as a wrapper around the LAC algorithm, the model weights are saved using the model_state dictionary (state_dict). These saved weights can be found in the torch_save/model_state.pt file. For an example of how to load a model using this file, see Experiment Outputs or the PyTorch documentation.
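As a rough sketch (the run directory is a placeholder and the SLC loading utilities described in Experiment Outputs are the supported route), the saved state dictionary can be read back with plain PyTorch:

```python
import torch

# Placeholder path to a finished training run; replace with your own output dir.
model_state = torch.load("path/to/run/torch_save/model_state.pt")

# `model_state` contains the saved weights (state_dict) described above. How it
# is restored depends on the policy object; for a compatible torch.nn.Module it
# would be applied with `module.load_state_dict(model_state)`.
print(model_state.keys())
```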
Algorithm: TensorFlow Version
Attention
The TensorFlow version is still experimental. It is not guaranteed to work, and it is not guaranteed to be up-to-date with the PyTorch version.
- stable_learning_control.algos.tf2.latc.latc(env_fn, actor_critic=None, *args, **kwargs)[source]
Trains the LATC algorithm in a given environment.
- Parameters:
env_fn – A function which creates a copy of the environment. The environment must satisfy the gymnasium API.
actor_critic (tf.Module, optional) –

The constructor method for a TensorFlow Module with an act method, a pi module and several Q or L modules. The act method and pi module should accept batches of observations as inputs, and the Q* and L modules should accept a batch of observations and a batch of actions as inputs. When called, these modules should return:

  Call     Output Shape        Description
  act      (batch, act_dim)    Numpy array of actions for each observation.
  Q*/L     (batch,)            Tensor containing one current estimate of Q*/L for the provided observations and actions. (Critical: make sure to flatten this!)

Calling pi should return:

  Symbol    Shape               Description
  a         (batch, act_dim)    Tensor containing actions from the policy given observations.
  logp_pi   (batch,)            Tensor containing log probabilities of actions in a. Importantly: gradients should be able to flow back into a.

Defaults to LyapunovActorTwinCritic.

*args – The positional arguments to pass to the lac() method.

**kwargs – The keyword arguments to pass to the lac() method.
Note
Wraps the lac() function so that the LyapunovActorTwinCritic architecture is used as the actor-critic.
Saved Model Contents: TensorFlow Version
The TensorFlow version of the LATC algorithm is implemented by subclassing the tf.keras.Model class. As a result, both the full model and the current model weights are saved. The complete model can be found in the saved_model.pb file, while the current weights checkpoints are found in the tf_save/weights_checkpoint* file. For an example of using these two methods, see Experiment Outputs or the TensorFlow documentation.
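As a rough sketch (the run directory is a placeholder; the exact directory layout is described in Experiment Outputs), the complete SavedModel can be restored with the standard TensorFlow API:

```python
import tensorflow as tf

# Placeholder path: the directory containing the saved_model.pb file produced
# by a finished training run; replace with your own output directory.
model = tf.saved_model.load("path/to/run/tf_save")
```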