stable_learning_control.algos.tf2.policies.lyapunov_actor_twin_critic

Lyapunov (soft) Actor-Twin Critic policy.

This module contains a modified TensorFlow 2 implementation of the Lyapunov Actor-Critic (LAC) policy of Han et al. 2020. Like the original SAC algorithm, this LAC variant uses two critics instead of one to mitigate a possible underestimation bias, whereas the original LAC uses only one critic.
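A minimal NumPy sketch of how two L critics could be combined in a temporal-difference target. It assumes (this page does not confirm it) that the twin next-state L values are merged with an element-wise maximum, the cost-side counterpart of SAC's minimum over two Q-values, and that the target has the form cost + gamma * (1 - done) * L'. The function `twin_l_target` and its arguments are hypothetical.

```python
import numpy as np


def twin_l_target(l1_next, l2_next, cost, gamma=0.99, done=None):
    """Hypothetical TD target for a Lyapunov (cost-like) critic with twin critics.

    Assumption: the larger of the two next-state L values is used, which
    counteracts an underestimation bias for a quantity that is minimised.
    """
    l1_next = np.asarray(l1_next, dtype=float)
    l2_next = np.asarray(l2_next, dtype=float)
    cost = np.asarray(cost, dtype=float)
    done = np.zeros_like(cost) if done is None else np.asarray(done, dtype=float)
    l_next = np.maximum(l1_next, l2_next)  # pessimistic (larger) cost estimate
    # Terminal transitions (done == 1) do not bootstrap from the next state.
    return cost + gamma * (1.0 - done) * l_next


target = twin_l_target([1.0, 2.0], [1.5, 1.0], cost=[0.1, 0.2], gamma=0.9)
```

With a single critic the `np.maximum` line would simply disappear; the second network only changes which next-state estimate is bootstrapped from.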

Attributes

HIDDEN_SIZES_DEFAULT

ACTIVATION_DEFAULT

OUTPUT_ACTIVATION_DEFAULT

Classes

LyapunovActorTwinCritic

Lyapunov (soft) Actor-Twin Critic network.

Module Contents

stable_learning_control.algos.tf2.policies.lyapunov_actor_twin_critic.HIDDEN_SIZES_DEFAULT
stable_learning_control.algos.tf2.policies.lyapunov_actor_twin_critic.ACTIVATION_DEFAULT
stable_learning_control.algos.tf2.policies.lyapunov_actor_twin_critic.OUTPUT_ACTIVATION_DEFAULT
class stable_learning_control.algos.tf2.policies.lyapunov_actor_twin_critic.LyapunovActorTwinCritic(observation_space, action_space, hidden_sizes=HIDDEN_SIZES_DEFAULT, activation=ACTIVATION_DEFAULT, output_activation=OUTPUT_ACTIVATION_DEFAULT, name='lyapunov_actor_critic')

Bases: tf.keras.Model

Lyapunov (soft) Actor-Twin Critic network.

self.pi

The squashed Gaussian policy network (actor).

Type:

SquashedGaussianActor

self.L

The first soft L-network (critic).

Type:

LCritic

self.L2

The second soft L-network (critic).

Type:

LCritic
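The actor `pi` is a squashed Gaussian policy. The sketch below shows the usual SAC-style construction under stated assumptions: a diagonal Gaussian, `tanh` squashing and symmetric action limits. `squashed_gaussian_sample` is a hypothetical NumPy helper, not the library's `SquashedGaussianActor`.

```python
import numpy as np


def squashed_gaussian_sample(mu, log_std, act_limit=1.0, deterministic=False, rng=None):
    """Hypothetical squashed Gaussian policy head (SAC-style), NumPy only."""
    mu = np.asarray(mu, dtype=float)
    std = np.exp(np.asarray(log_std, dtype=float))
    rng = np.random.default_rng() if rng is None else rng
    # Deterministic: use the mean; stochastic: reparameterised Gaussian sample.
    u = mu if deterministic else mu + std * rng.standard_normal(mu.shape)
    # Log-probability of the pre-squash sample under the diagonal Gaussian.
    logp = np.sum(
        -0.5 * ((u - mu) / std) ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi), axis=-1
    )
    # Change-of-variables correction for tanh: log(1 - tanh(u)^2), stable form.
    logp -= np.sum(2.0 * (np.log(2.0) - u - np.logaddexp(0.0, -2.0 * u)), axis=-1)
    # Scale to the action limits (this scaling is not reflected in logp here).
    return act_limit * np.tanh(u), logp


action, logp = squashed_gaussian_sample([0.0, 0.5], [0.0, 0.0], act_limit=2.0, deterministic=True)
```

The `tanh` keeps every action inside the limits, which is why the log-probability needs the correction term.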

Initialise the LyapunovActorTwinCritic object.

Parameters:
  • observation_space (gym.spaces.box.Box) – A gymnasium observation space.

  • action_space (gym.spaces.box.Box) – A gymnasium action space.

  • hidden_sizes (Union[dict, tuple, list], optional) – Sizes of the hidden layers for the actor. Defaults to (256, 256).

  • activation (Union[dict, tf.keras.activations], optional) – The activation function of the hidden layers of both the actor and the critics. Defaults to tf.nn.relu.

  • output_activation (Union[dict, tf.keras.activations], optional) – The actor output activation function. Defaults to tf.nn.relu.

  • name (str, optional) – The name given to the LyapunovActorTwinCritic. Defaults to “lyapunov_actor_critic”.
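To illustrate how `hidden_sizes`, `activation` and `output_activation` shape a network, here is a hypothetical NumPy forward pass with random weights. As a sketch only, it assumes `activation` follows every hidden layer except the last, which uses `output_activation`; `mlp_forward` and `relu` are illustrative names, not library functions.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def mlp_forward(x, hidden_sizes=(256, 256), activation=relu, output_activation=relu, seed=0):
    """Hypothetical MLP: one dense layer per entry of ``hidden_sizes``."""
    rng = np.random.default_rng(seed)
    h = np.atleast_2d(np.asarray(x, dtype=float))
    for i, n_out in enumerate(hidden_sizes):
        # Random weights stand in for trained parameters.
        w = rng.standard_normal((h.shape[-1], n_out)) / np.sqrt(h.shape[-1])
        h = h @ w
        is_last = i == len(hidden_sizes) - 1
        h = output_activation(h) if is_last else activation(h)
    return h


out = mlp_forward(np.ones((3, 5)), hidden_sizes=(16, 8))
```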

Note

It is currently not possible to set the critic output activation function when using the LyapunovActorTwinCritic, because the design requires the critic output activation to be tf.math.square(), which keeps the L values non-negative.
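A squared output makes every L value non-negative, as a Lyapunov candidate requires. The minimal sketch below uses random weights; `l_critic_sketch`, its layer sizes and the sum over output features are assumptions for illustration, not the library's `LCritic`.

```python
import numpy as np


def l_critic_sketch(obs, act, hidden_sizes=(64, 64), seed=0):
    """Hypothetical L critic: MLP on [obs, act], squared last layer, summed."""
    rng = np.random.default_rng(seed)
    x = np.concatenate([np.atleast_2d(obs), np.atleast_2d(act)], axis=-1).astype(float)
    for i, n_out in enumerate(hidden_sizes):
        w = rng.standard_normal((x.shape[-1], n_out)) / np.sqrt(x.shape[-1])
        x = x @ w
        if i < len(hidden_sizes) - 1:
            x = np.maximum(x, 0.0)  # ReLU on the hidden layers
        else:
            x = np.square(x)  # fixed square output activation
    # L(s, a) = sum_i f_i(s, a)^2 >= 0 for any weights and inputs.
    return np.sum(x, axis=-1)


rng = np.random.default_rng(3)
l_values = l_critic_sketch(10.0 * rng.standard_normal((4, 3)), rng.standard_normal((4, 2)))
```

Letting users swap in another output activation (for example a linear one) could produce negative L values, which is why the option is not exposed.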

obs_dim
act_dim
act_limits
pi
L
L2
obs_dummy
act_dummy
call(inputs, deterministic=False, with_logprob=True)

Performs a forward pass through all the networks (Actor and both L critics).

Parameters:
  • inputs (tuple) –

    tuple containing:

    • obs (tf.Tensor): The tensor of observations.

    • act (tf.Tensor): The tensor of actions.

  • deterministic (bool, optional) – Whether to use a deterministic policy (used at test time). When True, the mean action of the stochastic policy is returned; when False, the action is sampled from the stochastic policy. Defaults to False.

  • with_logprob (bool, optional) – Whether we want to return the log probability of an action. Defaults to True.

Returns:

tuple containing:

  • pi_action (tf.Tensor): The actions given by the policy.

  • logp_pi (tf.Tensor): The log probabilities of each of these actions.

  • L (tf.Tensor): First critic L values.

  • L2 (tf.Tensor): Second critic L values.

Return type:

(tuple)

Note

Useful when you want to view the full network graph in TensorBoard.
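The data flow of `call` can be sketched with toy stand-ins. This assumes the actor is evaluated on `obs` and both critics on the supplied `(obs, act)` pair, and that `logp_pi` is `None` when `with_logprob` is False; `forward_all`, `toy_actor` and `toy_critic` are hypothetical.

```python
import numpy as np


def forward_all(obs, act, actor, critic_1, critic_2, deterministic=False, with_logprob=True):
    """Hypothetical mirror of ``call``: returns (pi_action, logp_pi, L, L2)."""
    pi_action, logp_pi = actor(obs, deterministic, with_logprob)
    l1 = critic_1(obs, act)  # first critic on the given observation-action pairs
    l2 = critic_2(obs, act)  # second critic on the same pairs
    return pi_action, logp_pi, l1, l2


def toy_actor(obs, deterministic, with_logprob):
    action = np.tanh(obs[:, :2])
    logp = np.zeros(obs.shape[0]) if with_logprob else None
    return action, logp


def toy_critic(obs, act):
    return np.sum(np.square(np.concatenate([obs, act], axis=-1)), axis=-1)


outputs = forward_all(np.ones((5, 3)), np.zeros((5, 2)), toy_actor, toy_critic, toy_critic)
```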

act(obs, deterministic=False)

Returns the action from the current state given the current policy.

Parameters:
  • obs (numpy.ndarray) – The current observation (state).

  • deterministic (bool, optional) – Whether to use a deterministic policy (used at test time). When True, the mean action of the stochastic policy is returned; when False, the action is sampled from the stochastic policy. Defaults to False.

Returns:

The action from the current state given the current policy.

Return type:

numpy.ndarray
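A common pattern for a method like `act` is to accept a single NumPy observation, add a batch dimension for the network, and strip it again from the result. The sketch assumes that behaviour; `act_single` and `toy_policy` are hypothetical.

```python
import numpy as np


def act_single(obs, batched_policy, deterministic=False):
    """Hypothetical wrapper: NumPy observation in, NumPy action out."""
    obs = np.asarray(obs, dtype=np.float32)
    single = obs.ndim == 1
    batch = obs[None, :] if single else obs  # add a leading batch dimension
    action = np.asarray(batched_policy(batch, deterministic))
    return action[0] if single else action  # drop it again for a single observation


def toy_policy(obs_batch, deterministic):
    """Hypothetical stand-in for the actor network."""
    return np.tanh(obs_batch[:, :2])


action = act_single([0.1, -0.2, 0.3], toy_policy, deterministic=True)
```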