stable_learning_control.algos.tf2.policies.actors.squashed_gaussian_actor

Squashed Gaussian Actor policy.

This module contains a TensorFlow 2.x implementation of the Squashed Gaussian Actor policy of Haarnoja et al. 2019.
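As a rough illustration of the technique (a sketch, not the library's exact code): the policy samples a pre-activation value from a Gaussian, squashes it through tanh, and corrects the log-likelihood for the change of variables. A minimal example, assuming TensorFlow Probability is available:

    import tensorflow as tf
    import tensorflow_probability as tfp

    # Sample a pre-squash action u ~ N(mu, sigma).
    mu, log_std = tf.zeros(2), tf.zeros(2)
    dist = tfp.distributions.Normal(mu, tf.exp(log_std))
    u = dist.sample()
    logp = tf.reduce_sum(dist.log_prob(u))
    # Change-of-variables correction for a = tanh(u):
    # log pi(a) = log N(u) - sum log(1 - tanh(u)^2), written in the
    # numerically stable form 2 * (log 2 - u - softplus(-2u)).
    logp -= tf.reduce_sum(2.0 * (tf.math.log(2.0) - u - tf.nn.softplus(-2.0 * u)))
    a = tf.tanh(u)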

Module Contents

Classes

SquashedGaussianActor

The squashed Gaussian actor network.

class stable_learning_control.algos.tf2.policies.actors.squashed_gaussian_actor.SquashedGaussianActor(obs_dim, act_dim, hidden_sizes, activation=nn.relu, output_activation=nn.relu, act_limits=None, log_std_min=-20, log_std_max=2.0, name='gaussian_actor', **kwargs)[source]

Bases: tf.keras.Model

The squashed Gaussian actor network.

net

The input/hidden layers of the network.

Type:

tf.keras.Sequential

mu

The output layer which returns the mean of the actions.

Type:

tf.keras.Sequential

log_std_layer

The output layer which returns the log standard deviation of the actions.

Type:

tf.keras.Sequential

act_limits

The high and low action bounds of the environment. Used for rescaling the actions that come out of the network from (-1, 1) to (low, high). No scaling is applied if left empty.

Type:

dict, optional
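In its common form, the (-1, 1) to (low, high) rescaling mentioned above is a simple affine map; a sketch for illustration (the library's exact implementation may differ):

    import numpy as np

    def rescale(action, low, high):
        """Affine map from (-1, 1) to (low, high), element-wise."""
        return low + 0.5 * (action + 1.0) * (high - low)

    rescale(np.array([-1.0, 0.0, 1.0]), low=-2.0, high=2.0)  # -> [-2., 0., 2.]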

Initialise the SquashedGaussianActor object.

Parameters:
  • obs_dim (int) – Dimension of the observation space.

  • act_dim (int) – Dimension of the action space.

  • hidden_sizes (list) – Sizes of the hidden layers.

  • activation (tf.keras.activations) – The activation function. Defaults to tf.nn.relu.

  • output_activation (tf.keras.activations, optional) – The activation function used for the output layers. Defaults to tf.nn.relu.

  • act_limits (dict) – The high and low action bounds of the environment. Used for rescaling the actions that come out of the network from (-1, 1) to (low, high).

  • log_std_min (int, optional) – The minimum log standard deviation. Defaults to -20.

  • log_std_max (float, optional) – The maximum log standard deviation. Defaults to 2.0.

  • name (str, optional) – The name of the actor network. Defaults to gaussian_actor.

  • **kwargs – All kwargs to pass to the tf.keras.Model. Can be used to add additional inputs or outputs.
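A minimal construction sketch; the dimensions, hidden sizes, and act_limits keys below are illustrative assumptions, not values prescribed by the docs:

    import tensorflow as tf
    from stable_learning_control.algos.tf2.policies.actors.squashed_gaussian_actor import (
        SquashedGaussianActor,
    )

    actor = SquashedGaussianActor(
        obs_dim=8,                               # illustrative observation size
        act_dim=2,                               # illustrative action size
        hidden_sizes=[256, 256],
        activation=tf.nn.relu,
        act_limits={"low": -1.0, "high": 1.0},   # assumed dict layout
    )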

call(obs, deterministic=False, with_logprob=True)[source]

Perform forward pass through the network.

Parameters:
  • obs (numpy.ndarray) – The tensor of observations.

  • deterministic (bool, optional) – Whether to use a deterministic policy (used at test time). When True, the mean action of the stochastic policy is returned; when False, the action is sampled from it. Defaults to False.

  • with_logprob (bool, optional) – Whether to return the log probability of the action. Defaults to True.

Returns:

tuple containing:

  • pi_action (tf.Tensor): The actions given by the policy.

  • logp_pi (tf.Tensor): The log probabilities of each of these actions.

Return type:

(tuple)
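A usage sketch of the forward pass, continuing the actor from the constructor example above (batch shape is illustrative):

    obs = tf.random.normal((32, 8))  # batch of 32 observations, obs_dim=8
    pi_action, logp_pi = actor(obs)  # sampled actions and their log-probs
    mean_action, _ = actor(obs, deterministic=True, with_logprob=False)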

act(obs, deterministic=False)[source]

Returns the action from the current state given the current policy.

Parameters:
  • obs (numpy.ndarray) – The current observation (state).

  • deterministic (bool, optional) – Whether to use a deterministic policy (used at test time). When True, the mean action of the stochastic policy is returned; when False, the action is sampled from it. Defaults to False.

Returns:

The action from the current state given the current policy.

Return type:

numpy.ndarray
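A single-step usage sketch, reusing the actor from the constructor example (the observation is illustrative):

    import numpy as np

    obs = np.zeros(8, dtype=np.float32)
    a_sampled = actor.act(obs)                   # stochastic action
    a_mean = actor.act(obs, deterministic=True)  # mean action, test time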

get_action(obs, deterministic=False)[source]

Simple wrapper that makes the act() method available under the get_action alias.

Parameters:
  • obs (numpy.ndarray) – The current observation (state).

  • deterministic (bool, optional) – Whether to use a deterministic policy (used at test time). When True, the mean action of the stochastic policy is returned; when False, the action is sampled from it. Defaults to False.

Returns:

The action from the current state given the current policy.

Return type:

numpy.ndarray
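Since get_action() is documented as a thin alias for act(), deterministic calls through either name should agree:

    a1 = actor.act(obs, deterministic=True)
    a2 = actor.get_action(obs, deterministic=True)
    assert np.allclose(a1, a2)  # same mean action via either entry point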