stable_learning_control.algos.tf2.policies.actors
Actor network structures.
Submodules
Classes
The squashed gaussian actor network.
Package Contents
- class stable_learning_control.algos.tf2.policies.actors.SquashedGaussianActor(obs_dim, act_dim, hidden_sizes, activation=nn.relu, output_activation=nn.relu, act_limits=None, log_std_min=-20, log_std_max=2.0, name='gaussian_actor', **kwargs)[source]
Bases:
tf.keras.Model
The squashed gaussian actor network.
- net

  The input/hidden layers of the network.

- mu

  The output layer which returns the mean of the actions.

- log_std_layer

  The output layer which returns the log standard deviation of the actions.

- act_limits

  The `high` and `low` action bounds of the environment. Used for rescaling the actions that come out of the network from `(-1, 1)` to `(low, high)`. No scaling will be applied if left empty.

  - Type:

    dict, optional
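The rescaling described above is a simple affine map from the squashed range `(-1, 1)` onto the environment bounds `(low, high)`. A minimal sketch of that map (`rescale_action` is a hypothetical helper for illustration, not part of the library API):

```python
import math

def rescale_action(squashed, low, high):
    """Map a tanh-squashed action from (-1, 1) to (low, high).

    Hypothetical helper sketching the rescaling described above;
    not part of the stable_learning_control API.
    """
    return low + (squashed + 1.0) * 0.5 * (high - low)

# A raw network output is squashed by tanh, then rescaled to [-2, 2].
raw = 0.7
squashed = math.tanh(raw)
action = rescale_action(squashed, low=-2.0, high=2.0)
print(action)  # roughly 1.2087
```

Note that the map sends `-1` to `low`, `1` to `high`, and `0` to the midpoint of the bounds, so no action outside the environment limits can be produced.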
Initialise the SquashedGaussianActor object.
- Parameters:

  - obs_dim (int) – Dimension of the observation space.
  - act_dim (int) – Dimension of the action space.
  - hidden_sizes (list) – Sizes of the hidden layers.
  - activation (`tf.keras.activations`) – The activation function. Defaults to `tf.nn.relu`.
  - output_activation (`tf.keras.activations`, optional) – The activation function used for the output layers. Defaults to `tf.nn.relu`.
  - act_limits (dict) – The `high` and `low` action bounds of the environment. Used for rescaling the actions that come out of the network from `(-1, 1)` to `(low, high)`.
  - log_std_min (int, optional) – The minimum log standard deviation. Defaults to `-20`.
  - log_std_max (float, optional) – The maximum log standard deviation. Defaults to `2.0`.
  - name (str, optional) – The name of the actor network. Defaults to `gaussian_actor`.
  - **kwargs – All kwargs to pass to the `tf.keras.Model`. Can be used to add additional inputs or outputs.
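The `log_std_min` and `log_std_max` arguments bound the log standard deviation the network predicts, which keeps the policy's exploration noise in a sane range. A common way to enforce such bounds is simple clipping; the sketch below illustrates the idea only and is not necessarily the library's exact implementation:

```python
import math

LOG_STD_MIN, LOG_STD_MAX = -20, 2.0  # the constructor defaults above

def clamp_log_std(log_std):
    """Clip a predicted log standard deviation into [LOG_STD_MIN, LOG_STD_MAX].

    Sketch only: the library may clip or smoothly squash this value
    internally; this helper is hypothetical.
    """
    return max(LOG_STD_MIN, min(LOG_STD_MAX, log_std))

# An unbounded network output is clamped, then exponentiated to get std.
std = math.exp(clamp_log_std(5.3))  # 5.3 > LOG_STD_MAX, so std = e^2
print(std)
```

Without such a bound, an unlucky gradient step can push the predicted log standard deviation to extreme values, making the policy either deterministic or pure noise.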
- act_limits
- _log_std_min
- _log_std_max
- _squash_bijector
- _normal_distribution
- net
- mu_layer
- log_std_layer
- call(obs, deterministic=False, with_logprob=True)[source]

  Perform a forward pass through the network.

  - Parameters:

    - obs (numpy.ndarray) – The tensor of observations.
    - deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When `True` the mean action of the stochastic policy is returned. If `False` the action is sampled from the stochastic policy. Defaults to `False`.
    - with_logprob (bool, optional) – Whether we want to return the log probability of an action. Defaults to `True`.
  - Returns:

    tuple containing:
  - Return type:

    (tuple)
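When `with_logprob=True`, squashed-Gaussian policies must correct the Gaussian log-likelihood for the tanh change of variables, since the density of `a = tanh(u)` differs from that of `u` by the factor `1 - tanh(u)^2`. A dependency-free sketch of that correction, using the numerically stable identity `log(1 - tanh(u)^2) = 2*(log 2 - u - softplus(-2u))` (this mirrors the usual SAC-style derivation and is not necessarily this library's exact code path):

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def squashed_gaussian_logprob(u, mu, std):
    """Log-density of the action a = tanh(u) with u ~ Normal(mu, std).

    Sketch of the standard change-of-variables correction for a
    tanh-squashed Gaussian; hypothetical helper, not the library's code.
    """
    # Gaussian log-likelihood of the pre-squash sample u.
    logp = -0.5 * (((u - mu) / std) ** 2
                   + 2.0 * math.log(std)
                   + math.log(2.0 * math.pi))
    # Subtract log(1 - tanh(u)^2), written in its numerically stable form.
    logp -= 2.0 * (math.log(2.0) - u - softplus(-2.0 * u))
    return logp

logp = squashed_gaussian_logprob(u=0.3, mu=0.0, std=1.0)
print(logp)
```

The stable form avoids evaluating `log(1 - tanh(u)^2)` directly, which underflows to `log(0)` for large `|u|`.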
- act(obs, deterministic=False)[source]

  Returns the action from the current state given the current policy.

  - Parameters:

    - obs (numpy.ndarray) – The current observation (state).
    - deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When `True` the mean action of the stochastic policy is returned. If `False` the action is sampled from the stochastic policy. Defaults to `False`.
  - Returns:

    The action from the current state given the current policy.
- get_action(obs, deterministic=False)[source]

  Simple wrapper that makes the `act()` method available under the `get_action` alias.

  - Parameters:

    - obs (numpy.ndarray) – The current observation (state).
    - deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When `True` the mean action of the stochastic policy is returned. If `False` the action is sampled from the stochastic policy. Defaults to `False`.
  - Returns:

    The action from the current state given the current policy.
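The `deterministic` flag in `act()`/`get_action()` toggles between returning the distribution mean (for evaluation) and drawing a sample (for exploration). A toy stand-in illustrating that behaviour (`toy_act` is hypothetical; the real method runs the network and rescales the squashed action to the environment bounds):

```python
import math
import random

def toy_act(mu, std, deterministic, rng):
    """Mimic act(): return the mean when deterministic, else sample.

    Toy stand-in for illustration; not the library's implementation.
    """
    u = mu if deterministic else rng.gauss(mu, std)
    return math.tanh(u)  # squash to (-1, 1) as the actor does

rng = random.Random(0)
eval_action = toy_act(0.5, 0.1, deterministic=True, rng=rng)
explore_action = toy_act(0.5, 0.1, deterministic=False, rng=rng)
print(eval_action, explore_action)
```

At test time the deterministic mean gives reproducible rollouts, while the sampled action injects the exploration noise the stochastic policy was trained with.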