stable_learning_control.algos.tf2.policies
Policies and networks used to create the RL agents.
Subpackages
Submodules
Classes
- SquashedGaussianActor – The squashed Gaussian actor network.
- LCritic – Soft Lyapunov critic network.
- QCritic – Soft Q critic network.
- LyapunovActorCritic – Lyapunov (soft) Actor-Critic network.
- LyapunovActorTwinCritic – Lyapunov (soft) Actor-Twin Critic network.
- SoftActorCritic – Soft Actor-Critic network.
Package Contents
- class stable_learning_control.algos.tf2.policies.SquashedGaussianActor(obs_dim, act_dim, hidden_sizes, activation=nn.relu, output_activation=nn.relu, act_limits=None, log_std_min=-20, log_std_max=2.0, name='gaussian_actor', **kwargs)[source]
Bases:
tf.keras.Model
The squashed Gaussian actor network.
- net
The input/hidden layers of the network.
- Type:
- mu
The output layer which returns the mean of the actions.
- Type:
- log_std_layer
The output layer which returns the log standard deviation of the actions.
- Type:
- act_limits
The high and low action bounds of the environment. Used for rescaling the actions that come out of the network from (-1, 1) to (low, high). No scaling will be applied if left empty.
- Type:
dict, optional
Initialise the SquashedGaussianActor object.
- Parameters:
obs_dim (int) – Dimension of the observation space.
act_dim (int) – Dimension of the action space.
hidden_sizes (list) – Sizes of the hidden layers.
activation (tf.keras.activations) – The activation function. Defaults to tf.nn.relu.
output_activation (tf.keras.activations, optional) – The activation function used for the output layers. Defaults to tf.nn.relu.
act_limits (dict) – The high and low action bounds of the environment. Used for rescaling the actions that come out of the network from (-1, 1) to (low, high).
log_std_min (int, optional) – The minimum log standard deviation. Defaults to -20.
log_std_max (float, optional) – The maximum log standard deviation. Defaults to 2.0.
name (str, optional) – The name of the actor network. Defaults to gaussian_actor.
**kwargs – All kwargs to pass to tf.keras.Model. Can be used to add additional inputs or outputs.
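For reference, the (-1, 1) to (low, high) rescaling mentioned for act_limits can be sketched as the conventional affine mapping below; this is an illustration of the idea, not a reproduction of the network's internal implementation:

```python
import numpy as np

def rescale(squashed_action, low, high):
    """Map a tanh-squashed action in (-1, 1) to the environment bounds (low, high)."""
    return low + (squashed_action + 1.0) * 0.5 * (high - low)

# Example: a squashed action of 0.0 maps to the centre of the bounds.
print(rescale(0.0, np.array([-2.0]), np.array([2.0])))  # -> [0.]
```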
- act_limits
- _log_std_min
- _log_std_max
- _squash_bijector
- _normal_distribution
- net
- mu_layer
- log_std_layer
- call(obs, deterministic=False, with_logprob=True)[source]
Perform forward pass through the network.
- Parameters:
obs (numpy.ndarray) – The tensor of observations.
deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When True, the mean action of the stochastic policy is returned. If False, the action is sampled from the stochastic policy. Defaults to False.
with_logprob (bool, optional) – Whether we want to return the log probability of an action. Defaults to True.
- Returns:
tuple containing the action and, when with_logprob is True, its log probability.
- Return type:
(tuple)
- act(obs, deterministic=False)[source]
Returns the action from the current state given the current policy.
- Parameters:
obs (numpy.ndarray) – The current observation (state).
deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When True, the mean action of the stochastic policy is returned. If False, the action is sampled from the stochastic policy. Defaults to False.
- Returns:
The action from the current state given the current policy.
- Return type:
- get_action(obs, deterministic=False)[source]
Simple wrapper for making the act() method available under the 'get_action' alias.
- Parameters:
obs (numpy.ndarray) – The current observation (state).
deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When True, the mean action of the stochastic policy is returned. If False, the action is sampled from the stochastic policy. Defaults to False.
- Returns:
The action from the current state given the current policy.
- Return type:
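A minimal usage sketch. The dimensions, hidden sizes, and the "low"/"high" keys of the act_limits dict are illustrative assumptions rather than values prescribed by the library:

```python
import numpy as np
import tensorflow as tf

from stable_learning_control.algos.tf2.policies import SquashedGaussianActor

# Hypothetical dimensions and action bounds, purely for illustration.
actor = SquashedGaussianActor(
    obs_dim=3,
    act_dim=1,
    hidden_sizes=[256, 256],
    act_limits={"low": np.array([-2.0]), "high": np.array([2.0])},
)

obs = tf.random.uniform((1, 3))  # batch of one observation
pi_action, logp_pi = actor(obs)  # stochastic action and its log probability
mean_action = actor.act(obs, deterministic=True)  # mean action (e.g. at test time)
```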
- class stable_learning_control.algos.tf2.policies.LCritic(obs_dim, act_dim, hidden_sizes, activation=nn.relu, name='lyapunov_critic', **kwargs)[source]
Bases:
tf.keras.Model
Soft Lyapunov critic Network.
- L
The layers of the network.
- Type:
Initialise the LCritic object.
- Parameters:
obs_dim (int) – Dimension of the observation space.
act_dim (int) – Dimension of the action space.
hidden_sizes (list) – Sizes of the hidden layers.
activation (tf.keras.activations, optional) – The activation function. Defaults to tf.nn.relu.
name (str, optional) – The Lyapunov critic name. Defaults to lyapunov_critic.
**kwargs – All kwargs to pass to tf.keras.Model. Can be used to add additional inputs or outputs.
- L
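A minimal construction sketch with illustrative dimensions; only the constructor documented above is used, and how the critic is invoked on observation/action pairs is not reproduced here:

```python
from stable_learning_control.algos.tf2.policies import LCritic

# Hypothetical dimensions, purely for illustration.
l_critic = LCritic(obs_dim=3, act_dim=1, hidden_sizes=[256, 256])

# The layers of the network are exposed through the L attribute.
print(l_critic.L)
```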
- class stable_learning_control.algos.tf2.policies.QCritic(obs_dim, act_dim, hidden_sizes, activation=nn.relu, output_activation=None, name='q_critic', **kwargs)[source]
Bases:
tf.keras.Model
Soft Q critic network.
- Q
The layers of the network.
- Type:
Initialise the QCritic object.
- Parameters:
obs_dim (int) – Dimension of the observation space.
act_dim (int) – Dimension of the action space.
hidden_sizes (list) – Sizes of the hidden layers.
activation (tf.keras.activations, optional) – The activation function. Defaults to tf.nn.relu.
output_activation (tf.keras.activations, optional) – The activation function used for the output layers. Defaults to None, which is equivalent to using the Identity activation function.
name (str, optional) – The Q-critic name. Defaults to q_critic.
**kwargs – All kwargs to pass to tf.keras.Model. Can be used to add additional inputs or outputs.
- Q
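A minimal construction sketch with illustrative dimensions; the explicit output_activation simply restates the default:

```python
from stable_learning_control.algos.tf2.policies import QCritic

# Hypothetical dimensions, purely for illustration.
q_critic = QCritic(
    obs_dim=3,
    act_dim=1,
    hidden_sizes=[256, 256],
    output_activation=None,  # Identity output, as in the default
)

# The layers of the network are exposed through the Q attribute.
print(q_critic.Q)
```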
- class stable_learning_control.algos.tf2.policies.LyapunovActorCritic(observation_space, action_space, hidden_sizes=HIDDEN_SIZES_DEFAULT, activation=ACTIVATION_DEFAULT, output_activation=OUTPUT_ACTIVATION_DEFAULT, name='lyapunov_actor_critic')[source]
Bases:
tf.keras.Model
Lyapunov (soft) Actor-Critic network.
- self.pi
The squashed Gaussian policy network (actor).
- Type:
Initialise the LyapunovActorCritic object.
- Parameters:
observation_space (gym.space.box.Box) – A gymnasium observation space.
action_space (gym.space.box.Box) – A gymnasium action space.
hidden_sizes (Union[dict, tuple, list], optional) – Sizes of the hidden layers for the actor. Defaults to (256, 256).
activation (Union[dict, tf.keras.activations], optional) – The (actor and critic) hidden layers activation function. Defaults to tf.nn.relu.
output_activation (Union[dict, tf.keras.activations], optional) – The actor output activation function. Defaults to tf.nn.relu.
name (str, optional) – The name given to the LyapunovActorCritic. Defaults to "lyapunov_actor_critic".
Note
It is currently not possible to set the critic output activation function when using the LyapunovActorCritic, since by design it requires the critic output activation to be tf.math.square().
.- obs_dim
- act_dim
- act_limits
- pi
- L
- obs_dummy
- act_dummy
- call(inputs, deterministic=False, with_logprob=True)[source]
Performs a forward pass through all the networks (Actor and L critic).
- Parameters:
inputs (tuple) –
tuple containing:
obs (tf.Tensor): The tensor of observations.
act (tf.Tensor): The tensor of actions.
deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When True, the mean action of the stochastic policy is returned. If False, the action is sampled from the stochastic policy. Defaults to False.
with_logprob (bool, optional) – Whether we want to return the log probability of an action. Defaults to True.
- Returns:
tuple containing the actor outputs (the action and its log probability) and the Lyapunov critic output.
- Return type:
(tuple)
Note
Useful for when you want to print out the full network graph using TensorBoard.
- act(obs, deterministic=False)[source]
Returns the action from the current state given the current policy.
- Parameters:
obs (numpy.ndarray) – The current observation (state).
deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When True, the mean action of the stochastic policy is returned. If False, the action is sampled from the stochastic policy. Defaults to False.
- Returns:
The action from the current state given the current policy.
- Return type:
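A minimal usage sketch. The environment is an arbitrary continuous-control example, and reshaping the observation into a batch of one is an assumption about the expected input shape:

```python
import gymnasium as gym
import tensorflow as tf

from stable_learning_control.algos.tf2.policies import LyapunovActorCritic

env = gym.make("Pendulum-v1")  # illustrative continuous-action environment
ac = LyapunovActorCritic(env.observation_space, env.action_space)

obs, _ = env.reset()
obs_batch = tf.convert_to_tensor(obs.reshape(1, -1), dtype=tf.float32)
action = ac.act(obs_batch, deterministic=True)  # mean action for evaluation
```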
- class stable_learning_control.algos.tf2.policies.LyapunovActorTwinCritic(observation_space, action_space, hidden_sizes=HIDDEN_SIZES_DEFAULT, activation=ACTIVATION_DEFAULT, output_activation=OUTPUT_ACTIVATION_DEFAULT, name='lyapunov_actor_critic')[source]
Bases:
tf.keras.Model
Lyapunov (soft) Actor-Twin Critic network.
- self.pi
The squashed Gaussian policy network (actor).
- Type:
Initialise the LyapunovActorTwinCritic object.
- Parameters:
observation_space (gym.space.box.Box) – A gymnasium observation space.
action_space (gym.space.box.Box) – A gymnasium action space.
hidden_sizes (Union[dict, tuple, list], optional) – Sizes of the hidden layers for the actor. Defaults to (256, 256).
activation (Union[dict, tf.keras.activations], optional) – The (actor and critic) hidden layers activation function. Defaults to tf.nn.relu.
output_activation (Union[dict, tf.keras.activations], optional) – The actor output activation function. Defaults to tf.nn.relu.
name (str, optional) – The name given to the LyapunovActorTwinCritic. Defaults to "lyapunov_actor_critic".
Note
It is currently not possible to set the critic output activation function when using the LyapunovActorTwinCritic, since by design it requires the critic output activation to be tf.math.square().
.- obs_dim
- act_dim
- act_limits
- pi
- L
- L2
- obs_dummy
- act_dummy
- call(inputs, deterministic=False, with_logprob=True)[source]
Performs a forward pass through all the networks (Actor and both L critics).
- Parameters:
inputs (tuple) –
tuple containing:
obs (tf.Tensor): The tensor of observations.
act (tf.Tensor): The tensor of actions.
deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When True, the mean action of the stochastic policy is returned. If False, the action is sampled from the stochastic policy. Defaults to False.
with_logprob (bool, optional) – Whether we want to return the log probability of an action. Defaults to True.
- Returns:
tuple containing the actor outputs (the action and its log probability) and the outputs of both Lyapunov critics.
- Return type:
(tuple)
Note
Useful for when you want to print out the full network graph using TensorBoard.
- act(obs, deterministic=False)[source]
Returns the action from the current state given the current policy.
- Parameters:
obs (numpy.ndarray) – The current observation (state).
deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When True, the mean action of the stochastic policy is returned. If False, the action is sampled from the stochastic policy. Defaults to False.
- Returns:
The action from the current state given the current policy.
- Return type:
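A minimal construction sketch (same illustrative environment as above); the twin variant keeps two Lyapunov critics, exposed through the L and L2 attributes:

```python
import gymnasium as gym

from stable_learning_control.algos.tf2.policies import LyapunovActorTwinCritic

env = gym.make("Pendulum-v1")  # illustrative continuous-action environment
ac = LyapunovActorTwinCritic(env.observation_space, env.action_space)

# Two Lyapunov critics are kept side by side.
print(ac.L, ac.L2)
```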
- class stable_learning_control.algos.tf2.policies.SoftActorCritic(observation_space, action_space, hidden_sizes=HIDDEN_SIZES_DEFAULT, activation=ACTIVATION_DEFAULT, output_activation=OUTPUT_ACTIVATION_DEFAULT, name='soft_actor_critic')[source]
Bases:
tf.keras.Model
Soft Actor-Critic network.
- self.pi
The squashed Gaussian policy network (actor).
- Type:
Initialise the SoftActorCritic object.
- Parameters:
observation_space (gym.space.box.Box) – A gymnasium observation space.
action_space (gym.space.box.Box) – A gymnasium action space.
hidden_sizes (Union[dict, tuple, list], optional) – Sizes of the hidden layers for the actor. Defaults to (256, 256).
activation (Union[dict, tf.keras.activations], optional) – The (actor and critic) hidden layers activation function. Defaults to tf.nn.relu.
output_activation (Union[dict, tf.keras.activations], optional) – The (actor and critic) output activation function. Defaults to tf.nn.relu for the actor and the Identity function for the critic.
name (str, optional) – The name given to the SoftActorCritic. Defaults to "soft_actor_critic".
- obs_dim
- act_dim
- act_limits
- pi
- Q1
- Q2
- obs_dummy
- act_dummy
- call(inputs, deterministic=False, with_logprob=True)[source]
Performs a forward pass through all the networks (Actor, Q critic 1 and Q critic 2).
- Parameters:
inputs (tuple) –
tuple containing:
obs (tf.Tensor): The tensor of observations.
act (tf.Tensor): The tensor of actions.
deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When True, the mean action of the stochastic policy is returned. If False, the action is sampled from the stochastic policy. Defaults to False.
with_logprob (bool, optional) – Whether we want to return the log probability of an action. Defaults to True.
- Returns:
tuple containing the actor outputs (the action and its log probability) and the outputs of both Q critics.
- Return type:
(tuple)
Note
Useful for when you want to print out the full network graph using TensorBoard.
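A minimal usage sketch. The environment is illustrative, and the (obs, act) tuple passed to the network follows the inputs description of the call method above:

```python
import gymnasium as gym
import tensorflow as tf

from stable_learning_control.algos.tf2.policies import SoftActorCritic

env = gym.make("Pendulum-v1")  # illustrative continuous-action environment
ac = SoftActorCritic(env.observation_space, env.action_space)

obs = tf.random.uniform((1,) + env.observation_space.shape)
act = tf.random.uniform((1,) + env.action_space.shape)
outputs = ac((obs, act))  # forward pass through the actor and both Q critics
```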