stable_learning_control.algos.pytorch.policies

Policies and networks used to create the RL agents.

Classes

SquashedGaussianActor

The squashed gaussian actor network.

LCritic

Soft Lyapunov critic Network.

QCritic

Soft Q critic network.

LyapunovActorCritic

Lyapunov (soft) Actor-Critic network.

LyapunovActorTwinCritic

Lyapunov (soft) Actor-(twin Critic) network.

SoftActorCritic

Soft Actor-Critic network.

Package Contents

class stable_learning_control.algos.pytorch.policies.SquashedGaussianActor(obs_dim, act_dim, hidden_sizes, activation=nn.ReLU, output_activation=nn.ReLU, act_limits=None, log_std_min=-20, log_std_max=2.0)[source]

Bases: torch.nn.Module

The squashed gaussian actor network.

net

The input/hidden layers of the network.

Type:

torch.nn.Sequential

mu_layer

The output layer which returns the mean of the actions.

Type:

torch.nn.Linear

log_std_layer

The output layer which returns the log standard deviation of the actions.

Type:

torch.nn.Linear

act_limits

The high and low action bounds of the environment. Used for rescaling the actions that come out of the network from (-1, 1) to (low, high). No scaling is applied if left empty.

Type:

dict, optional
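
As a rough illustration of the rescaling described for act_limits, the sketch below linearly maps a squashed action in (-1, 1) onto (low, high). The function name rescale_action and the dictionary keys "low" and "high" are assumptions made for this example; they are not confirmed parts of the package API.

```python
def rescale_action(action, act_limits=None):
    """Linearly map each action component from (-1, 1) to (low, high).

    `act_limits` is assumed to be a dict with "low" and "high" sequences
    (hypothetical layout). When it is None the action is returned unchanged.
    """
    if act_limits is None:
        return list(action)
    return [
        low + 0.5 * (a + 1.0) * (high - low)
        for a, low, high in zip(action, act_limits["low"], act_limits["high"])
    ]


limits = {"low": [-2.0, 0.0], "high": [2.0, 10.0]}
scaled = rescale_action([0.0, -1.0], limits)  # midpoint and lower bound
unscaled = rescale_action([0.5, -0.5])        # no limits given, no scaling
```

With this mapping -1 lands on the lower bound, +1 on the upper bound, and 0 on the midpoint of each interval.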

Initialise the SquashedGaussianActor object.

Parameters:
  • obs_dim (int) – Dimension of the observation space.

  • act_dim (int) – Dimension of the action space.

  • hidden_sizes (list) – Sizes of the hidden layers.

  • activation (torch.nn.modules.activation) – The activation function. Defaults to torch.nn.ReLU.

  • output_activation (torch.nn.modules.activation, optional) – The activation function used for the output layers. Defaults to torch.nn.ReLU.

  • act_limits (dict, optional) – The high and low action bounds of the environment. Used for rescaling the actions that come out of the network from (-1, 1) to (low, high). Defaults to None, in which case no scaling is applied.

  • log_std_min (int, optional) – The minimum log standard deviation. Defaults to -20.

  • log_std_max (float, optional) – The maximum log standard deviation. Defaults to 2.0.
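
The log_std_min and log_std_max parameters bound the log standard deviation of the policy. A minimal sketch of one way such bounds can be applied is shown below. It assumes plain clipping into [log_std_min, log_std_max]; the helper names are hypothetical, and the package may bound the value differently.

```python
import math

LOG_STD_MIN = -20  # documented default for log_std_min
LOG_STD_MAX = 2.0  # documented default for log_std_max


def clamp_log_std(log_std, low=LOG_STD_MIN, high=LOG_STD_MAX):
    """Clip a raw log standard deviation into [low, high]."""
    return max(low, min(high, log_std))


def std_from_log_std(log_std):
    """Convert a raw log standard deviation into a bounded standard deviation."""
    return math.exp(clamp_log_std(log_std))


std_small = std_from_log_std(-100.0)  # clipped to exp(-20), tiny but positive
std_large = std_from_log_std(50.0)    # clipped to exp(2.0)
```

Bounding the log standard deviation keeps the resulting standard deviation strictly positive and prevents it from growing without limit.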

__device_warning_logged = False
act_limits
_log_std_min
_log_std_max
net
mu_layer
log_std_layer
forward(obs, deterministic=False, with_logprob=True)[source]

Perform forward pass through the network.

Parameters:
  • obs (torch.Tensor) – The tensor of observations.

  • deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When true the mean action of the stochastic policy is returned. If false the action is sampled from the stochastic policy. Defaults to False.

  • with_logprob (bool, optional) – Whether we want to return the log probability of an action. Defaults to True.

Returns:

tuple containing:

  • pi_action (torch.Tensor): The actions given by the policy.

  • logp_pi (torch.Tensor): The log probabilities of each of these actions.

Return type:

(tuple)
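
To make the roles of deterministic and with_logprob concrete, here is a scalar, framework-free sketch of a tanh-squashed Gaussian policy. It is not the package's implementation: the function name, the scalar interface, and the use of the standard tanh change-of-variables correction are assumptions made for illustration.

```python
import math
import random


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def squashed_gaussian_sample(mu, log_std, deterministic=False,
                             with_logprob=True, rng=random):
    """Return (action, logp) for a scalar tanh-squashed Gaussian (sketch)."""
    std = math.exp(log_std)
    # Deterministic mode uses the mean; otherwise draw u ~ N(mu, std).
    u = mu if deterministic else rng.gauss(mu, std)
    action = math.tanh(u)  # squash into (-1, 1)
    if not with_logprob:
        return action, None
    # Log density of the unsquashed Gaussian at u.
    logp = -0.5 * ((u - mu) / std) ** 2 - log_std - 0.5 * math.log(2.0 * math.pi)
    # Change of variables for tanh: subtract log(1 - tanh(u)^2), written in
    # the stable form 2 * (log 2 - u - softplus(-2u)).
    logp -= 2.0 * (math.log(2.0) - u - softplus(-2.0 * u))
    return action, logp


rng = random.Random(0)
mean_action, mean_logp = squashed_gaussian_sample(0.3, -1.0, deterministic=True)
sampled_action, sampled_logp = squashed_gaussian_sample(0.3, -1.0, rng=rng)
```

In deterministic mode the returned action is simply tanh(mu), which matches the description of returning the mean action at test time.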

act(obs, deterministic=False)[source]

Returns the action from the current state given the current policy.

Parameters:
  • obs (torch.Tensor) – The current observation (state).

  • deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When true the mean action of the stochastic policy is returned. If False the action is sampled from the stochastic policy. Defaults to False.

Returns:

The action from the current state given the current policy.

Return type:

numpy.ndarray

get_action(obs, deterministic=False)[source]

Simple wrapper that makes the act() method available under the ‘get_action’ alias.

Parameters:
  • obs (torch.Tensor) – The current observation (state).

  • deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When true the mean action of the stochastic policy is returned. If False the action is sampled from the stochastic policy. Defaults to False.

Returns:

The action from the current state given the current policy.

Return type:

numpy.ndarray

class stable_learning_control.algos.pytorch.policies.LCritic(obs_dim, act_dim, hidden_sizes, activation=nn.ReLU)[source]

Bases: torch.nn.Module

Soft Lyapunov critic Network.

L

The layers of the network.

Type:

torch.nn.Sequential

Initialise the LCritic object.

Parameters:
  • obs_dim (int) – Dimension of the observation space.

  • act_dim (int) – Dimension of the action space.

  • hidden_sizes (list) – Sizes of the hidden layers.

  • activation (torch.nn.modules.activation, optional) – The activation function. Defaults to torch.nn.ReLU.

__device_warning_logged = False
_obs_same_device = False
_act_same_device = False
L
forward(obs, act)[source]

Perform forward pass through the network.

Parameters:
  • obs (torch.Tensor) – The tensor of observations.

  • act (torch.Tensor) – The tensor of actions.

Returns:

The tensor containing the Lyapunov values of the input observations and actions.

Return type:

torch.Tensor

class stable_learning_control.algos.pytorch.policies.QCritic(obs_dim, act_dim, hidden_sizes, activation=nn.ReLU, output_activation=nn.Identity)[source]

Bases: torch.nn.Module

Soft Q critic network.

Q

The layers of the network.

Type:

torch.nn.Sequential

Initialise the QCritic object.

Parameters:
  • obs_dim (int) – Dimension of the observation space.

  • act_dim (int) – Dimension of the action space.

  • hidden_sizes (list) – Sizes of the hidden layers.

  • activation (torch.nn.modules.activation, optional) – The activation function. Defaults to torch.nn.ReLU.

  • output_activation (torch.nn.modules.activation, optional) – The activation function used for the output layers. Defaults to torch.nn.Identity.

__device_warning_logged = False
_obs_same_device = False
_act_same_device = False
Q
forward(obs, act)[source]

Perform forward pass through the network.

Parameters:
  • obs (torch.Tensor) – The tensor of observations.

  • act (torch.Tensor) – The tensor of actions.

Returns:

The tensor containing the Q values of the input observations and actions.

Return type:

torch.Tensor
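
Because forward takes both obs and act, the critic's input width is typically obs_dim + act_dim. The sketch below shows that idea with a single linear layer and an identity output. It assumes the observation and action are concatenated along the feature axis, which is common for Q critics but is not confirmed by this page; q_value and its arguments are hypothetical names.

```python
def q_value(obs, act, weights, bias):
    """Evaluate a one-layer linear critic on the concatenated (obs, act) vector."""
    features = list(obs) + list(act)  # concatenate along the feature axis
    if len(features) != len(weights):
        raise ValueError("weights must have obs_dim + act_dim entries")
    # Identity output activation: the weighted sum is returned unchanged.
    return sum(w * x for w, x in zip(weights, features)) + bias


obs = [1.0, 2.0, 3.0]  # obs_dim = 3
act = [0.5]            # act_dim = 1
q = q_value(obs, act, weights=[0.1, 0.2, 0.3, -1.0], bias=0.25)
```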

class stable_learning_control.algos.pytorch.policies.LyapunovActorCritic(observation_space, action_space, hidden_sizes=HIDDEN_SIZES_DEFAULT, activation=ACTIVATION_DEFAULT, output_activation=OUTPUT_ACTIVATION_DEFAULT)[source]

Bases: torch.nn.Module

Lyapunov (soft) Actor-Critic network.

self.pi

The squashed gaussian policy network (actor).

Type:

SquashedGaussianActor

self.L

The soft L-network (critic).

Type:

LCritic

Initialise the LyapunovActorCritic object.

Parameters:
  • observation_space (gym.space.box.Box) – A gymnasium observation space.

  • action_space (gym.space.box.Box) – A gymnasium action space.

  • hidden_sizes (Union[dict, tuple, list], optional) – Sizes of the hidden layers for the actor. Defaults to (256, 256).

  • activation (Union[dict, torch.nn.modules.activation], optional) – The (actor and critic) hidden layers activation function. Defaults to torch.nn.ReLU.

  • output_activation (Union[dict, torch.nn.modules.activation], optional) – The (actor and critic) output activation function. Defaults to torch.nn.ReLU for the actor and nn.Identity for the critic.
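
The Union[dict, ...] types above indicate that a setting can be given either once for all networks or separately per network. A hedged sketch of how such an argument could be resolved follows. The "actor" and "critic" dictionary keys and the helper name split_per_network are assumptions for illustration only.

```python
def split_per_network(value, default):
    """Return (actor_value, critic_value) from a shared value or a dict.

    The "actor" and "critic" keys are assumed for this sketch. A missing
    key falls back to `default`.
    """
    if isinstance(value, dict):
        return value.get("actor", default), value.get("critic", default)
    return value, value


# Per-network sizes given as a dict.
actor_sizes, critic_sizes = split_per_network(
    {"actor": [64, 64], "critic": [256, 256, 16]}, (256, 256)
)
# A single tuple is shared by both networks.
shared_actor, shared_critic = split_per_network((128, 128), (256, 256))
```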

Note

It is currently not possible to set the critic output activation function when using the LyapunovActorCritic, since by design the critic output activation must be torch.square().
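
One consequence of the squared output mentioned in the note is that the critic value can never be negative, which is a property expected of a Lyapunov candidate. The sketch below illustrates this with plain Python numbers. Summing the squared final-layer outputs is an assumption here; the package may instead square a single output unit.

```python
def lyapunov_value(last_layer_outputs):
    """Sum of squared final-layer outputs; non-negative by construction."""
    return sum(h * h for h in last_layer_outputs)


negative_inputs = lyapunov_value([-1.5, 2.0])  # signs do not matter
at_zero = lyapunov_value([0.0, 0.0])           # zero only when every output is zero
```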

obs_dim
act_dim
act_limits
pi
L
forward(obs, act, deterministic=False, with_logprob=True)[source]

Performs a forward pass through all the networks (Actor and L critic).

Parameters:
  • obs (torch.Tensor) – The tensor of observations.

  • act (torch.Tensor) – The tensor of actions.

  • deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When true the mean action of the stochastic policy is returned. If false the action is sampled from the stochastic policy. Defaults to False.

  • with_logprob (bool, optional) – Whether we want to return the log probability of an action. Defaults to True.

Returns:

A tuple containing the outputs of the actor and the L critic.

Return type:

(tuple)

Note

Useful for when you want to print out the full network graph using TensorBoard.

act(obs, deterministic=False)[source]

Returns the action from the current state given the current policy.

Parameters:
  • obs (torch.Tensor) – The current observation (state).

  • deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When true the mean action of the stochastic policy is returned. If False the action is sampled from the stochastic policy. Defaults to False.

Returns:

The action from the current state given the current policy.

Return type:

numpy.ndarray

class stable_learning_control.algos.pytorch.policies.LyapunovActorTwinCritic(observation_space, action_space, hidden_sizes=HIDDEN_SIZES_DEFAULT, activation=ACTIVATION_DEFAULT, output_activation=OUTPUT_ACTIVATION_DEFAULT)[source]

Bases: torch.nn.Module

Lyapunov (soft) Actor-(twin Critic) network.

self.pi

The squashed gaussian policy network (actor).

Type:

SquashedGaussianActor

self.L

The first soft L-network (critic).

Type:

LCritic

self.L2

The second soft L-network (critic).

Type:

LCritic

Initialise the LyapunovActorTwinCritic object.

Parameters:
  • observation_space (gym.space.box.Box) – A gymnasium observation space.

  • action_space (gym.space.box.Box) – A gymnasium action space.

  • hidden_sizes (Union[dict, tuple, list], optional) – Sizes of the hidden layers for the actor. Defaults to (256, 256).

  • activation (Union[dict, torch.nn.modules.activation], optional) – The (actor and critic) hidden layers activation function. Defaults to torch.nn.ReLU.

  • output_activation (Union[dict, torch.nn.modules.activation], optional) – The (actor and critic) output activation function. Defaults to torch.nn.ReLU for the actor and nn.Identity for the critic.

Note

It is currently not possible to set the critic output activation function when using the LyapunovActorTwinCritic, since by design the critic output activation must be torch.square().

obs_dim
act_dim
act_limits
pi
L
L2
forward(obs, act, deterministic=False, with_logprob=True)[source]

Performs a forward pass through all the networks (Actor and both L critics).

Parameters:
  • obs (torch.Tensor) – The tensor of observations.

  • act (torch.Tensor) – The tensor of actions.

  • deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When true the mean action of the stochastic policy is returned. If false the action is sampled from the stochastic policy. Defaults to False.

  • with_logprob (bool, optional) – Whether we want to return the log probability of an action. Defaults to True.

Returns:

A tuple containing the outputs of the actor and both L critics.

Return type:

(tuple)

Note

Useful for when you want to print out the full network graph using TensorBoard.

act(obs, deterministic=False)[source]

Returns the action from the current state given the current policy.

Parameters:
  • obs (torch.Tensor) – The current observation (state).

  • deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When true the mean action of the stochastic policy is returned. If False the action is sampled from the stochastic policy. Defaults to False.

Returns:

The action from the current state given the current policy.

Return type:

numpy.ndarray

class stable_learning_control.algos.pytorch.policies.SoftActorCritic(observation_space, action_space, hidden_sizes=HIDDEN_SIZES_DEFAULT, activation=ACTIVATION_DEFAULT, output_activation=OUTPUT_ACTIVATION_DEFAULT)[source]

Bases: torch.nn.Module

Soft Actor-Critic network.

self.pi

The squashed gaussian policy network (actor).

Type:

SquashedGaussianActor

self.Q1

The first soft Q-network (critic).

Type:

QCritic

self.Q2

The second soft Q-network (critic).

Type:

QCritic

Initialise the SoftActorCritic object.

Parameters:
  • observation_space (gym.space.box.Box) – A gymnasium observation space.

  • action_space (gym.space.box.Box) – A gymnasium action space.

  • hidden_sizes (Union[dict, tuple, list], optional) – Sizes of the hidden layers for the actor. Defaults to (256, 256).

  • activation (Union[dict, torch.nn.modules.activation], optional) – The (actor and critic) hidden layers activation function. Defaults to torch.nn.ReLU.

  • output_activation (Union[dict, torch.nn.modules.activation], optional) – The (actor and critic) output activation function. Defaults to torch.nn.ReLU for the actor and nn.Identity for the critic.

obs_dim
act_dim
act_limits
pi
Q1
Q2
forward(obs, act, deterministic=False, with_logprob=True)[source]

Performs a forward pass through all the networks (Actor, Q critic 1 and Q critic 2).

Parameters:
  • obs (torch.Tensor) – The tensor of observations.

  • act (torch.Tensor) – The tensor of actions.

  • deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When true the mean action of the stochastic policy is returned. If false the action is sampled from the stochastic policy. Defaults to False.

  • with_logprob (bool, optional) – Whether we want to return the log probability of an action. Defaults to True.

Returns:

A tuple containing the outputs of the actor, Q critic 1 and Q critic 2.

Return type:

(tuple)

Note

Useful for when you want to print out the full network graph using TensorBoard.

act(obs, deterministic=False)[source]

Returns the action from the current state given the current policy.

Parameters:
  • obs (torch.Tensor) – The current observation (state).

  • deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When true the mean action of the stochastic policy is returned. If False the action is sampled from the stochastic policy. Defaults to False.

Returns:

The action from the current state given the current policy.

Return type:

numpy.ndarray