stable_learning_control.algos.pytorch.policies.lyapunov_actor_critic
Lyapunov (soft) actor critic policy.
This module contains a PyTorch implementation of the Lyapunov Actor-Critic policy of Han et al. 2020.
Attributes
- OUTPUT_ACTIVATION_DEFAULT

Classes
- LyapunovActorCritic – Lyapunov (soft) Actor-Critic network.
Module Contents
- stable_learning_control.algos.pytorch.policies.lyapunov_actor_critic.OUTPUT_ACTIVATION_DEFAULT[source]
- class stable_learning_control.algos.pytorch.policies.lyapunov_actor_critic.LyapunovActorCritic(observation_space, action_space, hidden_sizes=HIDDEN_SIZES_DEFAULT, activation=ACTIVATION_DEFAULT, output_activation=OUTPUT_ACTIVATION_DEFAULT)[source]
Bases: torch.nn.Module
Lyapunov (soft) Actor-Critic network.
- self.pi
The squashed Gaussian policy network (actor).
- Type:
SquashedGaussianActor
Initialise the LyapunovActorCritic object.
- Parameters:
- observation_space (gym.spaces.box.Box) – A gymnasium observation space.
- action_space (gym.spaces.box.Box) – A gymnasium action space.
- hidden_sizes (Union[dict, tuple, list], optional) – Sizes of the hidden layers for the actor. Defaults to (256, 256).
- activation (Union[dict, torch.nn.modules.activation], optional) – The (actor and critic) hidden layers activation function. Defaults to torch.nn.ReLU.
- output_activation (Union[dict, torch.nn.modules.activation], optional) – The (actor and critic) output activation function. Defaults to torch.nn.ReLU for the actor and nn.Identity for the critic.
Note
It is currently not possible to set the critic output activation function when using the LyapunovActorCritic. This is because the design requires the critic output activation to be of type torch.square().
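As a minimal construction sketch (the Pendulum-v1 environment is a hypothetical choice, not part of this reference; the import follows the module path above):

```python
import gymnasium as gym
import torch.nn as nn

from stable_learning_control.algos.pytorch.policies.lyapunov_actor_critic import (
    LyapunovActorCritic,
)

# Hypothetical environment; any environment with Box observation and
# action spaces should work.
env = gym.make("Pendulum-v1")

# Build the actor-critic, spelling out the documented defaults.
ac = LyapunovActorCritic(
    observation_space=env.observation_space,
    action_space=env.action_space,
    hidden_sizes=(256, 256),  # actor hidden layer sizes (default)
    activation=nn.ReLU,  # hidden layer activation for actor and critic
)
```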
- forward(obs, act, deterministic=False, with_logprob=True)[source]
Performs a forward pass through all the networks (actor and L critic).
- Parameters:
- obs (torch.Tensor) – The tensor of observations.
- act (torch.Tensor) – The tensor of actions.
- deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When True, the mean action of the stochastic policy is returned. If False, the action is sampled from the stochastic policy. Defaults to False.
- with_logprob (bool, optional) – Whether we want to return the log probability of an action. Defaults to True.
- Returns:
tuple containing:
- pi_action (torch.Tensor): The actions given by the policy.
- logp_pi (torch.Tensor): The log probabilities of each of these actions.
- L (torch.Tensor): Critic L values.
- Return type:
(tuple)
Note
Useful when you want to print out the full network graph using TensorBoard.
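A short usage sketch of the forward pass, assuming the `ac` and `env` objects from the construction example above (the batch dimension is added for illustration):

```python
import torch

# A single observation/action pair, batched along the first dimension.
obs = torch.as_tensor(env.observation_space.sample(), dtype=torch.float32).unsqueeze(0)
act = torch.as_tensor(env.action_space.sample(), dtype=torch.float32).unsqueeze(0)

# Calling the module invokes forward() and returns the sampled actions,
# their log probabilities, and the critic L values.
pi_action, logp_pi, L = ac(obs, act)
```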
- act(obs, deterministic=False)[source]
Returns the action from the current state given the current policy.
- Parameters:
- obs (torch.Tensor) – The current observation (state).
- deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When True, the mean action of the stochastic policy is returned. If False, the action is sampled from the stochastic policy. Defaults to False.
- Returns:
The action from the current state given the current policy.
- Return type:
numpy.ndarray
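Finally, a test-time usage sketch of act(), again assuming the `ac` and `env` objects defined in the earlier examples:

```python
import torch

# Deterministic action for evaluation: the mean action of the stochastic
# policy is returned instead of a sample.
obs = torch.as_tensor(env.observation_space.sample(), dtype=torch.float32)
action = ac.act(obs, deterministic=True)
```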