stable_learning_control.algos.pytorch.policies.actors
Actor network structures.
Submodules
Classes
SquashedGaussianActor – The squashed Gaussian actor network.
Package Contents
- class stable_learning_control.algos.pytorch.policies.actors.SquashedGaussianActor(obs_dim, act_dim, hidden_sizes, activation=nn.ReLU, output_activation=nn.ReLU, act_limits=None, log_std_min=-20, log_std_max=2.0)[source]
Bases:
torch.nn.Module
The squashed Gaussian actor network.
- net
The input/hidden layers of the network.
- Type:
torch.nn.Sequential
- mu_layer
The output layer which returns the mean of the actions.
- Type:
torch.nn.Linear
- log_std_layer
The output layer which returns the log standard deviation of the actions.
- Type:
torch.nn.Linear
- act_limits
The high and low action bounds of the environment. Used for rescaling the actions that come out of the network from (-1, 1) to (low, high). No scaling will be applied if left empty (see the rescaling sketch after this attribute list).
- Type:
dict, optional
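The exact rescaling code is not reproduced on this page, but the description above corresponds to the usual affine mapping of a tanh-squashed action onto the environment bounds. A minimal sketch, assuming act_limits holds array-like low and high entries (the values below are hypothetical):

import numpy as np

# Hypothetical action bounds; in practice these come from the environment's action space.
act_limits = {"low": np.array([-2.0, -1.0]), "high": np.array([2.0, 1.0])}

def rescale(action, low, high):
    """Affinely map an action from the (-1, 1) range onto the (low, high) range."""
    return low + 0.5 * (action + 1.0) * (high - low)

squashed_action = np.array([0.5, -0.25])  # e.g. a tanh-squashed network output
print(rescale(squashed_action, act_limits["low"], act_limits["high"]))  # roughly [1.0, -0.25]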
Initialise the SquashedGaussianActor object.
- Parameters:
obs_dim (int) – Dimension of the observation space.
act_dim (int) – Dimension of the action space.
hidden_sizes (list) – Sizes of the hidden layers.
activation (torch.nn.modules.activation) – The activation function. Defaults to torch.nn.ReLU.
output_activation (torch.nn.modules.activation, optional) – The activation function used for the output layers. Defaults to torch.nn.ReLU.
act_limits (dict) – The high and low action bounds of the environment. Used for rescaling the actions that come out of the network from (-1, 1) to (low, high).
log_std_min (int, optional) – The minimum log standard deviation. Defaults to -20.
log_std_max (float, optional) – The maximum log standard deviation. Defaults to 2.0.
- __device_warning_logged = False
- act_limits
- _log_std_min
- _log_std_max
- net
- mu_layer
- log_std_layer
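For orientation, a minimal construction sketch; the observation/action dimensions, hidden sizes, and bounds below are made up for illustration, and the low/high keys of act_limits are an assumption based on the description above:

import numpy as np
import torch.nn as nn

from stable_learning_control.algos.pytorch.policies.actors import SquashedGaussianActor

# Illustrative sizes for an environment with 8 observations and 2 actions.
actor = SquashedGaussianActor(
    obs_dim=8,
    act_dim=2,
    hidden_sizes=[256, 256],
    activation=nn.ReLU,
    output_activation=nn.ReLU,
    act_limits={"low": np.array([-2.0, -2.0]), "high": np.array([2.0, 2.0])},
)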
- forward(obs, deterministic=False, with_logprob=True)[source]
Perform forward pass through the network.
- Parameters:
obs (torch.Tensor) – The tensor of observations.
deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When True the mean action of the stochastic policy is returned. If False the action is sampled from the stochastic policy. Defaults to False.
with_logprob (bool, optional) – Whether we want to return the log probability of an action. Defaults to True.
- Returns:
tuple containing:
- pi_action (torch.Tensor): The actions given by the policy.
- logp_pi (torch.Tensor): The log probabilities of each of these actions.
- Return type:
(tuple)
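Continuing the construction sketch above, an illustrative forward pass; when with_logprob=False the second element of the returned tuple is presumably a placeholder (e.g. None), which is an assumption here:

import torch

obs = torch.randn(32, 8)  # a batch of 32 illustrative observations

# Stochastic actions plus their log probabilities, as used during training.
pi_action, logp_pi = actor(obs, deterministic=False, with_logprob=True)

# Deterministic (mean) actions for evaluation; the log probabilities are skipped.
mu_action, _ = actor(obs, deterministic=True, with_logprob=False)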
- act(obs, deterministic=False)[source]
Returns the action from the current state given the current policy.
- Parameters:
obs (torch.Tensor) – The current observation (state).
deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When True the mean action of the stochastic policy is returned. If False the action is sampled from the stochastic policy. Defaults to False.
- Returns:
The action from the current state given the current policy.
- Return type:
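Again continuing the sketch above, an illustrative test-time call; the concrete return type of act() is not spelled out on this page, so only the call pattern is shown:

import torch

obs = torch.randn(8)  # a single illustrative observation

# Deterministic action from the current policy (test-time behaviour).
action = actor.act(obs, deterministic=True)

# get_action() (documented below) is an alias for act() and should behave the same.
same_action = actor.get_action(obs, deterministic=True)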
- get_action(obs, deterministic=False)[source]
Simple wrapper for making the act() method available under the 'get_action' alias.
- Parameters:
obs (torch.Tensor) – The current observation (state).
deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When True the mean action of the stochastic policy is returned. If False the action is sampled from the stochastic policy. Defaults to False.
- Returns:
The action from the current state given the current policy.
- Return type: