stable_learning_control.algos.pytorch.policies.actors
Actor network structures.
Submodules
Classes
SquashedGaussianActor | The squashed gaussian actor network.
Package Contents
- class stable_learning_control.algos.pytorch.policies.actors.SquashedGaussianActor(obs_dim, act_dim, hidden_sizes, activation=nn.ReLU, output_activation=nn.ReLU, act_limits=None, log_std_min=-20, log_std_max=2.0)[source]
Bases: torch.nn.Module

The squashed gaussian actor network.
- net
The input/hidden layers of the network.
- Type:
torch.nn.Sequential
- mu
The output layer which returns the mean of the actions.
- Type:
torch.nn.Linear
- log_std_layer
The output layer which returns the log standard deviation of the actions.
- Type:
torch.nn.Linear
- act_limits
The high and low action bounds of the environment. Used for rescaling the actions that come out of the network from (-1, 1) to (low, high). No scaling will be applied if left empty (see the rescaling sketch below).
- Type:
dict, optional
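A minimal sketch of the rescaling this attribute is used for, assuming the usual affine mapping from the squashed range (-1, 1) onto (low, high); the bounds and variable names below are illustrative, not the library's internals:

```python
import torch

# Hypothetical action bounds and a squashed action in (-1, 1).
low, high = torch.tensor([-2.0, -2.0]), torch.tensor([2.0, 2.0])
squashed_action = torch.tanh(torch.randn(2))

# Assumed affine rescaling from (-1, 1) to (low, high).
rescaled_action = low + 0.5 * (squashed_action + 1.0) * (high - low)
```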
Initialise the SquashedGaussianActor object.
- Parameters:
obs_dim (int) – Dimension of the observation space.
act_dim (int) – Dimension of the action space.
hidden_sizes (list) – Sizes of the hidden layers.
activation (torch.nn.modules.activation) – The activation function. Defaults to torch.nn.ReLU.
output_activation (torch.nn.modules.activation, optional) – The activation function used for the output layers. Defaults to torch.nn.ReLU.
act_limits (dict) – The high and low action bounds of the environment. Used for rescaling the actions that come out of the network from (-1, 1) to (low, high).
log_std_min (int, optional) – The minimum log standard deviation. Defaults to -20.
log_std_max (float, optional) – The maximum log standard deviation. Defaults to 2.0.
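A minimal construction sketch using the documented signature; the environment dimensions, hidden sizes, and the exact value types inside the act_limits dict are illustrative assumptions:

```python
import torch.nn as nn

from stable_learning_control.algos.pytorch.policies.actors import SquashedGaussianActor

# Hypothetical environment dimensions and action bounds.
obs_dim, act_dim = 8, 2
act_limits = {"low": [-2.0, -2.0], "high": [2.0, 2.0]}  # assumed dict layout

actor = SquashedGaussianActor(
    obs_dim=obs_dim,
    act_dim=act_dim,
    hidden_sizes=[256, 256],
    activation=nn.ReLU,
    output_activation=nn.ReLU,
    act_limits=act_limits,
)
```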
- __device_warning_logged = False
- act_limits
- _log_std_min
- _log_std_max
- net
- mu_layer
- log_std_layer
- forward(obs, deterministic=False, with_logprob=True)[source]
Perform a forward pass through the network.
- Parameters:
obs (torch.Tensor) – The tensor of observations.
deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When True the mean action of the stochastic policy is returned. If False the action is sampled from the stochastic policy. Defaults to False.
with_logprob (bool, optional) – Whether we want to return the log probability of an action. Defaults to True.
- Returns:
tuple containing:
pi_action (torch.Tensor): The actions given by the policy.
logp_pi (torch.Tensor): The log probabilities of each of these actions.
- Return type:
(tuple)
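A short usage sketch of the forward pass, continuing from the construction example above; the batch size is arbitrary:

```python
import torch

obs = torch.randn(32, obs_dim)  # hypothetical batch of observations

# Stochastic sample plus log probabilities (training-time usage).
pi_action, logp_pi = actor(obs, deterministic=False, with_logprob=True)

# Mean (deterministic) action; the log-probability entry is presumably
# not computed when with_logprob=False.
mu_action, _ = actor(obs, deterministic=True, with_logprob=False)
```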
- act(obs, deterministic=False)[source]
Returns the action from the current state given the current policy.
- Parameters:
obs (torch.Tensor) – The current observation (state).
deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When True the mean action of the stochastic policy is returned. If False the action is sampled from the stochastic policy. Defaults to False.
- Returns:
The action from the current state given the current policy.
- Return type:
numpy.ndarray
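A small usage sketch of act(), continuing from the construction example above; the single-observation shape is an assumption:

```python
import torch

obs = torch.randn(obs_dim)  # hypothetical single observation

# Greedy (mean) action, e.g. for evaluation rollouts.
action = actor.act(obs, deterministic=True)
```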
- get_action(obs, deterministic=False)[source]
Simple wrapper that makes the act() method available under the 'get_action' alias.
- Parameters:
obs (torch.Tensor) – The current observation (state).
deterministic (bool, optional) – Whether we want to use a deterministic policy (used at test time). When True the mean action of the stochastic policy is returned. If False the action is sampled from the stochastic policy. Defaults to False.
- Returns:
The action from the current state given the current policy.
- Return type:
numpy.ndarray
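Since get_action() is only an alias, the two calls below (continuing from the act() sketch above) are expected to be interchangeable:

```python
a1 = actor.act(obs, deterministic=False)
a2 = actor.get_action(obs, deterministic=False)  # same behaviour via the alias
```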