stable_gym.envs.mujoco.walker2d_cost
Modified version of the Walker2d Mujoco environment found in the gymnasium library. This modification was first described by Han et al. 2020. In this modified version:
- The objective was changed to a velocity-tracking task. To do this, the reward is replaced with a cost. This cost is the squared difference between the Walker2d's forward velocity and a reference value (the error).
- The reference velocity was added to the observation space.
- Three optional variables were added to the observation space: the reference velocity, the reference error (i.e. the difference between the Walker2d's forward velocity and the reference) and the Walker2d's forward velocity. Their inclusion can be controlled through the exclude_reference_from_observation, exclude_reference_error_from_observation and exclude_x_velocity_from_observation environment arguments.
Classes
Walker2dCost – Custom Walker2d gymnasium environment.
Package Contents
- class stable_gym.envs.mujoco.walker2d_cost.Walker2dCost(reference_forward_velocity=1.0, randomise_reference_forward_velocity=False, randomise_reference_forward_velocity_range=(0.5, 1.5), forward_velocity_weight=1.0, include_ctrl_cost=False, include_health_penalty=True, health_penalty_size=10, ctrl_cost_weight=0.001, terminate_when_unhealthy=True, healthy_z_range=(0.8, 2.0), healthy_angle_range=(-1.0, 1.0), reset_noise_scale=0.005, exclude_current_positions_from_observation=True, exclude_reference_from_observation=False, exclude_reference_error_from_observation=True, exclude_x_velocity_from_observation=False, action_space_dtype=np.float32, observation_space_dtype=np.float64, **kwargs)[source]
Bases: gymnasium.envs.mujoco.walker2d_v4.Walker2dEnv, gymnasium.utils.EzPickle
Custom Walker2d gymnasium environment.
Note
Can also be used in a vectorized manner. See the gym.vector documentation.
- Source:
This is a modified version of the Walker2d Mujoco environment found in the gymnasium library. This modification was first described by Han et al. 2020. Compared to the original Walker2d environment in this modified version:
The objective was changed to a velocity-tracking task. To do this, the reward is replaced with a cost. This cost is the squared difference between the Walker2d's forward velocity and a reference value (the error). Additionally, a control cost and a health penalty can be included in the cost.
Three optional variables were added to the observation space: the reference velocity, the reference error (i.e. the difference between the Walker2d's forward velocity and the reference) and the Walker2d's forward velocity. Their inclusion can be controlled through the exclude_reference_from_observation, exclude_reference_error_from_observation and exclude_x_velocity_from_observation environment arguments.
The rest of the environment is the same as the original Walker2d environment. Below, the modified cost is described. For more information about the environment (e.g. observation space, action space, episode termination, etc.), please refer to the gymnasium library.
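As a sketch of how these exclusion arguments affect the observation vector (the flag names are taken from the class signature below; the resulting shapes are printed rather than asserted):

    import gymnasium as gym
    import stable_gym  # noqa: F401, makes the stable_gym environments available

    # Default flags: reference included, reference error excluded, x velocity included.
    env = gym.make("stable_gym:Walker2dCost-v1")
    print(env.observation_space.shape)

    # Excluding all three extra variables shrinks the observation accordingly.
    env_min = gym.make(
        "stable_gym:Walker2dCost-v1",
        exclude_reference_from_observation=True,
        exclude_reference_error_from_observation=True,
        exclude_x_velocity_from_observation=True,
    )
    print(env_min.observation_space.shape)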
- Modified cost:
A cost, computed using the Walker2dCost.cost() method, is given for each simulation step, including the terminal step. This cost is defined as the error between the Walker2d's forward velocity and a reference value. A control cost and health penalty can also be included in the cost. The cost is computed as:

\[cost = w_{forward\_velocity} \times (x_{velocity} - x_{reference\_x\_velocity})^2 + w_{ctrl} \times c_{ctrl} + p_{health}\]

- Solved Requirements:
Considered solved when the average cost is less than or equal to 50 over 100 consecutive trials.
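As a concrete illustration of the cost formula above, a minimal standalone sketch with hypothetical values (this mirrors the equation only; it is not the library's internal Walker2dCost.cost() implementation):

    # Weights and penalty terms from the cost formula.
    forward_velocity_weight = 1.0  # w_forward_velocity
    ctrl_cost_weight = 1e-3        # w_ctrl
    health_penalty = 10.0          # p_health, only applied when the walker is unhealthy

    x_velocity = 0.8               # hypothetical measured forward velocity
    reference_x_velocity = 1.0     # reference velocity to track
    ctrl_cost = 0.05               # hypothetical control cost term (c_ctrl)

    cost = (
        forward_velocity_weight * (x_velocity - reference_x_velocity) ** 2
        + ctrl_cost_weight * ctrl_cost
        + health_penalty
    )
    print(cost)  # 1.0 * 0.04 + 1e-3 * 0.05 + 10.0 = 10.04005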
- How to use:
    import stable_gym
    import gymnasium as gym

    env = gym.make("stable_gym:Walker2dCost-v1")
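Continuing from the snippet above, the environment follows the standard gymnasium API; the cost is simply returned in the position normally occupied by the reward (random action, purely illustrative):

    obs, info = env.reset(seed=0)
    action = env.action_space.sample()  # random action, purely illustrative
    obs, cost, terminated, truncated, info = env.step(action)
    print(cost)  # a cost to be minimised, not a reward to be maximised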
- state
The current system state.
Initialise a new Walker2dCost environment instance.
- Parameters:
- reference_forward_velocity (float, optional) – The forward velocity that the agent should try to track. Defaults to 1.0.
- randomise_reference_forward_velocity (bool, optional) – Whether to randomize the reference forward velocity. Defaults to False.
- randomise_reference_forward_velocity_range (tuple, optional) – The range of the random reference forward velocity. Defaults to (0.5, 1.5).
- forward_velocity_weight (float, optional) – The weight used to scale the forward velocity error. Defaults to 1.0.
- include_ctrl_cost (bool, optional) – Whether you also want to penalize the 2D walker if it takes actions that are too large. Defaults to False.
- include_health_penalty (bool, optional) – Whether to penalize the 2D walker if it becomes unhealthy (i.e. if it falls over). Defaults to True.
- health_penalty_size (int, optional) – The size of the unhealthy penalty. Defaults to 10.
- ctrl_cost_weight (float, optional) – The weight used to scale the control cost. Defaults to 1e-3.
- terminate_when_unhealthy (bool, optional) – Whether to terminate the episode when the 2D walker becomes unhealthy. Defaults to True.
- healthy_z_range (tuple, optional) – The range of healthy z values. Defaults to (0.8, 2.0).
- healthy_angle_range (tuple, optional) – The range of healthy angles. Defaults to (-1.0, 1.0).
- reset_noise_scale (float, optional) – Scale of random perturbations of the initial position and velocity. Defaults to 5e-3.
- exclude_current_positions_from_observation (bool, optional) – Whether to omit the x- and y-coordinates of the front tip from observations. Excluding the position can serve as an inductive bias to induce position-agnostic behaviour in policies. Defaults to True.
- exclude_reference_from_observation (bool, optional) – Whether the reference should be excluded from the observation. Defaults to False.
- exclude_reference_error_from_observation (bool, optional) – Whether the error should be excluded from the observation. Defaults to True.
- exclude_x_velocity_from_observation (bool, optional) – Whether to omit the x-component of the velocity from observations. Defaults to False.
- action_space_dtype (union[numpy.dtype, str], optional) – The data type of the action space. Defaults to np.float32.
- observation_space_dtype (union[numpy.dtype, str], optional) – The data type of the observation space. Defaults to np.float64.
- **kwargs – Extra keyword arguments to pass to the Walker2dEnv class.
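For completeness, a sketch of constructing the class directly with a few of the parameters above (values are illustrative, not recommendations):

    from stable_gym.envs.mujoco.walker2d_cost import Walker2dCost

    # Track a randomised reference velocity and also penalise large actions.
    env = Walker2dCost(
        randomise_reference_forward_velocity=True,
        randomise_reference_forward_velocity_range=(0.5, 1.5),
        include_ctrl_cost=True,
        ctrl_cost_weight=1e-3,
    )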
- reference_forward_velocity
- _randomise_reference_forward_velocity
- _randomise_reference_forward_velocity_range
- _forward_velocity_weight
- _include_ctrl_cost
- _include_health_penalty
- _health_penalty_size
- _exclude_reference_from_observation
- _exclude_reference_error_from_observation
- _exclude_x_velocity_from_observation
- _action_space_dtype
- _observation_space_dtype
- _action_dtype_conversion_warning = False
- state = None
- low
- high
- observation_space
- step(action)[source]
Take a step in the environment.
Note
This method overrides the step() method such that the new cost function is used.
- Parameters:
action (np.ndarray) – Action to take in the environment.
- Returns:
tuple containing:
- obs (np.ndarray): Environment observation.
- cost (float): Cost of the action.
- terminated (bool): Whether the episode is terminated.
- truncated (bool): Whether the episode was truncated. This value is set by wrappers when, for example, a time limit is reached or the agent goes out of bounds.
- info (dict): Additional information about the environment.
- Return type:
(tuple)
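A minimal rollout sketch built on this return signature, distinguishing health-based termination from wrapper-based truncation (random actions, purely illustrative):

    import gymnasium as gym
    import stable_gym  # noqa: F401

    env = gym.make("stable_gym:Walker2dCost-v1")
    obs, info = env.reset()
    episode_cost, terminated, truncated = 0.0, False, False
    while not (terminated or truncated):
        obs, cost, terminated, truncated, info = env.step(env.action_space.sample())
        episode_cost += cost
    # terminated: the walker became unhealthy (with terminate_when_unhealthy=True);
    # truncated: a wrapper such as TimeLimit cut the episode short.
    print(episode_cost)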
- reset(seed=None, options=None)[source]
Reset the gymnasium environment.
- Parameters:
seed (int, optional) – A random seed for the environment. Defaults to None.
options (dict, optional) – A dictionary containing additional reset options. Defaults to None.
- Returns:
tuple containing:
- obs (numpy.ndarray): Initial environment observation.
- info (dict): Dictionary containing additional information.
- Return type:
(tuple)
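Passing a seed to reset() makes rollouts reproducible; as a quick sketch, two resets with the same seed yield the same initial observation:

    import gymnasium as gym
    import stable_gym  # noqa: F401

    env = gym.make("stable_gym:Walker2dCost-v1")
    obs1, _ = env.reset(seed=42)
    obs2, _ = env.reset(seed=42)
    assert (obs1 == obs2).all()  # same seed, same initial observation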
- property tau
Alias for the environment step size. Provided for compatibility with the other gymnasium environments.
- property t
Environment time.
- property physics_time
Returns the physics time.
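These properties live on the underlying environment instance, so when the environment is created through gym.make they are reached via unwrapped (a sketch):

    import gymnasium as gym
    import stable_gym  # noqa: F401

    env = gym.make("stable_gym:Walker2dCost-v1")
    base_env = env.unwrapped
    print(base_env.tau)           # environment step size
    print(base_env.t)             # current environment time
    print(base_env.physics_time)  # time of the underlying physics engine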