stable_gym.envs.robotics.minitaur.minitaur_bullet_cost
This package contains a modified version of the MinitaurBullet environment found in the pybullet package. This environment first appeared in a paper by Tan et al. 2018. The version found here is based on the modification given by Han et al. 2020. In this modified version:
The objective was changed to a velocity-tracking task. To do this, the reward is replaced with a cost. This cost is the squared difference between the Minitaur’s forward velocity and a reference value (error).
A minimal backward velocity bound is added to prevent the Minitaur from walking backwards.
Users are given the option to modify the Minitaur fall criteria and thus the episode termination criteria.
Submodules
Classes
MinitaurBulletCost: Custom Minitaur Bullet gymnasium environment.
Package Contents
- class stable_gym.envs.robotics.minitaur.minitaur_bullet_cost.MinitaurBulletCost(reference_forward_velocity=1.0, randomise_reference_forward_velocity=False, randomise_reference_forward_velocity_range=(0.5, 1.5), forward_velocity_weight=1.0, include_energy_cost=False, energy_weight=0.005, include_shake_cost=False, shake_weight=0.01, include_drift_cost=False, drift_weight=0.01, distance_limit=float('inf'), render=False, include_health_penalty=True, health_penalty_size=None, backward_velocity_bound=-0.5, fall_criteria_up_rotation=0.85, fall_criteria_z_position=0.13, exclude_reference_from_observation=False, exclude_reference_error_from_observation=True, exclude_x_velocity_from_observation=False, action_space_dtype=np.float32, observation_space_dtype=np.float64, **kwargs)[source]
Bases: pybullet_envs.bullet.minitaur_gym_env.MinitaurBulletEnv, gymnasium.utils.EzPickle
Custom Minitaur Bullet gymnasium environment.
Note
Can also be used in a vectorized manner. See the gym.vector documentation.
- Source:
Modified version of the Minitaur environment found in the pybullet package. This modification was first described by Han et al. 2020. In this modified version:
The objective was changed to a velocity-tracking task. To do this, the reward is replaced with a cost. This cost is the squared difference between the Minitaur’s forward velocity and a reference value (error). Additionally, an energy cost and a health penalty can be included in the cost.
A minimal backward velocity bound is added to prevent the Minitaur from walking backwards.
Users are given the option to modify the Minitaur fall criteria, and thus the episode termination criteria.
The rest of the environment is the same as the original Minitaur environment. Please refer to the original codebase or the article of Tan et al. 2018 on which the Minitaur environment is based for more information.
Important
In Han et al. 2020, the authors disabled the termination criteria. In our implementation, we have kept them for consistency with the original Minitaur environment. The termination criteria can be disabled by setting the fall_criteria_up_rotation and fall_criteria_z_position arguments to np.inf.
- Observation:
Type: Box(28)
Contains angles, velocities, and torques of all motors. Optionally, it can also include the reference, reference error, and x velocity.
- Actions:
Type: Box(8)
A list of desired motor angles for eight motors.
- Modified cost:
A cost, computed using the MinitaurBulletCost.cost() method, is given for each simulation step, including the terminal step. This cost is defined as the weighted squared error between the Minitaur’s forward velocity and a reference value. A control cost and health penalty can also be included in the cost. This health penalty equals either max_episode_steps minus the number of steps taken in the episode, or a fixed value. The cost is computed as:
\[cost = w_{forward\_velocity} \times (x_{velocity} - x_{reference\_x\_velocity})^2 + w_{ctrl} \times c_{ctrl} + p_{health}\]
- Starting State:
The robot always starts at the same position and orientation, with zero velocity.
- Episode Termination:
The episode is terminated if the Minitaur falls, meaning that the orientation between the base and the world exceeds a threshold or the base is too close to the ground.
Optionally, the episode can be terminated if the Minitaur walks backwards.
- Solved Requirements:
Considered solved when the average cost is less than or equal to 50 over 100 consecutive trials.
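The solved requirement above can be tracked with a rolling window. The sketch below is illustrative only: the SolvedTracker class is a hypothetical helper that is not part of stable_gym, and it assumes that "average cost" means the mean of the per-trial costs over the most recent 100 trials.

```python
from collections import deque


class SolvedTracker:
    """Rolling check of the solved requirement (hypothetical helper, not part of stable_gym)."""

    def __init__(self, threshold=50.0, window=100):
        self.threshold = threshold
        self.window = window
        # Only the most recent `window` trial costs are kept.
        self._costs = deque(maxlen=window)

    def add(self, episode_cost):
        """Record the cost of one finished trial and return the solved flag."""
        self._costs.append(float(episode_cost))
        return self.solved

    @property
    def solved(self):
        # Require a full window of consecutive trials before declaring success.
        if len(self._costs) < self.window:
            return False
        return sum(self._costs) / len(self._costs) <= self.threshold
```

Calling add() once per finished trial returns True as soon as the last 100 trials average at or below 50.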
- How to use:
import stable_gym
import gymnasium as gym
env = gym.make("stable_gym:MinitaurBulletCost-v1")
- state
The current system state.
- Type:
Attention
Since the MinitaurBulletEnv() is not yet compatible with gymnasium v>=0.26.0, the gym.wrappers.EnvCompatibility wrapper is used. This has the side effect that the render_mode argument does not work. Instead, the render argument should be used.
Initialise a new MinitaurBulletCost environment instance.
- Parameters:
reference_forward_velocity (float, optional) – The forward velocity that the agent should try to track. Defaults to 1.0.
randomise_reference_forward_velocity (bool, optional) – Whether to randomize the reference forward velocity. Defaults to False.
randomise_reference_forward_velocity_range (tuple, optional) – The range of the random reference forward velocity. Defaults to (0.5, 1.5).
forward_velocity_weight (float, optional) – The weight used to scale the forward velocity error. Defaults to 1.0.
include_energy_cost (bool, optional) – Whether to include the energy cost in the cost function (i.e. energy of the motors). Defaults to False.
energy_weight (float, optional) – The weight used to scale the energy cost. Defaults to 0.005.
include_shake_cost (bool, optional) – Whether to include the shake cost in the cost function (i.e. moving up and down). Defaults to False.
shake_weight (float, optional) – The weight used to scale the shake cost. Defaults to 0.01.
include_drift_cost (bool, optional) – Whether to include the drift cost in the cost function (i.e. movement in the y direction). Defaults to False.
drift_weight (float, optional) – The weight used to scale the drift cost. Defaults to 0.01.
distance_limit (float, optional) – The max distance (in meters) that the agent can travel before the episode is terminated. Defaults to float("inf").
render (bool, optional) – Whether to render the environment. Defaults to False.
include_health_penalty (bool, optional) – Whether to penalize the Minitaur if it becomes unhealthy (i.e. if it falls over). Defaults to True.
health_penalty_size (int, optional) – The size of the unhealthy penalty. Defaults to None, meaning the penalty equals the max episode steps minus the steps taken.
backward_velocity_bound (float) – The max backward velocity (in meters per second) before the episode is terminated. Defaults to -0.5.
fall_criteria_up_rotation (float) – The max up rotation (in radians) between the base and the world before the episode is terminated. Defaults to 0.85.
fall_criteria_z_position (float) – The minimum z position (in meters) of the base; the episode is terminated when the base drops below it. Defaults to 0.13.
exclude_reference_from_observation (bool, optional) – Whether the reference should be excluded from the observation. Defaults to False.
exclude_reference_error_from_observation (bool, optional) – Whether the error should be excluded from the observation. Defaults to True.
exclude_x_velocity_from_observation (bool, optional) – Whether to omit the x-component of the velocity from observations. Defaults to False.
action_space_dtype (union[numpy.dtype, str], optional) – The data type of the action space. Defaults to np.float32.
observation_space_dtype (union[numpy.dtype, str], optional) – The data type of the observation space. Defaults to np.float64.
**kwargs – Extra keyword arguments to pass to the MinitaurBulletEnv class.
- metadata
- state = None
- t = 0.0
- reference_forward_velocity
- _randomise_reference_forward_velocity
- _randomise_reference_forward_velocity_range
- _forward_velocity_weight
- _include_energy_cost
- _energy_weight
- _include_shake_cost
- _shake_weight
- _include_drift_cost
- _drift_weight
- _include_health_penalty
- _health_penalty_size
- _backward_velocity_bound
- _fall_criteria_up_rotation
- _fall_criteria_z_position
- _exclude_reference_from_observation
- _exclude_reference_error_from_observation
- _exclude_x_velocity_from_observation
- _action_space_dtype
- _observation_space_dtype
- _action_dtype_conversion_warning = False
- observation_space
- action_space
- low
- high
- cost(x_velocity, energy_cost, drift_cost, shake_cost)[source]
Compute the cost of a given base x velocity, energy cost, shake cost and drift cost.
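As a rough illustration of how such a cost could be assembled, the standalone function below combines a weighted squared velocity error with optional energy, shake and drift terms and a health penalty. It is a sketch, not the library's implementation: the name velocity_tracking_cost, the plain weighted sum of the optional terms, and the health penalty being passed in as a precomputed number are all assumptions. Only the default weights are taken from the constructor signature.

```python
def velocity_tracking_cost(
    x_velocity,
    reference_forward_velocity=1.0,
    forward_velocity_weight=1.0,
    energy_cost=0.0,
    energy_weight=0.005,
    shake_cost=0.0,
    shake_weight=0.01,
    drift_cost=0.0,
    drift_weight=0.01,
    health_penalty=0.0,
):
    """Hypothetical stand-in for a velocity-tracking cost (not the stable_gym code)."""
    # Squared tracking error between the measured and the reference forward velocity.
    velocity_error = x_velocity - reference_forward_velocity
    cost = forward_velocity_weight * velocity_error**2

    # Assumption: the optional terms enter as a plain weighted sum.
    cost += energy_weight * energy_cost
    cost += shake_weight * shake_cost
    cost += drift_weight * drift_cost

    # Assumption: the caller supplies the health penalty (0.0 while healthy).
    cost += health_penalty
    return cost
```

With the default reference of 1.0, a forward velocity of 0.5 gives a tracking term of 0.25, and perfect tracking leaves only the optional terms and the penalty.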
- step(action)[source]
Take a step in the environment.
Note
This method overrides the step() method such that the new cost function is used.
- Parameters:
action (np.ndarray) – Action to take in the environment.
render_mode (str, optional) – The render mode to use. Defaults to None.
- Returns:
tuple containing:
obs (np.ndarray): Environment observation.
cost (float): Cost of the action.
terminated (bool): Whether the episode is terminated.
truncated (bool): Whether the episode was truncated. This value is set by wrappers when, for example, a time limit is reached or the agent goes out of bounds.
info (dict): Additional information about the environment.
- Return type:
(tuple)
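The five-element return value of step() lends itself to a simple rollout loop. The sketch below is a generic helper, with run_episode and its policy argument being hypothetical names, that works with any environment following this step signature. It accepts both reset styles (observation only, or an (observation, info) tuple), since which one applies depends on the wrappers in use.

```python
def run_episode(env, policy, max_steps=1000):
    """Roll out one episode and return (total_cost, steps). Hypothetical helper."""
    reset_out = env.reset()
    # Gymnasium-style resets return (obs, info); older APIs return only obs.
    obs = reset_out[0] if isinstance(reset_out, tuple) else reset_out

    total_cost, steps = 0.0, 0
    for _ in range(max_steps):
        obs, cost, terminated, truncated, info = env.step(policy(obs))
        total_cost += float(cost)
        steps += 1
        # Stop on either a true termination or a wrapper-imposed truncation.
        if terminated or truncated:
            break
    return total_cost, steps
```

With the package installed, an environment created through gym.make("stable_gym:MinitaurBulletCost-v1") and a random policy such as lambda obs: env.action_space.sample() would be a natural starting point.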
- reset()[source]
Reset gymnasium environment.
- Returns:
Initial environment observation.
- Return type:
(np.ndarray)
- _termination()[source]
Check whether the episode is terminated.
Note
This method overrides the _termination() method of the original Minitaur environment so that we can also set a minimum velocity criterion.
- Returns:
Boolean value that indicates whether the episode is terminated.
- Return type:
(bool)
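A termination check of this kind might combine the fall test with the backward velocity bound and the distance limit. The function below is a sketch under stated assumptions: should_terminate is a hypothetical name, the conditions are assumed to be combined with a logical OR, the travelled distance is assumed to be the planar distance of the base from the origin, and the strict inequalities are a guess.

```python
import math


def should_terminate(
    is_fallen,
    x_velocity,
    base_position,
    backward_velocity_bound=-0.5,
    distance_limit=float("inf"),
):
    """Hypothetical termination check (not the stable_gym implementation)."""
    # Assumption: distance is measured in the x-y plane from the origin.
    distance = math.hypot(base_position[0], base_position[1])
    # Walking backwards faster than the (negative) bound ends the episode.
    walking_backwards = x_velocity < backward_velocity_bound
    return bool(is_fallen or walking_backwards or distance > distance_limit)
```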
- is_fallen()[source]
Check whether the Minitaur has fallen.
If the angle between the up directions of the base and the world is too large (the dot product is smaller than _fall_criteria_up_rotation) or the base is too close to the ground (the height is smaller than _fall_criteria_z_position), the Minitaur is considered fallen.
Note
This method overrides the is_fallen() method of the original Minitaur environment to give users the ability to set the fall criteria.
- Returns:
Boolean value that indicates whether the minitaur has fallen.
- Return type:
(bool)
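The fall test described above can be reproduced without a simulator. In the sketch below, is_fallen_sketch is a hypothetical standalone function that assumes the base orientation is available as a unit quaternion in (x, y, z, w) order. The dot product between the world up axis and the base up axis then equals the bottom-right element of the rotation matrix, 1 - 2(x² + y²). The default thresholds are the ones listed in the constructor signature.

```python
def is_fallen_sketch(
    base_orientation,
    base_height,
    fall_criteria_up_rotation=0.85,
    fall_criteria_z_position=0.13,
):
    """Hypothetical standalone version of the fall test (not the stable_gym code)."""
    x, y, z, w = base_orientation  # assumed unit quaternion, (x, y, z, w) order
    # Dot product of the world up axis [0, 0, 1] with the base's up axis.
    up_alignment = 1.0 - 2.0 * (x * x + y * y)
    # Fallen if tilted too far or if the base is too close to the ground.
    return up_alignment < fall_criteria_up_rotation or base_height < fall_criteria_z_position
```

An upright base gives an alignment of 1.0, a base tilted 45 degrees about the x axis gives roughly 0.707 and therefore counts as fallen under the default threshold of 0.85.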
- property time_limit_max_episode_steps
- The maximum number of steps that the environment can take before it is truncated by the gymnasium.wrappers.TimeLimit wrapper.
- property base_velocity
- The base velocity of the minitaur.
- property dt
- The environment step size.
- property tau
- Alias for the environment step size. Done for compatibility with the other gymnasium environments.
- property physics_time
- Returns the physics time.
Note
The Minitaur uses 100 steps to set up the system, which is why 100 time steps are added.