MinitaurBulletCost gymnasium environment
An actuated 8-jointed Minitaur. This environment corresponds to the MinitaurBulletEnv-v0 environment included in the pybullet package. It differs from the original in the following ways:
The objective was changed to a velocity-tracking task. To do this, the reward is replaced with a cost: the squared difference between the Minitaur’s forward velocity and a reference value (the error). An energy cost and a health penalty can also be included in the cost.
A minimal backward velocity bound is added to prevent the Minitaur from walking backwards.
Users are given the option to modify the Minitaur fall criteria, and thus the episode termination criteria.
The rest of the environment is the same as the original Minitaur environment. Please refer to the original codebase or the article of Tan et al. 2018 on which the Minitaur environment is based for more information.
Observation space
The original observation space of the MinitaurBulletEnv-v0 environment contains all eight motors’ angles, velocities and torques. In this modified version, the observation space has been extended to add three additional observations:
\(r\): The velocity reference signal that needs to be tracked.
\(r_{error}\): The difference between the current and reference velocities.
\(v_{x}\): The Minitaur’s forward velocity.
These observations are optional and can be excluded from the observation space by setting the exclude_reference_from_observation, exclude_reference_error_from_observation and exclude_x_velocity_from_observation environment arguments to True.
Action space
The action space contains the desired motor angles for all eight motors.
Episode Termination
An episode is terminated when:
The Minitaur falls, meaning that the orientation between the base and the world is greater than a threshold or the base is too close to the ground.
The Minitaur walks backwards (optional).
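The termination rules above can be sketched as a small predicate. This is an illustration only, not the environment's actual implementation: the function names, the use of an "up alignment" value (the cosine of the angle between the base's up axis and the world's up axis, so a smaller value means a larger tilt), and all default thresholds are assumptions chosen for the example.

```python
def is_fallen(up_alignment, base_height,
              min_up_alignment=0.85, min_base_height=0.13):
    """Return True if the robot counts as fallen.

    up_alignment: cosine of the angle between the base's up axis and the
        world's up axis (1.0 means perfectly upright).
    base_height: height of the base above the ground.
    Both thresholds are illustrative assumptions, not documented values.
    """
    tilted_too_far = up_alignment < min_up_alignment
    too_close_to_ground = base_height < min_base_height
    return tilted_too_far or too_close_to_ground


def should_terminate(up_alignment, base_height, forward_velocity,
                     backward_velocity_bound=None):
    """Combine the fall check with the optional backward-walking check.

    backward_velocity_bound: hypothetical lower bound on the forward
        velocity; None disables the backward-walking check.
    """
    if is_fallen(up_alignment, base_height):
        return True
    if backward_velocity_bound is not None:
        return forward_velocity < backward_velocity_bound
    return False
```

Passing backward_velocity_bound=None mirrors the optional nature of the backward-walking criterion: only the fall check is applied in that case.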
Environment goal
The Minitaur should walk forward with a certain velocity.
Cost function
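The cost function of this environment is designed to minimize the error between the Minitaur’s forward velocity and a reference value. The cost function is defined as:
\[cost = w_{forward\_velocity} \times (x_{velocity} - x_{reference\_x\_velocity})^2 + w_{energy} \times c_{energy} + w_{shake} \times c_{shake} + w_{drift} \times c_{drift} + p_{health}\]
Where: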
\(w_{forward\_velocity}\) is the weight of the forward velocity error.
\(x_{velocity}\) is the Minitaur’s forward velocity.
\(x_{reference\_x\_velocity}\) is the reference forward velocity.
\(w_{energy}\) is the weight of the energy cost (optional).
\(c_{energy}\) is the control energy cost (optional).
\(w_{shake}\) is the weight of the shake cost (optional).
\(c_{shake}\) is the shake (z movement) cost (optional).
\(w_{drift}\) is the weight of the drift (y movement) cost (optional).
\(c_{drift}\) is the drift cost (optional).
\(p_{health}\) is a penalty for being unhealthy (i.e. if the Minitaur falls over).
The energy, shake and drift costs and the health penalty are optional and can be disabled using the include_energy_cost, include_shake_cost, include_drift_cost and include_health_penalty environment arguments.
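As a rough illustration of how the terms listed above combine, the following sketch computes a weighted cost from already-computed component costs. The function name minitaur_cost, its parameter names, and its default values are assumptions made for this example; only the include_* flag names come from the environment arguments described above. How the environment itself computes the energy, shake, and drift terms is not shown here.

```python
def minitaur_cost(x_velocity, reference_x_velocity, *,
                  forward_velocity_weight=1.0,
                  energy_cost=0.0, energy_weight=0.0,
                  shake_cost=0.0, shake_weight=0.0,
                  drift_cost=0.0, drift_weight=0.0,
                  health_penalty=0.0, is_healthy=True,
                  include_energy_cost=False,
                  include_shake_cost=False,
                  include_drift_cost=False,
                  include_health_penalty=False):
    """Hypothetical sketch of the velocity-tracking cost."""
    # Squared tracking error between forward and reference velocity.
    cost = forward_velocity_weight * (x_velocity - reference_x_velocity) ** 2
    # Optional weighted terms, each enabled by its own flag.
    if include_energy_cost:
        cost += energy_weight * energy_cost
    if include_shake_cost:
        cost += shake_weight * shake_cost
    if include_drift_cost:
        cost += drift_weight * drift_cost
    # The health penalty only applies while the robot is unhealthy.
    if include_health_penalty and not is_healthy:
        cost += health_penalty
    return cost
```

With every optional term disabled, the cost reduces to the weighted squared tracking error, so a forward velocity of 0.5 against a reference of 1.0 gives a cost of 0.25 at unit weight.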
Environment step return
In addition to the observations, the cost, and the termination and truncation booleans, the environment also returns an info dictionary:
[observation, cost, termination, truncation, info_dict]
Compared to the original MinitaurBulletEnv-v0 environment, the following keys were added to this info dictionary:
reference: The reference velocity.
state_of_interest: The state that should track the reference (SOI).
reference_error: The error between SOI and the reference.
How to use
This environment is part of the Stable Gym package. It is therefore registered as the stable_gym:MinitaurBulletCost-v1 gymnasium environment when you import the Stable Gym package. If you want to use the environment in stand-alone mode, you can register it yourself.
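A minimal usage sketch follows, assuming the gymnasium, stable_gym, and pybullet packages are installed. The helper name run_random_episode and its max_steps parameter are hypothetical; the environment id and the five-element step return are taken from the description above. Gymnasium's "module:EnvId" syntax imports the named module before the lookup, which is what triggers the registration.

```python
def run_random_episode(max_steps=200, **env_kwargs):
    """Roll out one episode with random actions and return (total cost, last info).

    env_kwargs are forwarded to the environment constructor, for example
    exclude_reference_from_observation=True.
    """
    # Imported lazily so that defining this helper does not require the
    # simulation dependencies to be installed.
    import gymnasium as gym

    env = gym.make("stable_gym:MinitaurBulletCost-v1", **env_kwargs)
    observation, info = env.reset(seed=0)
    total_cost = 0.0
    for _ in range(max_steps):
        action = env.action_space.sample()  # random desired motor angles
        observation, cost, terminated, truncated, info = env.step(action)
        total_cost += float(cost)
        if terminated or truncated:
            break
    env.close()
    return total_cost, info
```

After a rollout, the returned info dictionary should contain the reference, state_of_interest, and reference_error keys described in the step return section.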