MinitaurBulletCost gymnasium environment

Minitaur Bullet Cost environment

An actuated 8-jointed Minitaur. This environment corresponds to the MinitaurBulletEnv-v0 environment included in the pybullet package. It differs from the original in the following ways:

  • The objective was changed to a velocity-tracking task. To do this, the reward is replaced with a cost: the squared difference (error) between the Minitaur’s forward velocity and a reference value. Additionally, an energy cost and a health penalty can be included in the cost.

  • A minimal backward velocity bound is added to prevent the Minitaur from walking backwards.

  • Users are given the option to modify the Minitaur fall criteria, and thus the episode termination criteria.

The rest of the environment is the same as the original Minitaur environment. For more information, please refer to the original codebase or to the article by Tan et al. (2018) on which the Minitaur environment is based.

Observation space

The original observation space of the MinitaurBulletEnv-v0 environment contains the angles, velocities and torques of all eight motors. In this modified version, the observation space has been extended with three additional observations:

  • \(r\): The velocity reference signal that needs to be tracked.

  • \(r_{error}\): The difference between the current and reference velocities.

  • \(v_{x}\): The Minitaur’s forward velocity.

These observations are optional and can be excluded from the observation space by setting the exclude_reference_from_observation, exclude_reference_error_from_observation and exclude_x_velocity_from_observation environment arguments to True.
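
The effect of these three arguments can be illustrated with a small sketch. It is not the package's implementation: the helper name extend_observation is hypothetical, and both the order in which the extra values are appended and the sign of the reference error (forward velocity minus reference) are assumptions made for illustration.

```python
def extend_observation(
    base_observation,
    reference,
    x_velocity,
    exclude_reference_from_observation=False,
    exclude_reference_error_from_observation=False,
    exclude_x_velocity_from_observation=False,
):
    """Append the optional tracking observations to the motor observations.

    Hypothetical helper: the append order and the error sign are assumptions.
    """
    observation = list(base_observation)
    if not exclude_reference_from_observation:
        observation.append(reference)  # r
    if not exclude_reference_error_from_observation:
        observation.append(x_velocity - reference)  # r_error (assumed sign)
    if not exclude_x_velocity_from_observation:
        observation.append(x_velocity)  # v_x
    return observation
```

With all three arguments left at False the observation grows by three entries; each argument set to True removes one of them.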

Action space

The action space contains the desired motor angles for all eight motors.

Episode Termination

An episode is terminated when:

  • The Minitaur falls, meaning that the orientation between the base and the world exceeds a threshold or the base is too close to the ground.

  • Optionally, the Minitaur walks backwards.
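
The two termination conditions can be sketched as follows. This is a minimal illustration rather than the package's code: the function name, its arguments and all threshold values are hypothetical, and the base orientation is assumed to be summarised as the dot product between the world's up axis and the base's up axis (1.0 when the base is perfectly upright).

```python
def should_terminate(
    up_alignment,
    base_height,
    x_velocity,
    min_up_alignment=0.85,  # hypothetical orientation threshold
    min_base_height=0.13,  # hypothetical height threshold
    terminate_when_backwards=False,  # hypothetical switch for the optional check
    backward_velocity_bound=0.0,  # hypothetical velocity bound
):
    """Return True when the episode should end under the rules described above."""
    # The Minitaur has fallen when it is tilted too far or is too close to the ground.
    fallen = up_alignment < min_up_alignment or base_height < min_base_height
    # Optional check: the forward velocity dropped below the backward velocity bound.
    walks_backwards = terminate_when_backwards and x_velocity < backward_velocity_bound
    return fallen or walks_backwards
```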

Environment goal

The Minitaur should walk forward with a certain velocity.

Cost function

The cost function penalizes the error between the Minitaur’s forward velocity and a reference value. It is defined as:

\[\begin{split} cost = w_{forward\_velocity} \times (x_{velocity} - x_{reference\_x\_velocity})^2 + w_{energy} \times c_{energy} + \\ w_{shake} \times c_{shake} + w_{drift} \times c_{drift} + p_{health} \end{split}\]

Where:

  • \(w_{forward\_velocity}\) is the weight of the forward velocity error.

  • \(x_{velocity}\) is the Minitaur’s forward velocity.

  • \(x_{reference\_x\_velocity}\) is the reference forward velocity.

  • \(w_{energy}\) is the weight of the energy cost (optional).

  • \(c_{energy}\) is the control energy cost (optional).

  • \(w_{shake}\) is the weight of the shake cost (optional).

  • \(c_{shake}\) is the shake (z movement) cost (optional).

  • \(w_{drift}\) is the weight of the drift cost (optional).

  • \(c_{drift}\) is the drift (y movement) cost (optional).

  • \(p_{health}\) is a penalty for being unhealthy (i.e. if the Minitaur falls over).

The energy, shake and drift costs and the health penalty are optional and can be disabled using the include_energy_cost, include_shake_cost, include_drift_cost and include_health_penalty environment arguments.
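
As a worked illustration, the cost formula above can be written as a plain Python function. This is a sketch under stated assumptions: the function and argument names are hypothetical, the individual cost terms are passed in rather than computed from the simulation, and a disabled term is modelled by leaving its value at zero.

```python
def minitaur_cost(
    x_velocity,
    reference_x_velocity,
    w_forward_velocity=1.0,
    w_energy=0.0,
    c_energy=0.0,
    w_shake=0.0,
    c_shake=0.0,
    w_drift=0.0,
    c_drift=0.0,
    p_health=0.0,
):
    """Evaluate the cost formula for one step (hypothetical helper)."""
    velocity_error = x_velocity - reference_x_velocity
    return (
        w_forward_velocity * velocity_error**2  # velocity-tracking term
        + w_energy * c_energy  # optional control energy term
        + w_shake * c_shake  # optional shake (z movement) term
        + w_drift * c_drift  # optional drift (y movement) term
        + p_health  # optional penalty for being unhealthy
    )
```

For example, a forward velocity of 0.5 against a reference of 1.0 with unit weight and no optional terms gives a cost of 0.25.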

Environment step return

In addition to the observation, the cost, and the termination and truncation booleans, the environment step also returns an info dictionary:

[observation, cost, termination, truncation, info_dict]

Compared to the original MinitaurBulletEnv-v0 environment, the following keys were added to this info dictionary:

  • reference: The reference velocity.

  • state_of_interest: The state that should track the reference (SOI).

  • reference_error: The error between SOI and the reference.

How to use

This environment is part of the Stable Gym package. It is therefore registered as the stable_gym:MinitaurBulletCost-v1 gymnasium environment when you import the Stable Gym package. If you want to use the environment in stand-alone mode, you can register it yourself.
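
A minimal usage sketch, assuming that gymnasium, pybullet and the Stable Gym package are installed. The helper names make_env and rollout are hypothetical. The rollout samples random actions, accumulates the cost returned by each step and returns the last info dictionary, from which the reference, state_of_interest and reference_error keys can be read.

```python
def make_env(**kwargs):
    """Create the environment (requires gymnasium, pybullet and stable_gym)."""
    import gymnasium as gym

    # The "stable_gym:" prefix asks gymnasium to import the stable_gym module,
    # which registers the environment, before the id is looked up.
    return gym.make("stable_gym:MinitaurBulletCost-v1", **kwargs)


def rollout(env, max_steps=200, seed=None):
    """Run one episode with random actions and return (total cost, steps, last info)."""
    observation, info = env.reset(seed=seed)
    total_cost = 0.0
    steps = 0
    for _ in range(max_steps):
        action = env.action_space.sample()  # random desired motor angles
        observation, cost, terminated, truncated, info = env.step(action)
        total_cost += float(cost)
        steps += 1
        if terminated or truncated:
            break
    return total_cost, steps, info
```

A typical call would be `total_cost, steps, info = rollout(make_env())`, followed by `env.close()` on the created environment once it is no longer needed.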