stable_gym.envs.robotics.fetch.fetch_reach_cost

Modified version of the FetchReach Mujoco environment found in the Gymnasium Robotics library. This modification was first described by Han et al. 2020. In this modified version:

  • The reward was replaced with a cost. This was done by taking the absolute value of the reward.

Classes

FetchReachCost

Custom FetchReach gymnasium robotics environment.

Package Contents

class stable_gym.envs.robotics.fetch.fetch_reach_cost.FetchReachCost(action_space_dtype=np.float32, observation_space_dtype=np.float64, **kwargs)[source]

Bases: gymnasium_robotics.envs.fetch.reach.MujocoFetchReachEnv, gymnasium.utils.EzPickle

Custom FetchReach gymnasium robotics environment.

Note

Can also be used in a vectorized manner. See the gymnasium.vector documentation.

Source:

Modified version of the FetchReach Mujoco environment found in the Gymnasium Robotics library. This modification was first described by Han et al. 2020. In this modified version:

  • The reward was replaced with a cost. This was done by taking the absolute value of the reward.

The rest of the environment is the same as the original FetchReach environment. Below, the modified cost is described. For more information about the environment (e.g. observation space, action space, episode termination, etc.), please refer to the gymnasium robotics library.

Modified cost:

A cost, computed using the FetchReachCost.cost() method, is given for each simulation step, including the terminal step. This cost is defined as the Euclidean distance between FetchReach’s end-effector position and the desired goal position. The cost is computed as:

\[ \text{cost} = \left| \text{reward} \right| \]

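
As a worked illustration of the formula above, the sketch below assumes that the dense FetchReach reward is the negative Euclidean distance between the end-effector and the goal, so that taking the absolute value recovers the distance. The helper names dense_reward and cost_from_reward are hypothetical, and the sketch uses the Python standard library rather than the environment's own NumPy-based code.

```python
import math


def dense_reward(achieved_goal, desired_goal):
    """Assumed dense reward: negative Euclidean distance to the goal."""
    return -math.dist(achieved_goal, desired_goal)


def cost_from_reward(reward):
    """Cost as defined above: the absolute value of the reward."""
    return abs(reward)


# Hypothetical positions chosen so the distance is a round number.
reward = dense_reward((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
cost = cost_from_reward(reward)
```

Under this assumption the cost is never negative and reaches zero only when the end-effector sits exactly on the goal.
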
Solved Requirements:

Considered solved when the average cost is less than or equal to 50 over 100 consecutive trials.
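
The criterion above could be checked with a small helper such as the one sketched below. It assumes that "average cost" means the mean of the per-trial total costs, which the source does not state explicitly. The class name SolvedTracker and its methods are hypothetical and not part of stable_gym.

```python
from collections import deque


class SolvedTracker:
    """Hypothetical helper that tracks costs of the most recent trials."""

    def __init__(self, threshold=50.0, window=100):
        self.threshold = threshold
        # Only the last `window` trial costs are kept.
        self.costs = deque(maxlen=window)

    def add(self, episode_cost):
        self.costs.append(episode_cost)

    def solved(self):
        # Not enough consecutive trials recorded yet.
        if len(self.costs) < self.costs.maxlen:
            return False
        return sum(self.costs) / len(self.costs) <= self.threshold
```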

How to use:
import stable_gym
import gymnasium as gym
env = gym.make("stable_gym:FetchReachCost-v1")
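
Once an environment has been created as shown above, a single episode might be run as in the following sketch. The function run_episode and its parameters are hypothetical. It relies only on the standard Gymnasium reset() and step() signatures documented on this page, and the default of 50 steps is an arbitrary choice for illustration.

```python
def run_episode(env, policy, max_steps=50, seed=None):
    """Run one episode and return the accumulated cost."""
    obs, info = env.reset(seed=seed)
    total_cost = 0.0
    for _ in range(max_steps):
        action = policy(obs)
        # step() returns a cost in place of the usual reward.
        obs, cost, terminated, truncated, info = env.step(action)
        total_cost += float(cost)
        if terminated or truncated:
            break
    return total_cost
```

With a real environment, run_episode(env, lambda obs: env.action_space.sample()) would accumulate the cost of a random policy.
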
state

The current system state.

Type:

numpy.ndarray

dt

The environment step size. Also available as tau.

Type:

float

Attention

Accepts all arguments of the original MujocoFetchReachEnv class except for the reward_type argument. This is because we require dense rewards to calculate the cost.

Initialise a new FetchReachCost environment instance.

Parameters:
  • action_space_dtype (Union[numpy.dtype, str], optional) – The data type of the action space. Defaults to np.float32.

  • observation_space_dtype (Union[numpy.dtype, str], optional) – The data type of the observation space. Defaults to np.float64.

  • **kwargs – Keyword arguments passed to the original MujocoFetchReachEnv class.

state = None
_action_space_dtype
_observation_space_dtype
_action_dtype_conversion_warning = False
action_space
cost(reward)[source]

Calculate the cost.

Parameters:

reward (float) – The reward returned from the FetchReach environment.

Returns:

The cost (i.e. the absolute value of the reward, which equals the negated reward when the dense reward is non-positive).

Return type:

float

step(action)[source]

Take a step in the environment.

Note

This method overrides the step() method such that the new cost function is used.

Parameters:

action (np.ndarray) – Action to take in the environment.

Returns:

tuple containing:

  • obs (np.ndarray): Environment observation.

  • cost (float): Cost of the action.

  • terminated (bool): Whether the episode is terminated.

  • truncated (bool): Whether the episode was truncated. This value is set by wrappers when, for example, a time limit is reached or the agent goes out of bounds.

  • info (dict): Additional information about the environment.

Return type:

(tuple)
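
The override described above can be pictured with the following sketch, which replaces the reward returned by a parent step() with a cost. It is only an illustration of the pattern, not the library's actual implementation, which may do more (for example dtype handling or state tracking). The name CostStepMixin is hypothetical.

```python
class CostStepMixin:
    """Hypothetical mixin that reports a cost instead of the parent's reward."""

    def cost(self, reward):
        # Cost is the absolute value of the reward.
        return abs(reward)

    def step(self, action):
        # Delegate to the parent environment, then swap reward for cost.
        obs, reward, terminated, truncated, info = super().step(action)
        return obs, self.cost(reward), terminated, truncated, info
```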

reset(seed=None, options=None)[source]

Reset the gymnasium environment.

Parameters:
  • seed (int, optional) – A random seed for the environment. By default None.

  • options (dict, optional) – A dictionary containing additional options for resetting the environment. By default None. Not used in this environment.

Returns:

tuple containing:

  • obs (numpy.ndarray): Initial environment observation.

  • info (dict): Dictionary containing additional information.

Return type:

(tuple)

property tau
Alias for the environment step size. Provided for compatibility with the other gymnasium environments.
property t
Environment time.
property physics_time
Returns the physics time.