stable_gym.envs.robotics.fetch

Stable Gym gymnasium environments that are based on the Fetch environments in the Gymnasium Robotics package.

Note

These environments are based on the gym.GoalEnv class. This means that the observation returned by the reset and step methods is a dictionary with the following keys:

  • observation: The observation of the environment.

  • achieved_goal: The goal that was achieved during execution.

  • desired_goal: The desired goal that we asked the agent to attempt to achieve.

If you want to use these environments with RL algorithms that expect the observation to be a np.ndarray instead of a dictionary, you can use the gym.wrappers.FlattenObservation wrapper to flatten the dictionary observation into a single np.ndarray.
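
As an illustration, a minimal sketch of such wrapping, assuming the FetchReachCost-v1 registration shown in the "How to use" example below:

import gymnasium as gym
import stable_gym  # noqa: F401  # registers the stable_gym environments

# The goal-based environment returns a Dict observation with the
# 'observation', 'achieved_goal' and 'desired_goal' keys.
env = gym.make("stable_gym:FetchReachCost-v1")

# Flatten the Dict observation into a single np.ndarray.
flat_env = gym.wrappers.FlattenObservation(env)

obs, info = flat_env.reset(seed=0)
print(obs.shape)  # one flat np.ndarray instead of a dictionary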


Package Contents

Classes

FetchReachCost

Custom FetchReach gymnasium robotics environment.

class stable_gym.envs.robotics.fetch.FetchReachCost(action_space_dtype=np.float32, observation_space_dtype=np.float64, **kwargs)[source]

Bases: gymnasium_robotics.envs.fetch.reach.MujocoFetchReachEnv, gymnasium.utils.EzPickle

Custom FetchReach gymnasium robotics environment.

Note

Can also be used in a vectorized manner. See the gym.vector documentation.
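
As an illustration, a minimal sketch of running several copies in lock-step with gymnasium's synchronous vector API, using the environment id from the "How to use" example below:

import gymnasium as gym
import stable_gym  # noqa: F401  # registers the stable_gym environments

# Run four FetchReachCost environments in a single process.
envs = gym.vector.SyncVectorEnv(
    [lambda: gym.make("stable_gym:FetchReachCost-v1") for _ in range(4)]
)
obs, info = envs.reset(seed=0)
# The vectorized step returns batched observations, costs and termination flags.
obs, costs, terminated, truncated, info = envs.step(envs.action_space.sample())
envs.close()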

Source:

Modified version of the FetchReach Mujoco environment found in the Gymnasium Robotics library. This modification was first described by Han et al. 2020. In this modified version:

  • The reward was replaced with a cost. This was done by taking the absolute value of the reward.

The rest of the environment is the same as the original FetchReach environment. The modified cost is described below. For more information about the environment (e.g. observation space, action space, episode termination, etc.), please refer to the Gymnasium Robotics library.

Modified cost:

A cost, computed using the FetchReachCost.cost() method, is given for each simulation step, including the terminal step. This cost is defined as the error between FetchReach’s end-effector position and the desired goal position (i.e. Euclidean distance). The cost is computed as:

\[cost = \left | reward \right |\]
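
Expressed as a short sketch (the goal positions below are illustrative placeholders; the dense FetchReach reward is the negative Euclidean distance between the end-effector and the goal):

import numpy as np

# Illustrative end-effector and goal positions (placeholder values).
achieved_goal = np.array([1.30, 0.75, 0.55])
desired_goal = np.array([1.40, 0.80, 0.50])

# Dense FetchReach reward: the negative Euclidean distance to the goal.
reward = -np.linalg.norm(achieved_goal - desired_goal)

# FetchReachCost cost: the absolute value of that reward, i.e. the distance itself.
cost = abs(reward)
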
Solved Requirements:

Considered solved when the average cost is less than or equal to 50 over 100 consecutive trials.

How to use:
import stable_gym
import gymnasium as gym
env = gym.make("stable_gym:FetchReachCost-v1")
state

The current system state.

Type:

numpy.ndarray

dt

The environment step size. Also available as tau.

Type:

float

Attention

Accepts all arguments of the original MujocoFetchReachEnv class except for the reward_type argument. This is because we require dense rewards to calculate the cost.

Initialise a new FetchReachCost environment instance.

Parameters:
  • action_space_dtype (union[numpy.dtype, str], optional) – The data type of the action space. Defaults to np.float32.

  • observation_space_dtype (union[numpy.dtype, str], optional) – The data type of the observation space. Defaults to np.float64.

  • **kwargs – Keyword arguments passed to the original MujocoFetchReachEnv class.
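
For instance, a minimal sketch of setting the space dtypes through gym.make, which forwards extra keyword arguments to the environment constructor (the dtypes below are simply the defaults):

import gymnasium as gym
import numpy as np
import stable_gym  # noqa: F401  # registers the stable_gym environments

env = gym.make(
    "stable_gym:FetchReachCost-v1",
    action_space_dtype=np.float32,
    observation_space_dtype=np.float64,
)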

property tau

Alias for the environment step size. Provided for compatibility with the other gymnasium environments.

property t

Environment time.

property physics_time

Returns the physics time.

cost(reward)[source]

Calculate the cost.

Parameters:

reward (float) – The reward returned from the FetchReach environment.

Returns:

The cost (i.e. the absolute value of the reward).

Return type:

float

step(action)[source]

Take a step in the environment.

Note

This method overrides the step() method such that the new cost function is used.

Parameters:

action (np.ndarray) – Action to take in the environment.

Returns:

tuple containing:

  • obs (np.ndarray): Environment observation.

  • cost (float): Cost of the action.

  • terminated (bool): Whether the episode is terminated.

  • truncated (bool): Whether the episode was truncated. This value is set by wrappers when, for example, a time limit is reached or the agent goes out of bounds.

  • info (dict): Additional information about the environment.

Return type:

(tuple)

reset(seed=None, options=None)[source]

Reset the gymnasium environment.

Parameters:
  • seed (int, optional) – A random seed for the environment. By default None.

  • options (dict, optional) – A dictionary containing additional options for resetting the environment. By default None. Not used in this environment.

Returns:

tuple containing:

  • obs (numpy.ndarray): Initial environment observation.

  • info (dict): Dictionary containing additional information.

Return type:

(tuple)