stable_gym.envs.robotics.fetch
Stable Gym gymnasium environments that are based on the Fetch environments in the Gymnasium Robotics package.
Note
These environments are based on the gym.GoalEnv class. This means that the observation returned by the step method is a dictionary with the following keys:
observation: The observation of the environment.
achieved_goal: The goal that was achieved during execution.
desired_goal: The desired goal that we asked the agent to attempt to achieve.
If you want to use these environments with RL algorithms that expect the step method to return a np.ndarray instead of a dictionary, you can use the gym.wrappers.FlattenObservation wrapper to flatten the dictionary into a single np.ndarray.
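To illustrate what flattening a goal-style observation dictionary means, here is a minimal sketch in plain NumPy. It is not the gym.wrappers.FlattenObservation implementation: the helper name flatten_goal_obs, the array shapes, and the choice to concatenate the entries in sorted key order are all assumptions made for this example.

```python
import numpy as np

# Illustrative goal-style observation. The shapes and values are assumptions
# of this sketch, not values taken from the real environment.
obs = {
    "observation": np.zeros(10, dtype=np.float64),
    "achieved_goal": np.array([1.0, 2.0, 3.0]),
    "desired_goal": np.array([4.0, 5.0, 6.0]),
}


def flatten_goal_obs(obs_dict):
    """Hypothetical helper: join all dictionary entries into one 1-D array.

    Entries are concatenated in sorted key order (an assumption of this sketch).
    """
    return np.concatenate([np.ravel(obs_dict[key]) for key in sorted(obs_dict)])


flat_obs = flatten_goal_obs(obs)
print(flat_obs.shape)
```

In practice the wrapper is applied to the environment itself, so that every observation returned by reset and step is already a single array.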
Subpackages
Classes
FetchReachCost | Custom FetchReach gymnasium robotics environment.
Package Contents
- class stable_gym.envs.robotics.fetch.FetchReachCost(action_space_dtype=np.float32, observation_space_dtype=np.float64, **kwargs)[source]
Bases: gymnasium_robotics.envs.fetch.reach.MujocoFetchReachEnv, gymnasium.utils.EzPickle
Custom FetchReach gymnasium robotics environment.
Note
Can also be used in a vectorized manner. See the gym.vector documentation.
- Source:
Modified version of the FetchReach Mujoco environment found in the Gymnasium Robotics library. This modification was first described by Han et al. 2020. In this modified version:
The reward was replaced with a cost. This was done by taking the absolute value of the reward.
The rest of the environment is the same as the original FetchReach environment. Below, the modified cost is described. For more information about the environment (e.g. observation space, action space, episode termination, etc.), please refer to the gymnasium robotics library.
- Modified cost:
A cost, computed using the FetchReachCost.cost() method, is given for each simulation step, including the terminal step. This cost is defined as the error between FetchReach’s end-effector position and the desired goal position (i.e. Euclidean distance). The cost is computed as:
\[cost = \left| reward \right|\]
- Solved Requirements:
Considered solved when the average cost is less than or equal to 50 over 100 consecutive trials.
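The modified cost and the solved criterion described above can be sketched as follows. This is not the FetchReachCost.cost() implementation: the function names are hypothetical, the dense reward is assumed to be the negative Euclidean distance between the achieved and desired goal, and the solved check assumes one cost value per trial.

```python
import numpy as np


def dense_reward(achieved_goal, desired_goal):
    """Assumed dense reward: negative Euclidean distance to the goal."""
    diff = np.asarray(achieved_goal, dtype=float) - np.asarray(desired_goal, dtype=float)
    return -float(np.linalg.norm(diff))


def cost(achieved_goal, desired_goal):
    """Cost as the absolute value of the reward, i.e. cost = |reward|."""
    return abs(dense_reward(achieved_goal, desired_goal))


def is_solved(trial_costs, threshold=50.0, window=100):
    """Check the average cost over the last `window` trials against the threshold.

    `trial_costs` holds one cost value per trial (an assumption of this sketch).
    """
    if len(trial_costs) < window:
        return False
    return float(np.mean(trial_costs[-window:])) <= threshold


# A goal offset of (3, 4, 0) gives a Euclidean distance, and so a cost, of 5.
example_cost = cost([0.0, 0.0, 0.0], [3.0, 4.0, 0.0])
print(example_cost)
```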
- How to use:
import stable_gym
import gymnasium as gym
env = gym.make("stable_gym:FetchReachCost-v1")
- state
The current system state.
- Type:
Attention
Accepts all arguments of the original MujocoFetchReachEnv class except for the reward_type argument. This is because we require dense rewards to calculate the cost.
Initialise a new FetchReachCost environment instance.
- Parameters:
action_space_dtype (union[numpy.dtype, str], optional) – The data type of the action space. Defaults to np.float32.
observation_space_dtype (union[numpy.dtype, str], optional) – The data type of the observation space. Defaults to np.float64.
**kwargs – Keyword arguments passed to the original MujocoFetchReachEnv class.
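Because both dtype parameters accept either a numpy.dtype or a string, the two spellings resolve to the same data type. The sketch below shows this with plain NumPy, along with casting an action to the action space data type. The variable names are illustrative, and the cast is an assumption about how such a dtype might be applied, not the environment's internal code.

```python
import numpy as np

# Both spellings of the dtype resolve to the same NumPy data type.
dtype_from_str = np.dtype("float32")
dtype_from_type = np.dtype(np.float32)
print(dtype_from_str == dtype_from_type)

# An action created with NumPy's default float64 precision ...
raw_action = np.array([0.1, -0.2, 0.3, 0.0])

# ... cast to the (assumed) action space data type before stepping.
action = raw_action.astype(dtype_from_str)
print(raw_action.dtype, action.dtype)
```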
- state = None
- _action_space_dtype
- _observation_space_dtype
- _action_dtype_conversion_warning = False
- action_space
- step(action)[source]
Take a step in the environment.
Note
This method overrides the step() method such that the new cost function is used.
- Parameters:
action (np.ndarray) – Action to take in the environment.
- Returns:
tuple containing:
obs (np.ndarray): Environment observation.
cost (float): Cost of the action.
terminated (bool): Whether the episode is terminated.
truncated (bool): Whether the episode was truncated. This value is set by wrappers when, for example, a time limit is reached or the agent goes out of bounds.
info (dict): Additional information about the environment.
- Return type:
(tuple)
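The five-element return value described above is typically unpacked inside a rollout loop. The sketch below uses a small stand-in class, StubCostEnv, which is hypothetical and is not the real FetchReachCost environment. It only mimics the reset and step call signatures so that the loop can run without a simulator; its observations and its cost are made up for the example.

```python
import numpy as np


class StubCostEnv:
    """Hypothetical stand-in that mimics the reset/step signatures only."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.steps = 0

    def reset(self, seed=None, options=None):
        self.steps = 0
        return np.zeros(3), {}

    def step(self, action):
        self.steps += 1
        obs = np.zeros(3)
        # Made-up cost for illustration: sum of absolute action values.
        cost = float(np.abs(action).sum())
        terminated = False
        # Truncate once the (assumed) step limit is reached.
        truncated = self.steps >= self.horizon
        return obs, cost, terminated, truncated, {}


env = StubCostEnv(horizon=5)
obs, info = env.reset(seed=0)
total_cost = 0.0
done = False
while not done:
    action = np.full(4, 0.25, dtype=np.float32)
    obs, cost, terminated, truncated, info = env.step(action)
    total_cost += cost
    # An episode ends when it is either terminated or truncated.
    done = terminated or truncated
print(total_cost)
```

Note that the second element is a cost to be minimised, so an algorithm written for rewards needs to account for the sign.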
- reset(seed=None, options=None)[source]
Reset gymnasium environment.
- Parameters:
- Returns:
tuple containing:
obs (numpy.ndarray): Initial environment observation.
info (dict): Dictionary containing additional information.
- Return type:
(tuple)
- property tau
Alias for the environment step size. Done for compatibility with the other gymnasium environments.
- property t
Environment time.
- property physics_time
Returns the physics time.