stable_gym.envs.robotics
Stable Gym gymnasium environments that are based on robotics environments.
Note
Some of these environments are based on the gym.GoalEnv class. This means
that the observation returned by the step method is a dictionary with the following keys:
- observation: The observation of the environment.
- achieved_goal: The goal that was achieved during execution.
- desired_goal: The desired goal that we asked the agent to attempt to achieve.
If you want to use these environments with RL algorithms that expect the step
method to return a np.ndarray instead of a dictionary, you can use the
gym.wrappers.FlattenObservation wrapper to flatten the dictionary into a
single np.ndarray.
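For example, a goal-based environment such as FetchReachCost can be flattened as follows (a minimal sketch, assuming the stable_gym package is installed and its environments are registered on import):

import gymnasium as gym
from gymnasium.wrappers import FlattenObservation

import stable_gym  # noqa: F401  # Importing stable_gym registers its environments.

# Create the goal-based environment and flatten its dictionary observation.
env = gym.make("stable_gym:FetchReachCost-v1")
env = FlattenObservation(env)

obs, info = env.reset(seed=0)
print(obs.shape)  # A flattened np.ndarray instead of a dictionary.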
Subpackages
Classes
FetchReachCost – Custom FetchReach gymnasium robotics environment.
MinitaurBulletCost – Custom Minitaur Bullet gymnasium environment.
QuadXHoverCost – Custom QuadXHover Bullet gymnasium environment.
QuadXTrackingCost – Custom QuadX Bullet gymnasium environment.
QuadXWaypointsCost – Custom QuadXWaypoints Bullet gymnasium environment.
Package Contents
- class stable_gym.envs.robotics.FetchReachCost(action_space_dtype=np.float32, observation_space_dtype=np.float64, **kwargs)[source]
- Bases: gymnasium_robotics.envs.fetch.reach.MujocoFetchReachEnv, gymnasium.utils.EzPickle
- Custom FetchReach gymnasium robotics environment.
- Note
- Can also be used in a vectorized manner. See the gym.vector documentation.
- Source:
- Modified version of the FetchReach Mujoco environment found in the Gymnasium Robotics library. This modification was first described by Han et al. 2020. In this modified version:
- The reward was replaced with a cost. This was done by taking the absolute value of the reward.
 - The rest of the environment is the same as the original FetchReach environment. Below, the modified cost is described. For more information about the environment (e.g. observation space, action space, episode termination, etc.), please refer to the gymnasium robotics library. 
- Modified cost:
- A cost, computed using the FetchReachCost.cost() method, is given for each simulation step, including the terminal step. This cost is defined as the error between FetchReach's end-effector position and the desired goal position (i.e. the Euclidean distance). The cost is computed as:
\[cost = \left| reward \right|\]
- Solved Requirements:
- Considered solved when the average cost is less than or equal to 50 over 100 consecutive trials. 
- How to use:
import stable_gym
import gymnasium as gym

env = gym.make("stable_gym:FetchReachCost-v1")
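A minimal interaction loop (a sketch, assuming default construction as above) showing that the cost is returned in place of the usual reward:

import gymnasium as gym

import stable_gym  # noqa: F401  # Importing stable_gym registers its environments.

env = gym.make("stable_gym:FetchReachCost-v1")

obs, info = env.reset(seed=0)  # The observation is a dictionary (see the note above).
for _ in range(10):
    action = env.action_space.sample()  # Replace with your policy.
    obs, cost, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()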
 - state
- The current system state. - Type:
 
- Attention
- Accepts all arguments of the original MujocoFetchReachEnv class except for the reward_type argument. This is because we require dense rewards to calculate the cost.
- Initialise a new FetchReachCost environment instance.
- Parameters:
- action_space_dtype (union[numpy.dtype, str], optional) – The data type of the action space. Defaults to np.float32.
- observation_space_dtype (union[numpy.dtype, str], optional) – The data type of the observation space. Defaults to np.float64.
- **kwargs – Keyword arguments passed to the original MujocoFetchReachEnv class.
 
 - state = None
 - _action_space_dtype
 - _observation_space_dtype
 - _action_dtype_conversion_warning = False
 - action_space
 - step(action)[source]
- Take a step into the environment.
- Note
- This method overrides the step() method such that the new cost function is used.
- Parameters:
- action (np.ndarray) – Action to take in the environment. 
- Returns:
- tuple containing:
- obs (np.ndarray): Environment observation.
- cost (float): Cost of the action.
- terminated (bool): Whether the episode is terminated.
- truncated (bool): Whether the episode was truncated. This value is set by wrappers when, for example, a time limit is reached or the agent goes out of bounds.
- info (dict): Additional information about the environment.
 
- Return type:
- (tuple) 
 
 - reset(seed=None, options=None)[source]
- Reset the gymnasium environment.
- Parameters:
- Returns:
- tuple containing:
- obs (numpy.ndarray): Initial environment observation.
- info (dict): Dictionary containing additional information.
 
- Return type:
- (tuple) 
 
 - property tau
- Alias for the environment step size. Done for compatibility with the other gymnasium environments.
 - property t
- Environment time.
 - property physics_time
- Returns the physics time.
 
- class stable_gym.envs.robotics.MinitaurBulletCost(reference_forward_velocity=1.0, randomise_reference_forward_velocity=False, randomise_reference_forward_velocity_range=(0.5, 1.5), forward_velocity_weight=1.0, include_energy_cost=False, energy_weight=0.005, include_shake_cost=False, shake_weight=0.01, include_drift_cost=False, drift_weight=0.01, distance_limit=float('inf'), render=False, include_health_penalty=True, health_penalty_size=None, backward_velocity_bound=-0.5, fall_criteria_up_rotation=0.85, fall_criteria_z_position=0.13, exclude_reference_from_observation=False, exclude_reference_error_from_observation=True, exclude_x_velocity_from_observation=False, action_space_dtype=np.float32, observation_space_dtype=np.float64, **kwargs)[source]
- Bases: pybullet_envs.bullet.minitaur_gym_env.MinitaurBulletEnv, gymnasium.utils.EzPickle
- Custom Minitaur Bullet gymnasium environment.
- Note
- Can also be used in a vectorized manner. See the gym.vector documentation.
- Source:
- Modified version of the Minitaur environment found in the pybullet package. This modification was first described by Han et al. 2020. In this modified version:
- The objective was changed to a velocity-tracking task. To do this, the reward is replaced with a cost. This cost is the squared difference between the Minitaur's forward velocity and a reference value (i.e. the velocity error). Additionally, an energy cost and a health penalty can also be included in the cost.
- A minimal backward velocity bound is added to prevent the Minitaur from walking backwards. 
- Users are given the option to modify the Minitaur fall criteria, and thus the episode termination criteria. 
- The rest of the environment is the same as the original Minitaur environment. Please refer to the original codebase or the article of Tan et al. 2018, on which the Minitaur environment is based, for more information.
- Important
- In Han et al. 2020, the authors disabled the termination criteria. In our implementation, we have kept them for consistency with the original Minitaur environment. The termination criteria can be adjusted (or effectively disabled) through the fall_criteria_up_rotation and fall_criteria_z_position arguments.
 - Observation:
- Type: Box(28)
- Contains the angles, velocities, and torques of all motors. Optionally, it can also include the reference, the reference error, and the x velocity.
- Actions:
- Type: Box(8)
- A list of desired motor angles for the eight motors.
- Modified cost:
- A cost, computed using the MinitaurBulletCost.cost() method, is given for each simulation step, including the terminal step. This cost is defined as the squared error between the Minitaur's forward velocity and a reference value. A control cost and a health penalty can also be included in the cost. The health penalty equals the max_episode_steps minus the number of steps taken in the episode, or a fixed value. The cost is computed as:
\[cost = w_{forward\_velocity} \times (x_{velocity} - x_{reference\_x\_velocity})^2 + w_{ctrl} \times c_{ctrl} + p_{health}\]
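The sketch below mirrors the formula above in plain NumPy. The function and variable names are illustrative; the environment computes this internally in MinitaurBulletCost.cost().

import numpy as np

def minitaur_cost_sketch(
    x_velocity,
    reference_x_velocity=1.0,
    forward_velocity_weight=1.0,
    ctrl_cost=0.0,
    ctrl_weight=0.005,
    health_penalty=0.0,
):
    """Illustrative re-implementation of the modified Minitaur cost formula."""
    # Squared tracking error between the forward velocity and the reference.
    velocity_cost = forward_velocity_weight * np.square(
        x_velocity - reference_x_velocity
    )
    # Optional control cost and health penalty terms.
    return velocity_cost + ctrl_weight * ctrl_cost + health_penalty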
- Starting State:
- The robot always starts at the same position and orientation, with zero velocity. 
- Episode Termination:
- The episode is terminated if the Minitaur falls, meaning that the orientation between the base and the world exceeds a threshold or the base is too close to the ground.
- Optionally, the episode can be terminated if the Minitaur walks backwards. 
 
- Solved Requirements:
- Considered solved when the average cost is less than or equal to 50 over 100 consecutive trials. 
- How to use:
import stable_gym
import gymnasium as gym

env = gym.make("stable_gym:MinitaurBulletCost-v1")
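The constructor arguments documented below can also be forwarded through gym.make, which passes extra keyword arguments to the environment constructor. A sketch with illustrative values:

import gymnasium as gym

import stable_gym  # noqa: F401  # Importing stable_gym registers its environments.

# Track a randomised reference velocity and include the energy cost.
env = gym.make(
    "stable_gym:MinitaurBulletCost-v1",
    randomise_reference_forward_velocity=True,
    randomise_reference_forward_velocity_range=(0.5, 1.5),
    include_energy_cost=True,
)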
 - state
- The current system state. - Type:
 
- Attention
- Since the MinitaurBulletEnv() is not yet compatible with gymnasium v>=0.26.0, the gym.wrappers.EnvCompatibility wrapper is used. This has the side effect that the render_mode argument does not work. Instead, the render argument should be used.
- Initialise a new MinitaurBulletCost environment instance.
- Parameters:
- reference_forward_velocity (float, optional) – The forward velocity that the agent should try to track. Defaults to 1.0.
- randomise_reference_forward_velocity (bool, optional) – Whether to randomise the reference forward velocity. Defaults to False.
- randomise_reference_forward_velocity_range (tuple, optional) – The range of the random reference forward velocity. Defaults to (0.5, 1.5).
- forward_velocity_weight (float, optional) – The weight used to scale the forward velocity error. Defaults to 1.0.
- include_energy_cost (bool, optional) – Whether to include the energy cost in the cost function (i.e. the energy of the motors). Defaults to False.
- energy_weight (float, optional) – The weight used to scale the energy cost. Defaults to 0.005.
- include_shake_cost (bool, optional) – Whether to include the shake cost in the cost function (i.e. moving up and down). Defaults to False.
- shake_weight (float, optional) – The weight used to scale the shake cost. Defaults to 0.01.
- include_drift_cost (bool, optional) – Whether to include the drift cost in the cost function (i.e. movement in the y direction). Defaults to False.
- drift_weight (float, optional) – The weight used to scale the drift cost. Defaults to 0.01.
- distance_limit (float, optional) – The maximum distance (in meters) that the agent can travel before the episode is terminated. Defaults to float("inf").
- render (bool, optional) – Whether to render the environment. Defaults to False.
- include_health_penalty (bool, optional) – Whether to penalise the Minitaur if it becomes unhealthy (i.e. if it falls over). Defaults to True.
- health_penalty_size (int, optional) – The size of the unhealthy penalty. Defaults to None, meaning the penalty equals the maximum episode steps minus the steps taken.
- backward_velocity_bound (float) – The maximum backward velocity (in meters per second) before the episode is terminated. Defaults to -0.5.
- fall_criteria_up_rotation (float) – The maximum up rotation (in radians) between the base and the world before the episode is terminated. Defaults to 0.85.
- fall_criteria_z_position (float) – The maximum z position (in meters) before the episode is terminated. Defaults to 0.13.
- exclude_reference_from_observation (bool, optional) – Whether the reference should be excluded from the observation. Defaults to False.
- exclude_reference_error_from_observation (bool, optional) – Whether the reference error should be excluded from the observation. Defaults to True.
- exclude_x_velocity_from_observation (bool, optional) – Whether to omit the x-component of the velocity from the observation. Defaults to False.
- action_space_dtype (union[numpy.dtype, str], optional) – The data type of the action space. Defaults to np.float32.
- observation_space_dtype (union[numpy.dtype, str], optional) – The data type of the observation space. Defaults to np.float64.
- **kwargs – Extra keyword arguments to pass to the MinitaurBulletEnv class.
 
 - metadata
 - state = None
 - t = 0.0
 - reference_forward_velocity
 - _randomise_reference_forward_velocity
 - _randomise_reference_forward_velocity_range
 - _forward_velocity_weight
 - _include_energy_cost
 - _energy_weight
 - _include_shake_cost
 - _shake_weight
 - _include_drift_cost
 - _drift_weight
 - _include_health_penalty
 - _health_penalty_size
 - _backward_velocity_bound
 - _fall_criteria_up_rotation
 - _fall_criteria_z_position
 - _exclude_reference_from_observation
 - _exclude_reference_error_from_observation
 - _exclude_x_velocity_from_observation
 - _action_space_dtype
 - _observation_space_dtype
 - _action_dtype_conversion_warning = False
 - observation_space
 - action_space
 - low
 - high
 - cost(x_velocity, energy_cost, drift_cost, shake_cost)[source]
- Compute the cost of a given base x velocity, energy cost, shake cost and drift cost. 
 - step(action)[source]
- Take a step into the environment.
- Note
- This method overrides the step() method such that the new cost function is used.
- Parameters:
- action (np.ndarray) – Action to take in the environment.
- render_mode (str, optional) – The render mode to use. Defaults to None.
 
- Returns:
- tuple containing:
- obs (np.ndarray): Environment observation.
- cost (float): Cost of the action.
- terminated (bool): Whether the episode is terminated.
- truncated (bool): Whether the episode was truncated. This value is set by wrappers when, for example, a time limit is reached or the agent goes out of bounds.
- info (dict): Additional information about the environment.
 
- Return type:
- (tuple) 
 
 - reset()[source]
- Reset the gymnasium environment.
- Returns:
- Initial environment observation. 
- Return type:
- (np.ndarray) 
 
 - _termination()[source]
- Check whether the episode is terminated.
- Note
- This method overrides the _termination() method of the original Minitaur environment so that a minimum velocity criterion can also be set.
- Returns:
- Boolean value that indicates whether the episode is terminated. 
- Return type:
- (bool) 
 
 - is_fallen()[source]
- Check whether the Minitaur has fallen.
- The Minitaur is considered fallen if the angle between the base's up direction and the world's up direction is too large (i.e. their dot product is smaller than _fall_criteria_up_rotation) or if the base is too close to the ground (i.e. its height is smaller than _fall_criteria_z_position).
- Note
- This method overrides the is_fallen() method of the original Minitaur environment to give users the ability to set the fall criteria.
- Returns:
- Boolean value that indicates whether the minitaur has fallen. 
- Return type:
- (bool) 
 
 - property time_limit_max_episode_steps
- The maximum number of steps that the environment can take before it is truncated by the gymnasium.wrappers.TimeLimit wrapper.
 - property base_velocity
- The base velocity of the minitaur.
 - property dt
- The environment step size.
 - property tau
- Alias for the environment step size. Done for compatibility with the other gymnasium environments.
 - property physics_time
- Returns the physics time.
- Note
- The Minitaur uses 100 steps to set up the system. This is why we add 100 time steps.
 
- class stable_gym.envs.robotics.QuadXHoverCost(flight_dome_size=3.0, angle_representation='quaternion', agent_hz=40, render_mode=None, render_resolution=(480, 480), include_health_penalty=True, health_penalty_size=None, action_space_dtype=np.float64, observation_space_dtype=np.float64, **kwargs)[source]
- Bases: PyFlyt.gym_envs.quadx_envs.quadx_hover_env.QuadXHoverEnv, gymnasium.utils.EzPickle
- Custom QuadXHover Bullet gymnasium environment.
- Note
- Can also be used in a vectorized manner. See the gym.vector documentation.
- Source:
- Modified version of the QuadXHover environment found in the PyFlyt package. This environment was first described by Tai et al. 2023. In this modified version:
- The reward has been changed to a cost. This was done by negating the reward so that it is always positive definite.
- A health penalty has been added. This penalty is applied when the quadrotor moves outside the flight dome or crashes. The penalty equals the maximum episode steps minus the steps taken or a user-defined penalty. 
- The max_duration_seconds argument has been removed. Instead, the max_episode_steps parameter of the gym.wrappers.TimeLimit wrapper is used to limit the episode duration.
 - The rest of the environment is the same as the original QuadXHover environment. Please refer to the original codebase, the PyFlyt documentation or the accompanying article of Tai et al. 2023 for more information. 
 - Modified cost:
- A cost, computed using the QuadXHoverCost.cost() method, is given for each simulation step, including the terminal step. This cost is defined as the Euclidean distance error between the quadrotor's current position and the desired hover position (i.e. \(p = x_{x,y,z} = [0, 0, 1]\)), plus the error between the quadrotor's current roll and pitch angles and their zero values. A health penalty can also be included in the cost. This health penalty is added when the drone leaves the flight dome or crashes. It equals the max_episode_steps minus the number of steps taken in the episode, or a fixed value. The cost is computed as:
\[cost = \| p_{drone} - p_{hover} \| + \| \theta_{roll,pitch} \| + p_{health}\]
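An illustrative NumPy version of this cost (the names are assumptions for the sketch; the environment computes this in QuadXHoverCost.cost()):

import numpy as np

def quadx_hover_cost_sketch(drone_position, roll_pitch, health_penalty=0.0):
    """Illustrative re-implementation of the modified hover cost formula."""
    hover_position = np.array([0.0, 0.0, 1.0])  # Desired hover position.
    position_error = np.linalg.norm(np.asarray(drone_position) - hover_position)
    angle_error = np.linalg.norm(np.asarray(roll_pitch))  # Roll and pitch error.
    return position_error + angle_error + health_penalty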
- Solved Requirements:
- Considered solved when the average cost is less than or equal to 50 over 100 consecutive trials. 
- How to use:
import stable_gym
import gymnasium as gym

env = gym.make("stable_gym:QuadXHoverCost-v1")
 - state
- The current system state. - Type:
 
 - initial_physics_time
- The simulation startup time. The physics time at the start of the episode after all the initialisation has been done. - Type:
 
 - Initialise a new QuadXHoverCost environment instance. - Parameters:
- flight_dome_size (float, optional) – Size of the allowable flying area. By default 3.0.
- angle_representation (str, optional) – The angle representation to use. Can be "euler" or "quaternion". By default "quaternion".
- agent_hz (int, optional) – Looprate of the agent-to-environment interaction. By default 40.
- render_mode (None | str, optional) – The render mode. Can be "human" or None. By default None.
- render_resolution (tuple[int, int], optional) – The render resolution. By default (480, 480).
- include_health_penalty (bool, optional) – Whether to penalise the quadrotor if it becomes unhealthy (i.e. if it falls over). Defaults to True.
- health_penalty_size (int, optional) – The size of the unhealthy penalty. Defaults to None, meaning the penalty equals the maximum episode steps minus the steps taken.
- action_space_dtype (union[numpy.dtype, str], optional) – The data type of the action space. Defaults to np.float64.
- observation_space_dtype (union[numpy.dtype, str], optional) – The data type of the observation space. Defaults to np.float64.
- **kwargs – Additional keyword arguments passed to the QuadXHoverEnv class.
 
 - state = None
 - initial_physics_time = None
 - _max_episode_steps_applied = False
 - agent_hz
 - _include_health_penalty
 - _health_penalty_size
 - _action_space_dtype
 - _observation_space_dtype
 - _action_dtype_conversion_warning = False
 - cost()[source]
- Compute the cost of the current state. - Returns:
- tuple containing: 
- Return type:
- (tuple) 
 
 - step(action)[source]
- Take a step into the environment.
- Note
- This method overrides the step() method such that the new cost function is used.
- Parameters:
- action (np.ndarray) – Action to take in the environment. 
- Returns:
- tuple containing:
- obs (np.ndarray): Environment observation.
- cost (float): Cost of the action.
- terminated (bool): Whether the episode is terminated.
- truncated (bool): Whether the episode was truncated. This value is set by wrappers when, for example, a time limit is reached or the agent goes out of bounds.
- info (dict): Additional information about the environment.
 
- Return type:
- (tuple) 
 
 - reset(seed=None, options=None)[source]
- Reset the gymnasium environment.
- Parameters:
- Returns:
- tuple containing:
- obs (numpy.ndarray): Initial environment observation.
- info (dict): Dictionary containing additional information.
 
- Return type:
- (tuple) 
 
 - property time_limit_max_episode_steps
- The maximum number of steps that the environment can take before it is truncated by the gymnasium.wrappers.TimeLimit wrapper.
 - property time_limit
- The maximum duration of the episode in seconds.
 - property dt
- The environment step size.
- Returns:
- The simulation step size. Returns None if the environment is not yet initialized.
- Return type:
- (float) 
 
 - property tau
- Alias for the environment step size. Done for compatibility with the other gymnasium environments.
- Returns:
- The simulation step size. Returns None if the environment is not yet initialized.
- Return type:
- (float) 
 
 - property t
- Environment time.
 - property physics_time
- Returns the physics time.
 
- class stable_gym.envs.robotics.QuadXTrackingCost(flight_dome_size=3.0, angle_representation='quaternion', agent_hz=40, render_mode=None, render_resolution=(480, 480), reference_target_position=(0.0, 0.0, 1.0), reference_amplitude=(1.0, 1.0, 0.25), reference_frequency=(0.25, 0.25, 0.1), reference_phase_shift=(0.0, -np.pi / 2.0, 0.0), include_health_penalty=True, health_penalty_size=None, exclude_reference_from_observation=False, exclude_reference_error_from_observation=True, action_space_dtype=np.float64, observation_space_dtype=np.float64, **kwargs)[source]
- Bases: PyFlyt.gym_envs.quadx_envs.quadx_hover_env.QuadXHoverEnv, gymnasium.utils.EzPickle
- Custom QuadX Bullet gymnasium environment.
- Note
- Can also be used in a vectorized manner. See the gym.vector documentation.
- Source:
- Modified version of the QuadXHover environment found in the PyFlyt package. Compared to the original environment:
- The reward has been changed to a cost. This was done by negating the reward so that it is always positive definite.
- A health penalty has been added. This penalty is applied when the quadrotor moves outside the flight dome or crashes. The penalty equals the maximum episode steps minus the steps taken or a user-defined penalty. 
- The max_duration_seconds argument has been removed. Instead, the max_episode_steps parameter of the gym.wrappers.TimeLimit wrapper is used to limit the episode duration.
- The objective has been changed to track a periodic reference trajectory. 
- The info dictionary has been extended with the reference, state of interest (i.e. the state to track) and reference error. 
 - The rest of the environment is the same as the original QuadXHover environment. Please refer to the original codebase, the PyFlyt documentation or the accompanying article of Tai et al. 2023 for more information. 
 - Modified cost:
- A cost, computed using the QuadXTrackingCost.cost() method, is given for each simulation step, including the terminal step. This cost is defined as the Euclidean distance error between the quadrotor's current position and the desired reference position (by default centred around \(p = x_{x,y,z} = [0, 0, 1]\)). A health penalty can also be included in the cost. This health penalty is added when the drone leaves the flight dome or crashes. It equals the max_episode_steps minus the number of steps taken in the episode, or a fixed value. The cost is computed as:
\[cost = \| p_{drone} - p_{reference} \| + p_{health}\]
- Solved Requirements:
- Considered solved when the average cost is less than or equal to 50 over 100 consecutive trials. 
- How to use:
import stable_gym
import gymnasium as gym

env = gym.make("stable_gym:QuadXTrackingCost-v1")
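The reference trajectory can be shaped through the constructor arguments documented below, which gym.make forwards to the environment constructor. A sketch with illustrative values:

import gymnasium as gym

import stable_gym  # noqa: F401  # Importing stable_gym registers its environments.

# Track a slower, lower-amplitude periodic reference around (0, 0, 1).
env = gym.make(
    "stable_gym:QuadXTrackingCost-v1",
    reference_target_position=(0.0, 0.0, 1.0),
    reference_amplitude=(0.5, 0.5, 0.1),
    reference_frequency=(0.1, 0.1, 0.05),
)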
 - state
- The current system state. - Type:
 
 - initial_physics_time
- The simulation startup time. The physics time at the start of the episode after all the initialisation has been done. - Type:
 
 - Initialise a new QuadXTrackingCost environment instance. - Parameters:
- flight_dome_size (float, optional) – Size of the allowable flying area. By default 3.0.
- angle_representation (str, optional) – The angle representation to use. Can be "euler" or "quaternion". By default "quaternion".
- agent_hz (int, optional) – Looprate of the agent-to-environment interaction. By default 40.
- render_mode (None | str, optional) – The render mode. Can be "human" or None. By default None.
- render_resolution (tuple[int, int], optional) – The render resolution. By default (480, 480).
- reference_target_position (tuple[float, float, float], optional) – The target position of the reference. Defaults to (0.0, 0.0, 1.0).
- reference_amplitude (tuple[float, float, float], optional) – The amplitude of the reference. Defaults to (1.0, 1.0, 0.25).
- reference_frequency (tuple[float, float, float], optional) – The frequency of the reference. Defaults to (0.25, 0.25, 0.10).
- reference_phase_shift (tuple[float, float, float], optional) – The phase shift of the reference. Defaults to (0.0, -np.pi / 2, 0.0).
- include_health_penalty (bool, optional) – Whether to penalise the quadrotor if it becomes unhealthy (i.e. if it falls over). Defaults to True.
- health_penalty_size (int, optional) – The size of the unhealthy penalty. Defaults to None, meaning the penalty equals the maximum episode steps minus the steps taken.
- exclude_reference_from_observation (bool, optional) – Whether the reference should be excluded from the observation. Defaults to False.
- exclude_reference_error_from_observation (bool, optional) – Whether the reference error should be excluded from the observation. Defaults to True.
- action_space_dtype (union[numpy.dtype, str], optional) – The data type of the action space. Defaults to np.float64.
- observation_space_dtype (union[numpy.dtype, str], optional) – The data type of the observation space. Defaults to np.float64.
- **kwargs – Additional keyword arguments passed to the QuadXHoverEnv class.
 
 - reference_target_position
 - reference_amplitude
 - reference_frequency
 - reference_phase_shift
 - state = None
 - initial_physics_time = None
 - _max_episode_steps_applied = False
 - agent_hz
 - _reference_target_pos
 - _reference_amplitude
 - _reference_frequency
 - _reference_phase_shift
 - _include_health_penalty
 - _health_penalty_size
 - _exclude_reference_from_observation
 - _exclude_reference_error_from_observation
 - _action_space_dtype
 - _observation_space_dtype
 - _action_dtype_conversion_warning = False
 - PyFlyt_dir
 - _reference_obj_dir
 - _reference_visual = None
 - low
 - high
 - observation_space
- ENVIRONMENT CONSTANTS 
 - reference(t)[source]
- Returns the current value of the (periodic) drone (x, y, z) reference position that should be tracked. 
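Given the target position, amplitude, frequency, and phase-shift arguments above, a plausible form of such a periodic reference is a per-axis sinusoid. This is a sketch only; the exact waveform used by QuadXTrackingCost.reference() may differ, so consult the source for the precise definition.

import numpy as np

def periodic_reference_sketch(
    t,
    target_position=(0.0, 0.0, 1.0),
    amplitude=(1.0, 1.0, 0.25),
    frequency=(0.25, 0.25, 0.1),
    phase_shift=(0.0, -np.pi / 2.0, 0.0),
):
    """Illustrative periodic (x, y, z) reference position at time ``t``.

    Assumes a sinusoidal reference; the actual environment may use a
    different parameterisation.
    """
    target = np.asarray(target_position)
    amp = np.asarray(amplitude)
    freq = np.asarray(frequency)
    phase = np.asarray(phase_shift)
    return target + amp * np.sin(2.0 * np.pi * freq * t + phase)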
 - step(action)[source]
- Take a step into the environment.
- Note
- This method overrides the step() method such that the new cost function is used.
- Parameters:
- action (np.ndarray) – Action to take in the environment. 
- Returns:
- tuple containing:
- obs (np.ndarray): Environment observation.
- cost (float): Cost of the action.
- terminated (bool): Whether the episode is terminated.
- truncated (bool): Whether the episode was truncated. This value is set by wrappers when, for example, a time limit is reached or the agent goes out of bounds.
- info (dict): Additional information about the environment.
 
- Return type:
- (tuple) 
 
 - reset(seed=None, options=None)[source]
- Reset the gymnasium environment.
- Parameters:
- Returns:
- tuple containing:
- obs (numpy.ndarray): Initial environment observation.
- info (dict): Dictionary containing additional information.
 
- Return type:
- (tuple) 
 
 - property time_limit_max_episode_steps
- The maximum number of steps that the environment can take before it is truncated by the gymnasium.wrappers.TimeLimit wrapper.
 - property time_limit
- The maximum duration of the episode in seconds.
 - property dt
- The environment step size.
- Returns:
- The simulation step size. Returns None if the environment is not yet initialized.
- Return type:
- (float) 
 
 - property tau
- Alias for the environment step size. Done for compatibility with the other gymnasium environments.
- Returns:
- The simulation step size. Returns None if the environment is not yet initialized.
- Return type:
- (float) 
 
 - property t
- Environment time.
 - property physics_time
- Returns the physics time.
 
- class stable_gym.envs.robotics.QuadXWaypointsCost(num_targets=4, use_yaw_targets=False, goal_reach_distance=0.2, goal_reach_angle=0.1, flight_dome_size=5.0, angle_representation='quaternion', agent_hz=30, render_mode=None, render_resolution=(480, 480), include_health_penalty=True, health_penalty_size=None, exclude_waypoint_targets_from_observation=False, only_observe_immediate_waypoint=True, exclude_waypoint_target_deltas_from_observation=True, only_observe_immediate_waypoint_target_delta=True, action_space_dtype=np.float64, observation_space_dtype=np.float64, **kwargs)[source]
- Bases: PyFlyt.gym_envs.quadx_envs.quadx_waypoints_env.QuadXWaypointsEnv, gymnasium.utils.EzPickle
- Custom QuadXWaypoints Bullet gymnasium environment.
- Note
- Can also be used in a vectorized manner. See the gym.vector documentation.
- Source:
- Modified version of the QuadXWaypoints environment found in the PyFlyt package. This environment was first described by Tai et al. 2023. In this modified version:
- The reward has been changed to a cost. This was done by negating the reward so that it is always positive definite.
- A health penalty has been added. This penalty is applied when the quadrotor moves outside the flight dome or crashes. The penalty equals the maximum episode steps minus the steps taken or a user-defined penalty. 
- The max_duration_seconds argument has been removed. Instead, the max_episode_steps parameter of the gym.wrappers.TimeLimit wrapper is used to limit the episode duration.
 - The rest of the environment is the same as the original QuadXWaypoints environment. Please refer to the original codebase, the PyFlyt documentation or the accompanying article of Tai et al. 2023 for more information. 
 - Modified cost:
- A cost, computed using the QuadXWaypointsCost.cost() method, is given for each simulation step, including the terminal step. This cost is defined as the Euclidean distance error between the quadrotor's current position and the position of the current waypoint. Additionally, a penalty is given for moving away from the waypoint, and a health penalty can also be included in the cost. This health penalty is added when the drone leaves the flight dome or crashes. It equals the max_episode_steps minus the number of steps taken in the episode, or a fixed value. The cost is computed as:
\[cost = 10 \times \| p_{drone} - p_{waypoint} \| - \min(3.0 \times (p_{old} - p_{drone}), 0.0) + p_{health}\]
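A NumPy sketch of this cost (variable names are illustrative, and p_old and p_drone are interpreted here as distances to the waypoint; the environment computes the actual value in QuadXWaypointsCost.cost()):

import numpy as np

def quadx_waypoints_cost_sketch(
    drone_position, old_position, waypoint_position, health_penalty=0.0
):
    """Illustrative re-implementation of the modified waypoints cost formula."""
    drone_position = np.asarray(drone_position)
    old_position = np.asarray(old_position)
    waypoint_position = np.asarray(waypoint_position)

    distance_cost = 10.0 * np.linalg.norm(drone_position - waypoint_position)
    # Progress towards the waypoint; negative progress (moving away) is penalised.
    progress = np.linalg.norm(old_position - waypoint_position) - np.linalg.norm(
        drone_position - waypoint_position
    )
    progress_penalty = -min(3.0 * progress, 0.0)
    return distance_cost + progress_penalty + health_penalty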
- Solved Requirements:
- Considered solved when the average cost is less than or equal to 50 over 100 consecutive trials. 
- How to use:
import stable_gym
import gymnasium as gym

env = gym.make("stable_gym:QuadXWaypointsCost-v1")
 - state
- The current system state. - Type:
 
 - initial_physics_time
- The simulation startup time. The physics time at the start of the episode after all the initialisation has been done. - Type:
 
 - Initialise a new QuadXWaypointsCost environment instance. - Parameters:
- num_targets (int, optional) – Number of waypoints in the environment. By default 4.
- use_yaw_targets (bool, optional) – Whether to match yaw targets before a waypoint is considered reached. By default False.
- goal_reach_distance (float, optional) – Distance to a waypoint for it to be considered reached. By default 0.2.
- goal_reach_angle (float, optional) – Angle in radians to a waypoint for it to be considered reached, only in effect if use_yaw_targets is used. By default 0.1.
- flight_dome_size (float, optional) – Size of the allowable flying area. By default 5.0.
- angle_representation (str, optional) – The angle representation to use. Can be "euler" or "quaternion". By default "quaternion".
- agent_hz (int, optional) – Looprate of the agent-to-environment interaction. By default 30.
- render_mode (None | str, optional) – The render mode. Can be "human" or None. By default None.
- render_resolution (tuple[int, int], optional) – The render resolution. By default (480, 480).
- include_health_penalty (bool, optional) – Whether to penalise the quadrotor if it becomes unhealthy (i.e. if it falls over). Defaults to True.
- health_penalty_size (int, optional) – The size of the unhealthy penalty. Defaults to None, meaning the penalty equals the maximum episode steps minus the steps taken.
- exclude_waypoint_targets_from_observation (bool, optional) – Whether to exclude the waypoint targets from the observation. Defaults to False.
- only_observe_immediate_waypoint (bool, optional) – Whether to only observe the immediate waypoint target. Defaults to True.
- exclude_waypoint_target_deltas_from_observation (bool, optional) – Whether to exclude the waypoint target deltas from the observation. Defaults to True.
- only_observe_immediate_waypoint_target_delta (bool, optional) – Whether to only observe the immediate waypoint target delta. Defaults to True.
- action_space_dtype (union[numpy.dtype, str], optional) – The data type of the action space. Defaults to np.float64.
- observation_space_dtype (union[numpy.dtype, str], optional) – The data type of the observation space. Defaults to np.float64.
- **kwargs – Additional keyword arguments passed to the QuadXWaypointsEnv class.
 
 - state = None
 - initial_physics_time = None
 - _max_episode_steps_applied = False
 - _previous_num_targets_reached = 0
 - _episode_waypoint_targets = None
 - _current_immediate_waypoint_target = None
 - agent_hz
 - _include_health_penalty
 - _health_penalty_size
 - _exclude_waypoint_targets_from_observation
 - _only_observe_immediate_waypoint
 - _exclude_waypoint_target_deltas_from_observation
 - _only_observe_immediate_waypoint_target_delta
 - _action_space_dtype
 - _observation_space_dtype
 - _action_dtype_conversion_warning = False
 - low
 - high
 - observation_space
- ENVIRONMENT CONSTANTS 
 - compute_target_deltas(ang_pos, lin_pos, quarternion)[source]
- Compute the waypoint target deltas.
- Note
- Needed because the PyFlyt.gym_envs.quadx_envs.quadx_waypoints_env.QuadXWaypointsEnv removes the immediate waypoint from the waypoint targets list when it is reached and doesn't expose the old value.
- Parameters:
- ang_pos (np.ndarray) – The current angular position. 
- lin_pos (np.ndarray) – The current position. 
- quarternion (np.ndarray) – The current quarternion. 
 
- Returns:
- The waypoints target deltas. 
- Return type:
- (np.ndarray) 
 
 - step(action)[source]
- Take a step into the environment.
- Note
- This method overrides the step() method such that the new cost function is used.
- Parameters:
- action (np.ndarray) – Action to take in the environment. 
- Returns:
- tuple containing:
- obs (np.ndarray): Environment observation.
- cost (float): Cost of the action.
- terminated (bool): Whether the episode is terminated.
- truncated (bool): Whether the episode was truncated. This value is set by wrappers when, for example, a time limit is reached or the agent goes out of bounds.
- info (dict): Additional information about the environment.
 
- Return type:
- (tuple) 
 
 - reset(seed=None, options=None)[source]
- Reset the gymnasium environment.
- Parameters:
- Returns:
- tuple containing:
- obs (numpy.ndarray): Initial environment observation.
- info (dict): Dictionary containing additional information.
 
- Return type:
- (tuple) 
 
 - property immediate_waypoint_target
- The immediate waypoint target.
 - property time_limit_max_episode_steps
- The maximum number of steps that the environment can take before it is truncated by the gymnasium.wrappers.TimeLimit wrapper.
 - property time_limit
- The maximum duration of the episode in seconds.
 - property dt
- The environment step size.
- Returns:
- The simulation step size. Returns None if the environment is not yet initialized.
- Return type:
- (float) 
 
 - property tau
- Alias for the environment step size. Done for compatibility with the other gymnasium environments.
- Returns:
- The simulation step size. Returns None if the environment is not yet initialized.
- Return type:
- (float) 
 
 - property t
- Environment time.
 - property physics_time
- Returns the physics time.