stable_gym.envs.robotics.quadrotor

Stable Gym gymnasium environments that are based on environments found in the PyFlyt package.

Subpackages

Classes

QuadXHoverCost

Custom QuadXHover Bullet gymnasium environment.

QuadXTrackingCost

Custom QuadX Bullet gymnasium environment.

QuadXWaypointsCost

Custom QuadXWaypoints Bullet gymnasium environment.

Package Contents

class stable_gym.envs.robotics.quadrotor.QuadXHoverCost(flight_dome_size=3.0, angle_representation='quaternion', agent_hz=40, render_mode=None, render_resolution=(480, 480), include_health_penalty=True, health_penalty_size=None, action_space_dtype=np.float64, observation_space_dtype=np.float64, **kwargs)[source]

Bases: PyFlyt.gym_envs.quadx_envs.quadx_hover_env.QuadXHoverEnv, gymnasium.utils.EzPickle

Custom QuadXHover Bullet gymnasium environment.

Note

Can also be used in a vectorized manner. See the gym.vector documentation.

Source:

Modified version of the QuadXHover environment found in the PyFlyt package. This environment was first described by Tai et al. 2023. In this modified version:

  • The reward has been changed to a cost. This was done by negating the reward so that the cost is always positive definite.

  • A health penalty has been added. This penalty is applied when the quadrotor moves outside the flight dome or crashes. The penalty equals the maximum episode steps minus the steps taken or a user-defined penalty.

  • The max_duration_seconds argument has been removed. Instead, the max_episode_steps parameter of the gym.wrappers.TimeLimit wrapper limits the episode duration.
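The health penalty rule described above can be sketched as a small function. This is a minimal illustration, not the stable_gym implementation; the function name health_penalty and its arguments are hypothetical.

```python
def health_penalty(max_episode_steps, steps_taken, health_penalty_size=None):
    """Hypothetical helper: penalty added when the quadrotor leaves the dome or crashes.

    If health_penalty_size is None, the penalty equals the number of steps
    remaining in the episode; otherwise the user-defined fixed value is used.
    """
    if health_penalty_size is not None:
        return float(health_penalty_size)
    # Remaining steps: a failure early in the episode costs more than a late one.
    return float(max_episode_steps - steps_taken)


# A crash after 150 of 1000 allowed steps.
print(health_penalty(1000, 150))  # 850.0
print(health_penalty(1000, 150, health_penalty_size=100))  # 100.0
```

Tying the default penalty to the remaining steps plausibly keeps an agent from ending an episode early just to stop accumulating per-step cost.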

The rest of the environment is the same as the original QuadXHover environment. Please refer to the original codebase, the PyFlyt documentation or the accompanying article of Tai et al. 2023 for more information.

Modified cost:

A cost, computed using the QuadXHoverCost.cost() method, is given for each simulation step, including the terminal step. This cost is defined as the Euclidean distance between the quadrotor's current position and a desired hover position (i.e. \(p_{hover}=[0,0,1]\)), plus the error between the quadrotor's current roll and pitch angles and their zero values. A health penalty can also be included in the cost. This health penalty is added when the drone leaves the flight dome or crashes. It equals the max_episode_steps minus the number of steps taken in the episode, or a fixed value. The cost is computed as:

\[cost = \| p_{drone} - p_{hover} \| + \| \theta_{roll,pitch} \| + p_{health}\]
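The hover cost formula can be evaluated for a single step as in the sketch below. It is not the QuadXHoverCost.cost() implementation: the function hover_cost is hypothetical, and it assumes roll and pitch are given as angles in radians and the health penalty has already been computed.

```python
import math

HOVER_POSITION = (0.0, 0.0, 1.0)  # desired hover position p_hover from the text


def hover_cost(position, roll, pitch, health_penalty=0.0):
    """Sketch of cost = ||p_drone - p_hover|| + ||theta_roll,pitch|| + p_health."""
    position_error = math.dist(position, HOVER_POSITION)  # Euclidean distance to p_hover
    attitude_error = math.hypot(roll, pitch)  # norm of the (roll, pitch) error w.r.t. zero
    return position_error + attitude_error + health_penalty


# Drone 0.3 m off in x and 0.4 m too low, rolled by 0.1 rad.
print(hover_cost((0.3, 0.0, 0.6), roll=0.1, pitch=0.0))
```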
Solved Requirements:

Considered solved when the average cost is less than or equal to 50 over 100 consecutive trials.
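The solved criterion can be checked with a rolling window, as sketched below. This assumes "cost" here means the accumulated cost of one episode (trial); the helper is_solved is hypothetical and not part of stable_gym.

```python
from collections import deque


def is_solved(episode_costs, threshold=50.0, window=100):
    """Return True once some run of `window` consecutive episodes has mean cost <= threshold."""
    recent = deque(maxlen=window)  # keeps only the most recent `window` episode costs
    for cost in episode_costs:
        recent.append(cost)
        if len(recent) == window and sum(recent) / window <= threshold:
            return True
    return False


print(is_solved([100.0] * 50 + [10.0] * 100))  # True
```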

How to use:
import stable_gym
import gymnasium as gym
env = gym.make("stable_gym:QuadXHoverCost-v1")
state

The current system state.

Type:

numpy.ndarray

agent_hz

The agent loop rate.

Type:

int

initial_physics_time

The simulation startup time, i.e. the physics time at the start of the episode after all initialisation has been done.

Type:

float

Initialise a new QuadXHoverCost environment instance.

Parameters:
  • flight_dome_size (float, optional) – Size of the allowable flying area. By default 3.0.

  • angle_representation (str, optional) – The angle representation to use. Can be "euler" or "quaternion". By default "quaternion".

  • agent_hz (int, optional) – Loop rate of the agent-environment interaction. By default 40.

  • render_mode (None | str, optional) – The render mode. Can be "human" or None. By default None.

  • render_resolution (tuple[int, int], optional) – The render resolution. By default (480, 480).

  • include_health_penalty (bool, optional) – Whether to penalize the quadrotor if it becomes unhealthy (i.e. if it falls over). Defaults to True.

  • health_penalty_size (int, optional) – The size of the unhealthy penalty. Defaults to None, meaning the penalty equals the maximum episode steps minus the steps taken.

  • action_space_dtype (Union[numpy.dtype, str], optional) – The data type of the action space. Defaults to np.float64.

  • observation_space_dtype (Union[numpy.dtype, str], optional) – The data type of the observation space. Defaults to np.float64.

  • **kwargs – Additional keyword arguments passed to the QuadXHoverEnv.

state = None
initial_physics_time = None
_max_episode_steps_applied = False
agent_hz
_include_health_penalty
_health_penalty_size
_action_space_dtype
_observation_space_dtype
_action_dtype_conversion_warning = False
cost()[source]

Compute the cost of the current state.

Returns:

tuple containing:

  • cost (float): The cost.

  • info (dict): Dictionary containing additional information about the cost.

Return type:

(tuple)

step(action)[source]

Take a step in the environment.

Note

This method overrides the step() method such that the new cost function is used.

Parameters:

action (np.ndarray) – Action to take in the environment.

Returns:

tuple containing:

  • obs (np.ndarray): Environment observation.

  • cost (float): Cost of the action.

  • terminated (bool): Whether the episode is terminated.

  • truncated (bool): Whether the episode was truncated. This value is set by wrappers when, for example, a time limit is reached or the agent goes out of bounds.

  • info (dict): Additional information about the environment.

Return type:

(tuple)

reset(seed=None, options=None)[source]

Reset gymnasium environment.

Parameters:
  • seed (int, optional) – A random seed for the environment. By default None.

  • options (dict, optional) – A dictionary containing additional options for resetting the environment. By default None. Not used in this environment.

Returns:

tuple containing:

  • obs (numpy.ndarray): Initial environment observation.

  • info (dict): Dictionary containing additional information.

Return type:

(tuple)

property time_limit_max_episode_steps
The maximum number of steps that the environment can take before it is truncated by the gymnasium.wrappers.TimeLimit wrapper.
property time_limit
The maximum duration of the episode in seconds.
property dt
The environment step size.
Returns:

The simulation step size. Returns None if the environment is not yet initialized.

Return type:

(float)
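The relation between the step size and the time_limit property can be sketched as follows. This assumes that dt equals 1 / agent_hz and that time_limit is time_limit_max_episode_steps multiplied by dt; neither is confirmed by this page, and the helpers step_size and time_limit are hypothetical.

```python
def step_size(agent_hz):
    """Assumed environment step size dt (and tau) in seconds: one agent step every 1/agent_hz s."""
    return 1.0 / agent_hz


def time_limit(max_episode_steps, agent_hz):
    """Assumed episode duration in seconds imposed by the TimeLimit wrapper."""
    if max_episode_steps is None:  # no TimeLimit wrapper applied
        return None
    return max_episode_steps * step_size(agent_hz)


# 1000 agent steps at the default QuadXHoverCost agent_hz of 40.
print(time_limit(1000, agent_hz=40))
```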

property tau
Alias for the environment step size, provided for compatibility with other gymnasium environments.
Returns:

The simulation step size. Returns None if the environment is not yet initialized.

Return type:

(float)

property t
Environment time.
property physics_time
Returns the physics time.
class stable_gym.envs.robotics.quadrotor.QuadXTrackingCost(flight_dome_size=3.0, angle_representation='quaternion', agent_hz=40, render_mode=None, render_resolution=(480, 480), reference_target_position=(0.0, 0.0, 1.0), reference_amplitude=(1.0, 1.0, 0.25), reference_frequency=(0.25, 0.25, 0.1), reference_phase_shift=(0.0, -np.pi / 2.0, 0.0), include_health_penalty=True, health_penalty_size=None, exclude_reference_from_observation=False, exclude_reference_error_from_observation=True, action_space_dtype=np.float64, observation_space_dtype=np.float64, **kwargs)[source]

Bases: PyFlyt.gym_envs.quadx_envs.quadx_hover_env.QuadXHoverEnv, gymnasium.utils.EzPickle

Custom QuadX Bullet gymnasium environment.

Note

Can also be used in a vectorized manner. See the gym.vector documentation.

Source:

Modified version of the QuadXHover environment found in the PyFlyt package. Compared to the original environment:

  • The reward has been changed to a cost. This was done by negating the reward so that the cost is always positive definite.

  • A health penalty has been added. This penalty is applied when the quadrotor moves outside the flight dome or crashes. The penalty equals the maximum episode steps minus the steps taken or a user-defined penalty.

  • The max_duration_seconds argument has been removed. Instead, the max_episode_steps parameter of the gym.wrappers.TimeLimit wrapper limits the episode duration.

  • The objective has been changed to track a periodic reference trajectory.

  • The info dictionary has been extended with the reference, state of interest (i.e. the state to track) and reference error.

The rest of the environment is the same as the original QuadXHover environment. Please refer to the original codebase, the PyFlyt documentation or the accompanying article of Tai et al. 2023 for more information.

Modified cost:

A cost, computed using the QuadXTrackingCost.cost() method, is given for each simulation step, including the terminal step. This cost is defined as the Euclidean distance between the quadrotor's current position and the current reference position \(p_{reference}\) returned by QuadXTrackingCost.reference(). A health penalty can also be included in the cost. This health penalty is added when the drone leaves the flight dome or crashes. It equals the max_episode_steps minus the number of steps taken in the episode, or a fixed value. The cost is computed as:

\[cost = \| p_{drone} - p_{reference} \| + p_{health}\]
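A single-step evaluation of the tracking cost is sketched below. The function tracking_cost is hypothetical and takes the reference position and health penalty as precomputed inputs rather than calling QuadXTrackingCost.reference().

```python
import math


def tracking_cost(position, reference_position, health_penalty=0.0):
    """Sketch of cost = ||p_drone - p_reference|| + p_health for one step."""
    tracking_error = math.dist(position, reference_position)  # Euclidean distance
    return tracking_error + health_penalty


# Drone at (0, 0, 1) while the reference is currently 0.5 m higher.
print(tracking_cost((0.0, 0.0, 1.0), (0.0, 0.0, 1.5)))  # 0.5
```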
Solved Requirements:

Considered solved when the average cost is less than or equal to 50 over 100 consecutive trials.

How to use:
import stable_gym
import gymnasium as gym
env = gym.make("stable_gym:QuadXTrackingCost-v1")
state

The current system state.

Type:

numpy.ndarray

agent_hz

The agent loop rate.

Type:

int

initial_physics_time

The simulation startup time, i.e. the physics time at the start of the episode after all initialisation has been done.

Type:

float

Initialise a new QuadXTrackingCost environment instance.

Parameters:
  • flight_dome_size (float, optional) – Size of the allowable flying area. By default 3.0.

  • angle_representation (str, optional) – The angle representation to use. Can be "euler" or "quaternion". By default "quaternion".

  • agent_hz (int, optional) – Loop rate of the agent-environment interaction. By default 40.

  • render_mode (None | str, optional) – The render mode. Can be "human" or None. By default None.

  • render_resolution (tuple[int, int], optional) – The render resolution. By default (480, 480).

  • reference_target_position (tuple[float, float, float], optional) – The target position of the reference. Defaults to (0.0, 0.0, 1.0).

  • reference_amplitude (tuple[float, float, float], optional) – The amplitude of the reference. Defaults to (1.0, 1.0, 0.25).

  • reference_frequency (tuple[float, float, float], optional) – The frequency of the reference. Defaults to (0.25, 0.25, 0.10).

  • reference_phase_shift (tuple[float, float, float], optional) – The phase shift of the reference. Defaults to (0.0, -np.pi / 2, 0.0).

  • include_health_penalty (bool, optional) – Whether to penalize the quadrotor if it becomes unhealthy (i.e. if it falls over). Defaults to True.

  • health_penalty_size (int, optional) – The size of the unhealthy penalty. Defaults to None, meaning the penalty equals the maximum episode steps minus the steps taken.

  • exclude_reference_from_observation (bool, optional) – Whether the reference should be excluded from the observation. Defaults to False.

  • exclude_reference_error_from_observation (bool, optional) – Whether the reference error should be excluded from the observation. Defaults to True.

  • action_space_dtype (Union[numpy.dtype, str], optional) – The data type of the action space. Defaults to np.float64.

  • observation_space_dtype (Union[numpy.dtype, str], optional) – The data type of the observation space. Defaults to np.float64.

  • **kwargs – Additional keyword arguments passed to the QuadXHoverEnv.

reference_target_position
reference_amplitude
reference_frequency
reference_phase_shift
state = None
initial_physics_time = None
_max_episode_steps_applied = False
agent_hz
_reference_target_pos
_reference_amplitude
_reference_frequency
_reference_phase_shift
_include_health_penalty
_health_penalty_size
_exclude_reference_from_observation
_exclude_reference_error_from_observation
_action_space_dtype
_observation_space_dtype
_action_dtype_conversion_warning = False
PyFlyt_dir
_reference_obj_dir
_reference_visual = None
low
high
observation_space

ENVIRONMENT CONSTANTS

reference(t)[source]

Returns the current value of the (periodic) drone (x, y, z) reference position that should be tracked.

Parameters:

t (float) – The current time step.

Returns:

The current reference position.

Return type:

numpy.ndarray
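A periodic reference built from the constructor parameters could look like the sketch below. It assumes a per-axis sinusoid, p_i(t) = c_i + A_i sin(2π f_i t + φ_i), with c the reference_target_position, A the reference_amplitude, f the reference_frequency (in Hz) and φ the reference_phase_shift; this form is inferred from the parameter names and may differ from QuadXTrackingCost.reference(). The standalone function reference is a hypothetical stand-in that returns a tuple instead of an array.

```python
import math


def reference(
    t,
    target_position=(0.0, 0.0, 1.0),
    amplitude=(1.0, 1.0, 0.25),
    frequency=(0.25, 0.25, 0.1),
    phase_shift=(0.0, -math.pi / 2.0, 0.0),
):
    """Assumed periodic reference: p_i(t) = c_i + A_i * sin(2*pi*f_i*t + phi_i) for x, y, z."""
    return tuple(
        c + a * math.sin(2.0 * math.pi * f * t + phi)
        for c, a, f, phi in zip(target_position, amplitude, frequency, phase_shift)
    )


print(reference(0.0))  # (0.0, -1.0, 1.0)
```

Under this assumption, the default -π/2 phase shift on y makes the x and y components a sine and a negative cosine of the same frequency, so the reference traces a circle of radius 1 in the horizontal plane while z oscillates around 1.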

cost()[source]

Compute the cost of the current state.

Returns:

The cost.

Return type:

(float)

step(action)[source]

Take a step in the environment.

Note

This method overrides the step() method such that the new cost function is used.

Parameters:

action (np.ndarray) – Action to take in the environment.

Returns:

tuple containing:

  • obs (np.ndarray): Environment observation.

  • cost (float): Cost of the action.

  • terminated (bool): Whether the episode is terminated.

  • truncated (bool): Whether the episode was truncated. This value is set by wrappers when, for example, a time limit is reached or the agent goes out of bounds.

  • info (dict): Additional information about the environment.

Return type:

(tuple)

reset(seed=None, options=None)[source]

Reset gymnasium environment.

Parameters:
  • seed (int, optional) – A random seed for the environment. By default None.

  • options (dict, optional) – A dictionary containing additional options for resetting the environment. By default None. Not used in this environment.

Returns:

tuple containing:

  • obs (numpy.ndarray): Initial environment observation.

  • info (dict): Dictionary containing additional information.

Return type:

(tuple)

visualize_reference()[source]

Visualize the reference target.

property time_limit_max_episode_steps
The maximum number of steps that the environment can take before it is truncated by the gymnasium.wrappers.TimeLimit wrapper.
property time_limit
The maximum duration of the episode in seconds.
property dt
The environment step size.
Returns:

The simulation step size. Returns None if the environment is not yet initialized.

Return type:

(float)

property tau
Alias for the environment step size, provided for compatibility with other gymnasium environments.
Returns:

The simulation step size. Returns None if the environment is not yet initialized.

Return type:

(float)

property t
Environment time.
property physics_time
Returns the physics time.
class stable_gym.envs.robotics.quadrotor.QuadXWaypointsCost(num_targets=4, use_yaw_targets=False, goal_reach_distance=0.2, goal_reach_angle=0.1, flight_dome_size=5.0, angle_representation='quaternion', agent_hz=30, render_mode=None, render_resolution=(480, 480), include_health_penalty=True, health_penalty_size=None, exclude_waypoint_targets_from_observation=False, only_observe_immediate_waypoint=True, exclude_waypoint_target_deltas_from_observation=True, only_observe_immediate_waypoint_target_delta=True, action_space_dtype=np.float64, observation_space_dtype=np.float64, **kwargs)[source]

Bases: PyFlyt.gym_envs.quadx_envs.quadx_waypoints_env.QuadXWaypointsEnv, gymnasium.utils.EzPickle

Custom QuadXWaypoints Bullet gymnasium environment.

Note

Can also be used in a vectorized manner. See the gym.vector documentation.

Source:

Modified version of the QuadXWaypoints environment found in the PyFlyt package. This environment was first described by Tai et al. 2023. In this modified version:

  • The reward has been changed to a cost. This was done by negating the reward so that the cost is always positive definite.

  • A health penalty has been added. This penalty is applied when the quadrotor moves outside the flight dome or crashes. The penalty equals the maximum episode steps minus the steps taken or a user-defined penalty.

  • The max_duration_seconds argument has been removed. Instead, the max_episode_steps parameter of the gym.wrappers.TimeLimit wrapper limits the episode duration.

The rest of the environment is the same as the original QuadXWaypoints environment. Please refer to the original codebase, the PyFlyt documentation or the accompanying article of Tai et al. 2023 for more information.

Modified cost:

A cost, computed using the QuadXWaypointsCost.cost() method, is given for each simulation step, including the terminal step. This cost is defined as the Euclidean distance between the quadrotor's current position and the position of the current (immediate) waypoint \(p_{waypoint}\). Additionally, a penalty is given for moving away from the waypoint, and a health penalty can also be included in the cost. This health penalty is added when the drone leaves the flight dome or crashes. It equals the max_episode_steps minus the number of steps taken in the episode, or a fixed value. The cost is computed as:

\[cost = 10 \times \| p_{drone} - p_{waypoint} \| - \min(3.0 \times (p_{old} - p_{drone}), 0.0) + p_{health}\]
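The waypoint cost can be sketched for a single step as below. This sketch assumes the progress term (p_old - p_drone) is the decrease in distance to the waypoint since the previous step, so that only moving away is penalised; the function waypoints_cost and its previous_distance argument are hypothetical and not the QuadXWaypointsCost.cost() signature.

```python
import math


def waypoints_cost(position, waypoint, previous_distance, health_penalty=0.0):
    """Sketch of cost = 10*||p_drone - p_waypoint|| - min(3.0*progress, 0.0) + p_health.

    Assumption: progress is the decrease in distance to the waypoint since the
    previous step (positive when approaching, negative when moving away).
    """
    distance = math.dist(position, waypoint)
    progress = previous_distance - distance
    # min(..., 0.0) keeps only negative progress, so moving away adds a positive cost.
    return 10.0 * distance - min(3.0 * progress, 0.0) + health_penalty


# Approaching: distance drops from 1.0 to 0.5, so no extra penalty (cost 5.0).
print(waypoints_cost((0.0, 0.0, 0.5), (0.0, 0.0, 1.0), previous_distance=1.0))
# Moving away: distance grows from 0.5 to 1.0, adding 3.0 * 0.5 (cost 11.5).
print(waypoints_cost((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), previous_distance=0.5))
```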
Solved Requirements:

Considered solved when the average cost is less than or equal to 50 over 100 consecutive trials.

How to use:
import stable_gym
import gymnasium as gym
env = gym.make("stable_gym:QuadXWaypointsCost-v1")
state

The current system state.

Type:

numpy.ndarray

agent_hz

The agent loop rate.

Type:

int

initial_physics_time

The simulation startup time, i.e. the physics time at the start of the episode after all initialisation has been done.

Type:

float

Initialise a new QuadXWaypointsCost environment instance.

Parameters:
  • num_targets (int, optional) – Number of waypoints in the environment. By default 4.

  • use_yaw_targets (bool, optional) – Whether to match yaw targets before a waypoint is considered reached. By default False.

  • goal_reach_distance (float, optional) – Distance from a waypoint within which it is considered reached. By default 0.2.

  • goal_reach_angle (float, optional) – Yaw error, in radians, within which a waypoint is considered reached. Only in effect if use_yaw_targets is True. By default 0.1.

  • flight_dome_size (float, optional) – Size of the allowable flying area. By default 5.0.

  • angle_representation (str, optional) – The angle representation to use. Can be "euler" or "quaternion". By default "quaternion".

  • agent_hz (int, optional) – Loop rate of the agent-environment interaction. By default 30.

  • render_mode (None | str, optional) – The render mode. Can be "human" or None. By default None.

  • render_resolution (tuple[int, int], optional) – The render resolution. By default (480, 480).

  • include_health_penalty (bool, optional) – Whether to penalize the quadrotor if it becomes unhealthy (i.e. if it falls over). Defaults to True.

  • health_penalty_size (int, optional) – The size of the unhealthy penalty. Defaults to None, meaning the penalty equals the maximum episode steps minus the steps taken.

  • exclude_waypoint_targets_from_observation (bool, optional) – Whether to exclude the waypoint targets from the observation. Defaults to False.

  • only_observe_immediate_waypoint (bool, optional) – Whether to only observe the immediate waypoint target. Defaults to True.

  • exclude_waypoint_target_deltas_from_observation (bool, optional) – Whether to exclude the waypoint target deltas from the observation. Defaults to True.

  • only_observe_immediate_waypoint_target_delta (bool, optional) – Whether to only observe the immediate waypoint target delta. Defaults to True.

  • action_space_dtype (Union[numpy.dtype, str], optional) – The data type of the action space. Defaults to np.float64.

  • observation_space_dtype (Union[numpy.dtype, str], optional) – The data type of the observation space. Defaults to np.float64.

  • **kwargs – Additional keyword arguments passed to the QuadXWaypointsEnv.
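The roles of goal_reach_distance, use_yaw_targets and goal_reach_angle can be illustrated with the sketch below. It is a hypothetical helper, not the PyFlyt waypoint logic: it assumes a waypoint counts as reached when the distance is strictly below goal_reach_distance and, if yaw targets are used, the yaw error is strictly below goal_reach_angle. Wrapping the yaw error into [-π, π] is a choice made for this sketch.

```python
import math


def waypoint_reached(position, waypoint, goal_reach_distance=0.2,
                     yaw=0.0, target_yaw=0.0, use_yaw_targets=False,
                     goal_reach_angle=0.1):
    """Hypothetical check mirroring the goal_reach_* parameters described above."""
    if math.dist(position, waypoint) >= goal_reach_distance:
        return False
    if use_yaw_targets:
        # Wrap the yaw error into [-pi, pi] so that e.g. 3.1 and -3.1 rad count as close.
        error = target_yaw - yaw
        yaw_error = math.atan2(math.sin(error), math.cos(error))
        return abs(yaw_error) < goal_reach_angle
    return True


print(waypoint_reached((0.0, 0.0, 1.0), (0.0, 0.0, 1.1)))  # True
```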

state = None
initial_physics_time = None
_max_episode_steps_applied = False
_previous_num_targets_reached = 0
_episode_waypoint_targets = None
_current_immediate_waypoint_target = None
agent_hz
_include_health_penalty
_health_penalty_size
_exclude_waypoint_targets_from_observation
_only_observe_immediate_waypoint
_exclude_waypoint_target_deltas_from_observation
_only_observe_immediate_waypoint_target_delta
_action_space_dtype
_observation_space_dtype
_action_dtype_conversion_warning = False
low
high
observation_space

ENVIRONMENT CONSTANTS

cost(env_completed, num_targets_reached)[source]

Compute the cost of the current state.

Parameters:
  • env_completed (bool) – Whether the environment is completed.

  • num_targets_reached (int) – The number of targets reached.

Returns:

tuple containing:

  • cost (float): The cost of the current state.

  • cost_info (dict): Dictionary containing additional cost information.

Return type:

(tuple)

compute_target_deltas(ang_pos, lin_pos, quarternion)[source]

Compute the waypoints target deltas.

Note

Needed because PyFlyt.gym_envs.quadx_envs.quadx_waypoints_env.QuadXWaypointsEnv removes the immediate waypoint from the waypoint targets list when it is reached and does not expose the old value.

Parameters:
  • ang_pos (np.ndarray) – The current angular position.

  • lin_pos (np.ndarray) – The current position.

  • quarternion (np.ndarray) – The current quaternion.

Returns:

The waypoints target deltas.

Return type:

(np.ndarray)
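A plain-Python sketch of what target deltas may look like is given below. It assumes the deltas are the world-frame offsets (target - lin_pos) rotated into the drone's body frame, and that the quaternion uses PyBullet's (x, y, z, w) order. Both are assumptions, and this sketch ignores ang_pos. The helper names are hypothetical and not the compute_target_deltas implementation.

```python
def quaternion_to_rotation_matrix(q):
    """Rotation matrix (body to world) for a unit quaternion in (x, y, z, w) order."""
    x, y, z, w = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def target_deltas(targets, lin_pos, quaternion):
    """Express each world-frame (target - position) offset in the drone's body frame."""
    r = quaternion_to_rotation_matrix(quaternion)
    deltas = []
    for target in targets:
        d = [t - p for t, p in zip(target, lin_pos)]  # world-frame offset to the target
        # Multiply by R^T (the inverse rotation) to map world-frame vectors to the body frame.
        deltas.append([sum(r[j][i] * d[j] for j in range(3)) for i in range(3)])
    return deltas


# Identity orientation: the deltas are simply target minus position.
print(target_deltas([(1.0, 2.0, 3.0)], (0.0, 0.0, 1.0), (0.0, 0.0, 0.0, 1.0)))
```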

step(action)[source]

Take a step in the environment.

Note

This method overrides the step() method such that the new cost function is used.

Parameters:

action (np.ndarray) – Action to take in the environment.

Returns:

tuple containing:

  • obs (np.ndarray): Environment observation.

  • cost (float): Cost of the action.

  • terminated (bool): Whether the episode is terminated.

  • truncated (bool): Whether the episode was truncated. This value is set by wrappers when, for example, a time limit is reached or the agent goes out of bounds.

  • info (dict): Additional information about the environment.

Return type:

(tuple)

reset(seed=None, options=None)[source]

Reset gymnasium environment.

Parameters:
  • seed (int, optional) – A random seed for the environment. By default None.

  • options (dict, optional) – A dictionary containing additional options for resetting the environment. By default None. Not used in this environment.

Returns:

tuple containing:

  • obs (numpy.ndarray): Initial environment observation.

  • info (dict): Dictionary containing additional information.

Return type:

(tuple)

property immediate_waypoint_target
The immediate waypoint target.
property time_limit_max_episode_steps
The maximum number of steps that the environment can take before it is truncated by the gymnasium.wrappers.TimeLimit wrapper.
property time_limit
The maximum duration of the episode in seconds.
property dt
The environment step size.
Returns:

The simulation step size. Returns None if the environment is not yet initialized.

Return type:

(float)

property tau
Alias for the environment step size, provided for compatibility with other gymnasium environments.
Returns:

The simulation step size. Returns None if the environment is not yet initialized.

Return type:

(float)

property t
Environment time.
property physics_time
Returns the physics time.