stable_gym.envs.robotics.quadrotor.quadx_waypoints_cost
Modified version of the QuadXWaypoints environment found in the PyFlyt package. This environment was first described by Tai et al. 2023. In this modified version:
The reward has been changed to a cost. This was done by negating the reward so that the cost is always positive definite.
A health penalty has been added. This penalty is applied when the quadrotor moves outside the flight dome or crashes. The penalty equals the maximum episode steps minus the steps taken or a user-defined penalty.
The max_duration_seconds parameter has been removed. Instead, the max_episode_steps parameter of the gym.wrappers.TimeLimit wrapper is used to limit the episode duration.
The rest of the environment is the same as the original QuadXWaypoints environment. Please refer to the original codebase, the PyFlyt documentation or the accompanying article of Tai et al. 2023 for more information.
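The health-penalty rule in the list above can be read as the following minimal sketch. The function name health_penalty and its arguments are hypothetical and chosen for illustration only. It assumes the penalty is added only on the step where the quadrotor crashes or leaves the flight dome, and that a user-defined size, when given, replaces the default.

```python
from typing import Optional


def health_penalty(
    unhealthy: bool,
    max_episode_steps: int,
    steps_taken: int,
    include_health_penalty: bool = True,
    health_penalty_size: Optional[int] = None,
) -> float:
    """Hypothetical helper mirroring the documented health-penalty rule."""
    if not (include_health_penalty and unhealthy):
        # No penalty while the quadrotor is healthy or when the penalty is disabled.
        return 0.0
    if health_penalty_size is not None:
        # ASSUMPTION: a user-defined, fixed penalty takes precedence over the default.
        return float(health_penalty_size)
    # Default: the number of steps the episode did not get to use.
    return float(max_episode_steps - steps_taken)


print(health_penalty(True, 1000, 250))  # 750.0
```

With the default rule, the penalty depends on how early the episode ends: crashing after 250 of 1000 steps adds 750, while crashing on the final step adds almost nothing.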
Submodules
Classes
QuadXWaypointsCost: Custom QuadXWaypoints Bullet gymnasium environment.
Package Contents
- class stable_gym.envs.robotics.quadrotor.quadx_waypoints_cost.QuadXWaypointsCost(num_targets=4, use_yaw_targets=False, goal_reach_distance=0.2, goal_reach_angle=0.1, flight_dome_size=5.0, angle_representation='quaternion', agent_hz=30, render_mode=None, render_resolution=(480, 480), include_health_penalty=True, health_penalty_size=None, exclude_waypoint_targets_from_observation=False, only_observe_immediate_waypoint=True, exclude_waypoint_target_deltas_from_observation=True, only_observe_immediate_waypoint_target_delta=True, action_space_dtype=np.float64, observation_space_dtype=np.float64, **kwargs)
Bases: PyFlyt.gym_envs.quadx_envs.quadx_waypoints_env.QuadXWaypointsEnv, gymnasium.utils.EzPickle
Custom QuadXWaypoints Bullet gymnasium environment.
Note
Can also be used in a vectorized manner. See the gym.vector documentation.
- Modified cost:
A cost, computed using the QuadXWaypointsCost.cost() method, is given for each simulation step, including the terminal step. This cost is defined as the Euclidean distance between the quadrotor's current position and the position of the current waypoint (i.e. \(p=x_{x,y,z}=[0,0,1]\)). Additionally, a penalty is given for moving away from the waypoint, and a health penalty can also be included in the cost. This health penalty is added when the drone leaves the flight dome or crashes. It equals the max_episode_steps minus the number of steps taken in the episode, or a fixed value. The cost is computed as:
\[cost = 10 \times \| p_{drone} - p_{waypoint} \| - \min(3.0 \times (p_{old} - p_{drone}), 0.0) + p_{health}\]
- Solved Requirements:
Considered solved when the average cost is less than or equal to 50 over 100 consecutive trials.
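The cost formula and the solved criterion above can be sketched as follows. This is not the package's cost() method. Because min(·, 0) needs a scalar, the sketch ASSUMES that \(p_{old} - p_{drone}\) stands for the progress towards the waypoint, i.e. the previous distance to the waypoint minus the current one. It also assumes each entry passed to the hypothetical is_solved helper is the total cost of one episode and reads "100 consecutive trials" as the last 100 episodes.

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def waypoint_cost(
    p_drone: Sequence[float],
    p_waypoint: Sequence[float],
    old_distance: float,
    p_health: float = 0.0,
) -> Tuple[float, float]:
    """Sketch of the documented cost; returns (cost, new_distance)."""
    new_distance = math.dist(p_drone, p_waypoint)
    # ASSUMPTION: (p_old - p_drone) is read as distance progress towards the waypoint.
    progress = old_distance - new_distance
    # min(..., 0) is non-zero only when the drone moved away (progress < 0),
    # so subtracting it adds a positive penalty.
    cost = 10.0 * new_distance - min(3.0 * progress, 0.0) + p_health
    return cost, new_distance


def is_solved(episode_costs: Sequence[float], window: int = 100, threshold: float = 50.0) -> bool:
    """Hypothetical check: mean cost of the last `window` episodes is <= threshold."""
    if len(episode_costs) < window:
        return False
    return mean(episode_costs[-window:]) <= threshold
```

The returned new_distance is meant to be fed back as old_distance on the next step, so the progress term only needs the previous scalar distance rather than the previous position.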
- How to use:
```python
import stable_gym  # noqa: F401  Registers the stable_gym environments.
import gymnasium as gym

env = gym.make("stable_gym:QuadXWaypointsCost-v1")
```
- state
The current system state.
- Type:
- initial_physics_time
The simulation startup time. The physics time at the start of the episode after all the initialisation has been done.
- Type:
Initialise a new QuadXWaypointsCost environment instance.
- Parameters:
num_targets (int, optional) – Number of waypoints in the environment. By default 4.
use_yaw_targets (bool, optional) – Whether to match yaw targets before a waypoint is considered reached. By default False.
goal_reach_distance (float, optional) – Distance to a waypoint for it to be considered reached. By default 0.2.
goal_reach_angle (float, optional) – Angle in radians to a waypoint for it to be considered reached; only in effect if use_yaw_targets is True. By default 0.1.
flight_dome_size (float, optional) – Size of the allowable flying area. By default 5.0.
angle_representation (str, optional) – The angle representation to use. Can be "euler" or "quaternion". By default "quaternion".
agent_hz (int, optional) – Loop rate of the agent-to-environment interaction. By default 30.
render_mode (None | str, optional) – The render mode. Can be "human" or None. By default None.
render_resolution (tuple[int, int], optional) – The render resolution. By default (480, 480).
include_health_penalty (bool, optional) – Whether to penalize the quadrotor if it becomes unhealthy (e.g. if it crashes or leaves the flight dome). Defaults to True.
health_penalty_size (int, optional) – The size of the unhealthy penalty. Defaults to None, meaning the penalty equals the max episode steps minus the steps taken.
exclude_waypoint_targets_from_observation (bool, optional) – Whether to exclude the waypoint targets from the observation. Defaults to False.
only_observe_immediate_waypoint (bool, optional) – Whether to only observe the immediate waypoint target. Defaults to True.
exclude_waypoint_target_deltas_from_observation (bool, optional) – Whether to exclude the waypoint target deltas from the observation. Defaults to True.
only_observe_immediate_waypoint_target_delta (bool, optional) – Whether to only observe the immediate waypoint target delta. Defaults to True.
action_space_dtype (union[numpy.dtype, str], optional) – The data type of the action space. Defaults to np.float64.
observation_space_dtype (union[numpy.dtype, str], optional) – The data type of the observation space. Defaults to np.float64.
**kwargs – Additional keyword arguments passed to the QuadXWaypointsEnv constructor.
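How goal_reach_distance, goal_reach_angle and use_yaw_targets interact can be illustrated with the sketch below. The waypoint_reached function is hypothetical and not part of stable_gym or PyFlyt. It ASSUMES strict "less than" comparisons and that the yaw error is wrapped to [-π, π] before being compared to the tolerance; the actual check in the parent environment may differ in these details.

```python
import math


def waypoint_reached(
    distance: float,
    yaw_error: float = 0.0,
    goal_reach_distance: float = 0.2,
    goal_reach_angle: float = 0.1,
    use_yaw_targets: bool = False,
) -> bool:
    """Hypothetical check mirroring the documented parameter semantics."""
    if distance >= goal_reach_distance:
        # Too far away: never reached, regardless of yaw.
        return False
    if use_yaw_targets:
        # ASSUMPTION: wrap the yaw error to [-pi, pi] before comparing it.
        wrapped = math.atan2(math.sin(yaw_error), math.cos(yaw_error))
        return abs(wrapped) < goal_reach_angle
    # Without yaw targets, being close enough is sufficient.
    return True
```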
- state = None
- initial_physics_time = None
- _max_episode_steps_applied = False
- _previous_num_targets_reached = 0
- _episode_waypoint_targets = None
- _current_immediate_waypoint_target = None
- agent_hz
- _include_health_penalty
- _health_penalty_size
- _exclude_waypoint_targets_from_observation
- _only_observe_immediate_waypoint
- _exclude_waypoint_target_deltas_from_observation
- _only_observe_immediate_waypoint_target_delta
- _action_space_dtype
- _observation_space_dtype
- _action_dtype_conversion_warning = False
- low
- high
- observation_space
ENVIRONMENT CONSTANTS
- compute_target_deltas(ang_pos, lin_pos, quarternion)
Compute the waypoints target deltas.
Note
Needed because PyFlyt.gym_envs.quadx_envs.quadx_waypoints_env.QuadXWaypointsEnv removes the immediate waypoint from the waypoint targets list when it is reached and doesn't expose the old value.
- Parameters:
ang_pos (np.ndarray) – The current angular position.
lin_pos (np.ndarray) – The current position.
quarternion (np.ndarray) – The current quaternion.
- Returns:
The waypoints target deltas.
- Return type:
(np.ndarray)
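A rough sketch of what a target-delta computation of this kind can involve is shown below. It is not the method's actual implementation. It ASSUMES the deltas are expressed in the drone's body frame, that the quaternion uses PyBullet's (x, y, z, w) order, and that, when yaw targets are used, the wrapped yaw error is appended to each delta. All names are hypothetical, and plain lists stand in for np.ndarray to keep the sketch dependency-free.

```python
import math
from typing import List, Optional, Sequence


def quaternion_to_matrix(q: Sequence[float]) -> List[List[float]]:
    """Body-to-world rotation matrix for a unit quaternion in (x, y, z, w) order."""
    x, y, z, w = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def compute_target_deltas_sketch(
    targets: Sequence[Sequence[float]],
    lin_pos: Sequence[float],
    quaternion: Sequence[float],
    ang_pos: Optional[Sequence[float]] = None,
    yaw_targets: Optional[Sequence[float]] = None,
) -> List[List[float]]:
    """Hypothetical: body-frame deltas from the drone to each waypoint target."""
    rot = quaternion_to_matrix(quaternion)
    deltas = []
    for k, target in enumerate(targets):
        world = [t - p for t, p in zip(target, lin_pos)]
        # Multiply by the transpose (world -> body) to express the delta in the body frame.
        body = [sum(rot[j][i] * world[j] for j in range(3)) for i in range(3)]
        if yaw_targets is not None and ang_pos is not None:
            # ASSUMPTION: append the yaw error, wrapped to [-pi, pi].
            err = yaw_targets[k] - ang_pos[2]
            body.append(math.atan2(math.sin(err), math.cos(err)))
        deltas.append(body)
    return deltas
```

For example, with the drone yawed 90° about z, a target straight along world +x ends up at roughly [0, -1, 0] in the body frame, i.e. to the drone's right.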
- step(action)
Take a step in the environment.
Note
This method overrides the step() method such that the new cost function is used.
- Parameters:
action (np.ndarray) – Action to take in the environment.
- Returns:
tuple containing:
obs (np.ndarray): Environment observation.
cost (float): Cost of the action.
terminated (bool): Whether the episode is terminated.
truncated (bool): Whether the episode was truncated. This value is set by wrappers when, for example, a time limit is reached or the agent goes out of bounds.
info (dict): Additional information about the environment.
- Return type:
(tuple)
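The override described in the note above follows a common pattern: call the parent step(), keep the observation, termination flags and info, and return a cost in place of the reward. The sketch below shows that pattern on two hypothetical toy classes (ToyRewardEnv and ToyCostEnv); they stand in for QuadXWaypointsEnv and QuadXWaypointsCost and do not reproduce their dynamics or cost.

```python
from typing import Any, Dict, Tuple

StepResult = Tuple[float, float, bool, bool, Dict[str, Any]]


class ToyRewardEnv:
    """Hypothetical parent environment that returns a (negative) reward."""

    def __init__(self) -> None:
        self.distance = 2.0

    def step(self, action: float) -> StepResult:
        self.distance = max(self.distance - action, 0.0)
        reward = -self.distance
        terminated = self.distance == 0.0
        return self.distance, reward, terminated, False, {}


class ToyCostEnv(ToyRewardEnv):
    """Overrides step() so the second element of the tuple is a cost."""

    def step(self, action: float) -> StepResult:
        obs, _reward, terminated, truncated, info = super().step(action)
        # Discard the parent reward and compute a non-negative cost instead.
        cost = 10.0 * self.distance
        info["distance"] = self.distance
        return obs, cost, terminated, truncated, info


env = ToyCostEnv()
obs, cost, terminated, truncated, info = env.step(0.5)
print(cost, terminated)  # 15.0 False
```

Keeping the Gymnasium five-tuple layout means wrappers such as TimeLimit can still set truncated without knowing that the second element is a cost.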
- reset(seed=None, options=None)
Reset gymnasium environment.
- Parameters:
- Returns:
tuple containing:
obs (numpy.ndarray): Initial environment observation.
info (dict): Dictionary containing additional information.
- Return type:
(tuple)
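A typical episode loop built on reset() and step() is sketched below. To keep it runnable without PyFlyt, it uses a hypothetical ToyEnv that only mimics the return conventions documented above: reset() returns (obs, info) and step() returns (obs, cost, terminated, truncated, info). The loop stops on either flag and sums the per-step costs; seeding reset() makes the rollout reproducible.

```python
import random
from typing import Any, Dict, Optional, Tuple


class ToyEnv:
    """Hypothetical environment following the documented reset/step conventions."""

    def __init__(self, max_episode_steps: int = 20) -> None:
        self.max_episode_steps = max_episode_steps
        self.steps = 0
        self.distance = 0.0

    def reset(
        self, seed: Optional[int] = None, options: Optional[dict] = None
    ) -> Tuple[float, Dict[str, Any]]:
        rng = random.Random(seed)  # Seeded RNG makes the initial state reproducible.
        self.steps = 0
        self.distance = rng.uniform(1.0, 2.0)
        return self.distance, {}

    def step(self, action: float) -> Tuple[float, float, bool, bool, Dict[str, Any]]:
        self.steps += 1
        self.distance = max(self.distance - action, 0.0)
        cost = self.distance
        terminated = self.distance == 0.0
        # Mimics what a TimeLimit wrapper would report.
        truncated = self.steps >= self.max_episode_steps
        return self.distance, cost, terminated, truncated, {}


def rollout(env: ToyEnv, seed: int, action: float = 0.1) -> float:
    """Run one episode and return its total cost."""
    obs, info = env.reset(seed=seed)
    total_cost = 0.0
    while True:
        obs, cost, terminated, truncated, info = env.step(action)
        total_cost += cost
        if terminated or truncated:
            return total_cost
```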
- property immediate_waypoint_target
- The immediate waypoint target.
- property time_limit_max_episode_steps
- The maximum number of steps that the environment can take before it is truncated by the gymnasium.wrappers.TimeLimit wrapper.
- property time_limit
- The maximum duration of the episode in seconds.
- property dt
- The environment step size.
- Returns:
- The simulation step size. Returns None if the environment is not yet initialized.
- Return type:
(float)
- property tau
- Alias for the environment step size. Done for compatibility with the other gymnasium environments.
- Returns:
- The simulation step size. Returns None if the environment is not yet initialized.
- Return type:
(float)
- property t
- Environment time.
- property physics_time
- Returns the physics time.
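The time-related properties fit together as sketched below. The helpers are hypothetical and ASSUME that one agent step lasts dt = 1 / agent_hz seconds; the dt property reports the actual value. Under that assumption, the episode duration implied by the TimeLimit step budget is max_episode_steps × dt. The inverse conversion gives the step budget for a desired duration, which takes over the role of the removed max_duration_seconds parameter.

```python
import math


def step_size(agent_hz: int = 30) -> float:
    """ASSUMPTION: one agent step lasts 1 / agent_hz seconds."""
    return 1.0 / agent_hz


def time_limit(max_episode_steps: int, agent_hz: int = 30) -> float:
    """Episode duration in seconds implied by a TimeLimit step budget."""
    return max_episode_steps * step_size(agent_hz)


def steps_for_duration(max_duration_seconds: float, agent_hz: int = 30) -> int:
    """Step budget to use as max_episode_steps for a desired duration."""
    # Round first so floating-point noise (e.g. 0.1 * 30 = 3.0000000000000004)
    # does not push math.ceil up by a whole step.
    return math.ceil(round(max_duration_seconds * agent_hz, 9))
```

For example, steps_for_duration(20.0) gives 600 at 30 Hz, a value that could then be passed as gym.make("stable_gym:QuadXWaypointsCost-v1", max_episode_steps=600).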