ros_gazebo_gym.robot_gazebo_goal_env
The Gazebo GOAL environment is mainly used to connect the simulated Gym GOAL environment to the Gazebo simulator. It takes care of resetting the simulator and, if needed, the controllers, and it performs all the simulator-side steps required during a training step or a training reset (the typical steps in the reinforcement learning loop).
Important
This class is similar to the RobotGazeboEnv class, but it uses the gym.GoalEnv class as its superclass instead of gym.Env. As a result, a goal-based environment is created. You should use this goal-based Gazebo environment when you are working with RL algorithms that require a sparse reward space (e.g. Hindsight Experience Replay (HER)).
This goal-based environment behaves just like any regular gymnasium environment, but it imposes a required structure on the observation_space. More concretely, the observation space is required to contain at least three elements: observation, desired_goal, and achieved_goal. Here, desired_goal specifies the goal that the agent should attempt to achieve, achieved_goal is the goal it has currently achieved, and observation contains the actual observations of the environment as usual.
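The required dictionary layout can be illustrated with a plain Python sketch. The key names are mandated by the goal-env interface; the values here are made-up placeholders, not output of a real environment:

```python
import numpy as np

# Illustrative placeholder values; a real environment fills these from Gazebo.
obs = {
    "observation": np.array([0.1, 0.2, 0.3]),    # raw environment observation
    "desired_goal": np.array([1.0, 0.0, 0.5]),   # goal the agent should reach
    "achieved_goal": np.array([0.1, 0.2, 0.3]),  # goal currently achieved
}

# Goal-based algorithms such as HER rely on exactly these three keys.
assert set(obs) == {"observation", "desired_goal", "achieved_goal"}
```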
Further, unlike the RobotGazeboEnv, which publishes the cumulative cost to the /ros_gazebo_gym/reward topic during the environment reset, this goal environment publishes the cost at each step.
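The sparse reward setting mentioned above can be sketched as follows. This is a generic HER-style reward function under an assumed distance threshold, not the exact function used by ros_gazebo_gym:

```python
import numpy as np

def compute_reward(achieved_goal, desired_goal, distance_threshold=0.05):
    """Sparse-reward sketch: 0.0 when the achieved goal lies within the
    threshold of the desired goal, -1.0 otherwise (HER-style convention).
    The threshold value is an assumption for illustration only.
    """
    distance = np.linalg.norm(achieved_goal - desired_goal)
    return 0.0 if distance < distance_threshold else -1.0

print(compute_reward(np.zeros(3), np.ones(3)))  # far from the goal -> -1.0
print(compute_reward(np.ones(3), np.ones(3)))   # at the goal -> 0.0
```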
Module Contents
Classes
Connects the simulated GOAL gymnasium environment to the Gazebo simulator.
- class ros_gazebo_gym.robot_gazebo_goal_env.RobotGazeboGoalEnv(robot_name_space, reset_controls, controllers_list=None, reset_robot_pose=False, reset_world_or_sim='SIMULATION', log_reset=True, pause_simulation=False, publish_rviz_training_info_overlay=False)[source]
Bases:
gymnasium_robotics.GoalEnv
Connects the simulated GOAL gymnasium environment to the Gazebo simulator.
- gazebo
Gazebo connector which can be used to interact with the gazebo simulation.
- Type:
Initiate the RobotGazebo environment instance.
- Parameters:
robot_name_space (str) – The namespace the robot is on.
reset_controls (bool) – Whether the controllers should be reset when the RobotGazeboEnv.reset() method is called.
controllers_list (list, optional) – A list with currently available controllers to look for. Defaults to None, which means that the class will try to retrieve all the running controllers.
reset_robot_pose (bool) – Boolean specifying whether to reset the robot pose when the simulation is reset.
reset_world_or_sim (str, optional) – Whether you want to reset the whole simulation “SIMULATION” at startup or only the world “WORLD” (object positions). Defaults to “SIMULATION”.
log_reset (bool, optional) – Whether we want to print a log statement when the world/simulation is reset. Defaults to True.
pause_simulation (bool, optional) – Whether the simulation should be paused after each step (i.e. after each action). Defaults to False.
publish_rviz_training_info_overlay (bool, optional) – Whether an RViz overlay should be published with the training results. Defaults to False.
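The constructor arguments above can be collected as follows. The namespace value is a hypothetical example, and the actual construction (commented out) requires a running ROS/Gazebo setup:

```python
# Keyword arguments mirroring the documented defaults of RobotGazeboGoalEnv.
env_kwargs = dict(
    robot_name_space="my_robot",      # assumption: example robot namespace
    reset_controls=True,              # reset controllers on env.reset()
    controllers_list=None,            # None -> auto-detect running controllers
    reset_robot_pose=False,
    reset_world_or_sim="SIMULATION",  # or "WORLD" to reset only object poses
    log_reset=True,
    pause_simulation=False,
    publish_rviz_training_info_overlay=False,
)

# Requires a running ROS master and Gazebo instance:
# from ros_gazebo_gym.robot_gazebo_goal_env import RobotGazeboGoalEnv
# env = RobotGazeboGoalEnv(**env_kwargs)

assert env_kwargs["reset_world_or_sim"] in ("SIMULATION", "WORLD")
```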
- step(action)[source]
Function executed at each time step. Here we get the action, execute it during a time step, and retrieve the observations generated by that action. We also publish the step reward on the /ros_gazebo_gym/reward topic.
- Parameters:
action (numpy.ndarray) – The action we want to perform in the environment.
- Returns:
tuple containing:
obs (np.ndarray): Environment observation.
cost (float): Cost of the action.
terminated (bool): Whether the episode is terminated.
truncated (bool): Whether the episode was truncated. This value is set by wrappers when, for example, a time limit is reached or the agent goes out of bounds.
info (dict): Additional information about the environment.
- Return type:
(tuple)
Note
Here we should convert the action number to a movement action, execute that action in the simulation, and get the observations that result from performing it.
- reset(seed=None, options=None)[source]
Function executed when resetting the environment.
- Parameters:
seed (int, optional) – The seed used to seed the environment's random number generator. Defaults to None.
options (dict, optional) – Additional information used to specify how the environment is reset. Defaults to None.
- Returns:
tuple containing:
obs (numpy.ndarray): The current state.
info_dict (dict): Dictionary with additional information.
- Return type:
(tuple)
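The reset/step cycle documented above can be sketched with a minimal stand-in environment. The stub below only mimics the return signatures of RobotGazeboGoalEnv; it does not talk to ROS or Gazebo, and its values are placeholders:

```python
import numpy as np

class StubGoalEnv:
    """Hypothetical stand-in mimicking the reset()/step() return
    signatures of RobotGazeboGoalEnv (no ROS/Gazebo involved)."""

    def _obs(self):
        return {
            "observation": np.zeros(3),
            "desired_goal": np.ones(3),
            "achieved_goal": np.zeros(3),
        }

    def reset(self, seed=None, options=None):
        # Returns (obs, info_dict), matching the documented reset() signature.
        return self._obs(), {}

    def step(self, action):
        # Returns (obs, cost, terminated, truncated, info), matching step().
        return self._obs(), -1.0, False, False, {}

env = StubGoalEnv()
obs, info = env.reset(seed=42)
for _ in range(3):
    action = np.zeros(2)  # placeholder action
    obs, cost, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
```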
- close()[source]
Function executed when closing the environment. Use it for closing GUIs and other systems that need closing.