Policy testers

Policy eval utility

SLC ships with an evaluation utility that can be used to check a trained policy’s performance. In cases where the environment is successfully saved alongside the agent, it’s a cinch to watch the trained agent act in the environment using:

python -m stable_learning_control.run test_policy [path/to/output_directory] [-h]
    [--len LEN] [--episodes EPISODES] [--norender] [--itr ITR] [--deterministic]
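
For example, to watch the most recently saved policy act in its environment for five episodes, an invocation could look like the following (the output directory path is purely illustrative; substitute your own experiment folder):

python -m stable_learning_control.run test_policy data/lac/oscillator/lac_oscillator_s0 --episodes 5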

Positional Arguments:

output_dir

str. The path to the output directory where the agent and environment were saved.

Optional Arguments:

-l L, --len=L, default=0

int. Maximum length of test episode/trajectory/rollout. The default of 0 means no maximum episode length (i.e. episodes only end when the agent has reached a terminal state in the environment).

-n N, --episodes=N, default=100

int. Number of test episodes to run the agent for.

-nr, --norender, default=False

bool. Do not render the test episodes to the screen. In this case, test_policy will only print the episode returns and lengths. (Use case: the renderer slows down the testing process, and you want to get a fast sense of how the agent is performing, so you don’t particularly care to watch it.)

-i I, --itr=I, default=-1

int. Specify the snapshot (checkpoint) for which you want to see the policy performance. Use case: Sometimes, it’s nice to watch trained agents from many different training points (e.g. at iteration 50, 100, 150, etc.). The default value of -1 means “use the latest snapshot.”

Important

This option only works if snapshots were saved while training the agent (i.e. the --save_checkpoints flag was set). For more information on storing these snapshots, see Algorithm Flags.

-d, --deterministic, default=False

bool. Another special case, which is only used for the SAC and LAC algorithms. The SLC implementation trains a stochastic policy, which is evaluated using the deterministic mean of the action distribution. test_policy will default to using the stochastic policy trained by SAC, but you should set the deterministic flag to watch the deterministic mean policy (the correct evaluation policy for SAC). This flag is not used for any other algorithms.
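
For instance, to replay the checkpoint saved at iteration 50 of a SAC or LAC run with the deterministic mean policy and without rendering, an invocation could look like this (the output directory path is illustrative, and it assumes checkpoints were saved during training):

python -m stable_learning_control.run test_policy data/lac/oscillator/lac_oscillator_s0 --itr 50 --deterministic --norender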

See also

If you receive an “Environment not found” error, see Environment Not Found Error.

Robustness eval utility

SLC ships with an evaluation utility that can be used to check the robustness of the trained policy. In cases where the environment is successfully saved alongside the agent, the robustness can be evaluated using the following command:

python -m stable_learning_control.run eval_robustness [path/to/output_directory] [disturber] [-h] [--list_disturbers]
    [--disturber_config DISTURBER_CONFIG] [--data_dir DATA_DIR] [--itr ITR]
    [--len LEN] [--episodes EPISODES] [--render] [--deterministic] [--disable_baseline]
    [--observations [OBSERVATIONS [OBSERVATIONS ...]]] [--references [REFERENCES [REFERENCES ...]]]
    [--reference_errors [REFERENCE_ERRORS [REFERENCE_ERRORS ...]]] [--absolute_reference_errors]
    [--merge_reference_errors] [--use_subplots] [--use_time] [--save_result] [--save_plots]
    [--figs_fmt FIGS_FMT] [--font_scale FONT_SCALE] [--use_wandb] [--wandb_job_type WANDB_JOB_TYPE]
    [--wandb_project WANDB_PROJECT] [--wandb_group WANDB_GROUP] [--wandb_run_name WANDB_RUN_NAME]

Positional Arguments:

output_dir

str. The path to the output directory where the agent and environment were saved.

disturber

str. The name of the disturber you want to evaluate. Can include an unloaded module in ‘module:disturber_name’ style.

Optional Arguments:

--list, --list_disturbers, default=False

bool. Lists the available disturbers found in the SLC package.
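
As a quick illustration, the following command lists the disturbers that ship with SLC (the output directory path is illustrative):

python -m stable_learning_control.run eval_robustness data/lac/oscillator/lac_oscillator_s0 --list_disturbers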

--cfg, --disturber_config DISTURBER_CONFIG, default=None

str. The configuration you want to pass to the disturber. It sets up the range of disturbances you wish to evaluate. Expects a dictionary that depends on the specified disturber (e.g. "{'mean': [0.25, 0.25], 'std': [0.05, 0.05]}" for the ObservationRandomNoiseDisturber).
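
For example, to evaluate robustness against random observation noise using the configuration shown above, an invocation could look like this (the output directory path is illustrative):

python -m stable_learning_control.run eval_robustness data/lac/oscillator/lac_oscillator_s0 ObservationRandomNoiseDisturber --disturber_config "{'mean': [0.25, 0.25], 'std': [0.05, 0.05]}"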

--data_dir

str. The folder in which you want to store the robustness evaluation results, i.e. the data frame and the plots.

-i I, --itr=I, default=-1

int. Specify the snapshot (checkpoint) for which you want to see the policy performance. Use case: Sometimes, it’s nice to evaluate the robustness of the agent from many different points in training (e.g. at iteration 50, 100, 150, etc.). The default value of -1 means “use the latest snapshot.”

Important

This option only works if snapshots were saved while training the agent (i.e. the --save_checkpoints flag was set). For more information on storing these snapshots, see Algorithm Flags.

-l L, --len=L, default=None

int. Maximum length of evaluation episode/trajectory/rollout. The default of None means no maximum episode length (i.e. episodes only end when the agent has reached a terminal state in the environment).

-n N, --episodes=N, default=100

int. Number of evaluation episodes to run for each disturbance variant.

-r, --render, default=False

bool. Also render the evaluation episodes to the screen.

-d, --deterministic, default=False

bool. Another special case, which is only used for the SAC and LAC algorithms. The SLC implementation trains a stochastic policy, which is evaluated using the deterministic mean of the action distribution. eval_robustness will default to using the stochastic policy trained by SAC, but you should set the deterministic flag to evaluate the deterministic mean policy (the correct evaluation policy for SAC). This flag is not used for any other algorithms.

--disable_baseline, default=False

bool. Disable the baseline evaluation. The baseline evaluation is a special case where the agent is evaluated without any disturbance applied. This is useful for comparing the performance of the agent with and without the disturbance.

--obs, --observations, default=None

list of ints. The observations you want to show in the observations/reference plots. The default value of None means all observations will be shown.

--refs, --references, default=None

list of ints. The references you want to show in the observations/reference plots. The default value of None means all references will be shown.

--ref_errs, --reference_errors, default=None

list of ints. The reference errors you want to show in the reference error plots. The default value of None means all reference errors will be shown.

--abs_ref_errs, --absolute_reference_errors, default=False

bool. Whether you want to show the absolute reference errors in the reference error plots. The default value of False means the relative reference errors will be shown.

--merge_ref_errs, --merge_reference_errors, default=False

bool. Whether you want to merge the reference errors into one reference error. The default value of False means the reference errors will be shown separately.

--use_subplots, default=False

bool. Whether you want to use subplots for the plots. The default value of False means the plots will be shown separately.

--use_time, default=False

bool. Whether you want to use time as the x-axis for the plots. The default value of False means the x-axis will show the steps.

--save_result, default=False

bool. Whether you want to save the robustness evaluation data frame to disk. This can be useful for creating custom plots (see Create custom plots).
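
For example, to store the evaluation data frame in a custom folder for later post-processing, an invocation could look like this (both paths are illustrative):

python -m stable_learning_control.run eval_robustness data/lac/oscillator/lac_oscillator_s0 ObservationRandomNoiseDisturber --save_result --data_dir data/robustness_results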

--save_plots, default=False

bool. Specifies whether you want to save the generated plots to disk.

--figs_fmt, default=pdf

str. The file format you want to use for saving the plots.

--font_scale, default=1.5

float. The font scale you want to use for the plot text.

--use_wandb, default=False

bool. Whether you want to log the results to Weights & Biases.

--wandb_job_type, default=eval

str. The job type you want to use for the Weights & Biases logging.

--wandb_project, default=stable-learning-control

str. The name of the Weights & Biases project you want to log to.

--wandb_group, default=None

str. The name of the Weights & Biases group you want to log to.

--wandb_run_name, default=None

str. The name of the Weights & Biases run you want to log to. If not specified, the run name will be automatically generated based on the policy directory and disturber.
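
For example, to log the evaluation results to Weights & Biases under a custom group and run name, an invocation could look like this (the output directory, group, and run name are illustrative):

python -m stable_learning_control.run eval_robustness data/lac/oscillator/lac_oscillator_s0 ObservationRandomNoiseDisturber --use_wandb --wandb_group oscillator-robustness --wandb_run_name lac-noise-eval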

See also

If you receive an “Environment not found” error, see Environment Not Found Error.