Policy testers
Policy eval utility
SLC ships with an evaluation utility that can be used to check a trained policy’s performance. In cases where the environment is successfully saved alongside the agent, it’s a cinch to watch the trained agent act in the environment using:
python -m stable_learning_control.run test_policy [path/to/output_directory] [-h]
[--len LEN] [--episodes EPISODES] [--norender] [--itr ITR] [--deterministic]
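For example, assuming the agent was saved to a hypothetical output directory data/lac/my_run, the latest saved policy can be watched with:

# Hypothetical run folder; replace it with the output directory of your own experiment.
python -m stable_learning_control.run test_policy data/lac/my_run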
Positional Arguments:
- path/to/output_directory (str). The path to the output directory where the trained agent (and environment) were saved.
Optional Arguments:
- -l L, --len=L, default=0 (int). Maximum length of test episode/trajectory/rollout. The default of 0 means no maximum episode length (i.e. episodes only end when the agent has reached a terminal state in the environment).
- -nr, --norender, default=False (bool). Do not render the test episodes to the screen. In this case, test_policy will only print the episode returns and lengths. (Use case: the renderer slows down the testing process, and you want to get a fast sense of how the agent is performing, so you don't particularly care to watch it.)
- -i I, --itr=I, default=-1 (int). Specify the snapshot (checkpoint) for which you want to see the policy performance. Use case: sometimes it's nice to watch trained agents from many different training points (e.g. watch at iteration 50, 100, 150, etc.). The default value of this flag means "use the latest snapshot."

  Important: This option only works if snapshots were saved while training the agent (i.e. the --save_checkpoints flag was set). For more information on storing these snapshots, see Algorithm Flags.
- -d, --deterministic, default=False (bool). Another special case, which is only used for the SAC and LAC algorithms. The SLC implementation trains a stochastic policy, but evaluation uses the deterministic mean of the action distribution. test_policy will default to using the stochastic policy trained by SAC, but you should set the deterministic flag to watch the deterministic mean policy (the correct evaluation policy for SAC). This flag is not used for any other algorithms. (An example that combines these flags is shown after this list.)
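For instance, a sketch (with a hypothetical output directory) that evaluates the snapshot saved at iteration 50 with the deterministic mean policy, without rendering, for five episodes:

# Hypothetical run folder; --itr requires that --save_checkpoints was set during training.
python -m stable_learning_control.run test_policy data/lac/my_run --itr 50 --norender --deterministic --episodes 5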
See also
If you receive an “Environment not found” error, see Environment Not Found Error.
Robustness eval utility
SLC ships with an evaluation utility that can be used to check the robustness of the trained policy. In cases where the environment is successfully saved alongside the agent, the robustness can be evaluated using the following command:
python -m stable_learning_control.run eval_robustness [path/to/output_directory] [disturber] [-h] [--list_disturbers]
[--disturber_config DISTURBER_CONFIG] [--data_dir DATA_DIR] [--itr ITR]
[--len LEN] [--episodes EPISODES] [--render] [--deterministic] [--disable_baseline]
[--observations [OBSERVATIONS [OBSERVATIONS ...]]] [--references [REFERENCES [REFERENCES ...]]]
[--reference_errors [REFERENCE_ERRORS [REFERENCE_ERRORS ...]]] [--absolute_reference_errors]
[--merge_reference_errors] [--use_subplots] [--use_time] [--save_result] [--save_plots]
[--figs_fmt FIGS_FMT] [--font_scale FONT_SCALE]
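For example, a minimal sketch (with a hypothetical output directory) that evaluates the policy's robustness against the ObservationRandomNoiseDisturber configuration shown in the options below:

# Hypothetical run folder; the --disturber_config dictionary is disturber-specific.
python -m stable_learning_control.run eval_robustness data/lac/my_run ObservationRandomNoiseDisturber \
    --disturber_config "{'mean': [0.25, 0.25], 'std': [0.05, 0.05]}"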
Positional Arguments:
- path/to/output_directory (str). The path to the output directory where the trained agent (and environment) were saved.
- disturber (str). The name of the disturber you want to evaluate. Can include an unloaded module in 'module:disturber_name' style.
Optional Arguments:
- --list, --list_disturbers, default=False (bool). Lists the available disturbers found in the SLC package (see the example after this list).
- --cfg, --disturber_config DISTURBER_CONFIG, default=None (str). The configuration you want to pass to the disturber. It sets up the range of disturbances you wish to evaluate. Expects a dictionary that depends on the specified disturber (e.g. "{'mean': [0.25, 0.25], 'std': [0.05, 0.05]}" for the ObservationRandomNoiseDisturber disturber).
- --data_dir (str). The folder in which you want to store the robustness eval results (i.e. the data frame and the plots).
- -i I, --itr=I, default=-1 (int). Specify the snapshot (checkpoint) for which you want to see the policy performance. Use case: sometimes it's nice to evaluate the robustness of the agent from many different points in training (e.g. at iteration 50, 100, 150, etc.). The default value of -1 means "use the latest snapshot."

  Important: This option only works if snapshots were saved while training the agent (i.e. the --save_checkpoints flag was set). For more information on storing these snapshots, see Algorithm Flags.
- -l L, --len=L, default=None (int). Maximum length of evaluation episode/trajectory/rollout. The default of None means no maximum episode length (i.e. episodes only end when the agent has reached a terminal state in the environment).
- -n N, --episodes=N, default=100 (int). Number of evaluation episodes to run for each disturbance variant.
- -d, --deterministic, default=False (bool). Another special case, which is only used for the SAC and LAC algorithms. The SLC implementation trains a stochastic policy, but evaluation uses the deterministic mean of the action distribution. eval_robustness will default to using the stochastic policy trained by SAC, but you should set the deterministic flag to evaluate the deterministic mean policy (the correct evaluation policy for SAC). This flag is not used for any other algorithms.
- --disable_baseline, default=False (bool). Disable the baseline evaluation. The baseline evaluation is a special case in which the agent is evaluated without any disturbance applied. This is useful for comparing the performance of the agent with and without the disturbance.
- --obs, --observations, default=None (list of ints). The observations you want to show in the observations/reference plots. The default value of None means all observations will be shown.
- --refs, --references, default=None (list of ints). The references you want to show in the observations/reference plots. The default value of None means all references will be shown.
- --ref_errs, --reference_errors, default=None (list of ints). The reference errors you want to show in the reference error plots. The default value of None means all reference errors will be shown.
- --abs_ref_errs, --absolute_reference_errors, default=False (bool). Whether you want to show the absolute reference errors in the reference error plots. The default value of False means the relative reference errors will be shown.
- --merge_ref_errs, --merge_reference_errors, default=False (bool). Whether you want to merge the reference errors into one reference error. The default value of False means the reference errors will be shown separately.
- --use_subplots, default=False (bool). Whether you want to use subplots for the plots. The default value of False means the plots will be shown separately.
- --use_time, default=False (bool). Whether you want to use time as the x-axis for the plots. The default value of False means the x-axis will show the steps.
- --save_result, default=False (bool). Whether you want to save the robustness evaluation data frame to disk. This can be useful for creating custom plots (see Create custom plots and the example after this list).
- --wandb_project, default=stable-learning-control (str). The name of the Weights & Biases project you want to log to.
- --wandb_run_name, default=None (str). The name of the Weights & Biases run you want to log to. If not specified, the run name will be automatically generated based on the policy directory and disturber.
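As referenced above, the disturbers shipped with the package can first be listed, after which a more complete (hypothetical) evaluation that saves its data frame and plots to a custom folder might look as follows:

# List the available disturbers found in the SLC package (hypothetical run folder).
python -m stable_learning_control.run eval_robustness data/lac/my_run --list_disturbers

# Hypothetical example: 10 episodes per disturbance variant, time on the x-axis,
# data frame and plots stored in ./robustness_results.
python -m stable_learning_control.run eval_robustness data/lac/my_run ObservationRandomNoiseDisturber \
    --disturber_config "{'mean': [0.25, 0.25], 'std': [0.05, 0.05]}" \
    --episodes 10 --use_time --save_result --save_plots --data_dir ./robustness_results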
See also
If you receive an “Environment not found” error, see Environment Not Found Error.