stable_learning_control.utils.test_policy

A set of functions that can be used to see how an algorithm performs in the environment it was trained on.

Module Contents

Functions

_retrieve_iter_folder(fpath, itr)

Retrieves the path of the requested model iteration.

_retrieve_model_folder(fpath)

Tries to retrieve the model folder and backend from the given path.

load_policy_and_env(fpath[, itr])

Load a policy from a save directory, whether it's TF2 or PyTorch, along with the RL environment.

load_tf_policy(fpath, env[, itr])

Load a TensorFlow policy saved with the Stable Learning Control Logger.

load_pytorch_policy(fpath, env[, itr])

Load a PyTorch policy saved with the Stable Learning Control Logger.

run_policy(env, policy[, max_ep_len, num_episodes, ...])

Evaluates a policy inside a given gymnasium environment.

Attributes

parser

stable_learning_control.utils.test_policy._retrieve_iter_folder(fpath, itr)[source]

Retrieves the path of the requested model iteration.

Parameters:
  • fpath (str) – The path where the model is found.

  • itr (int) – The current policy iteration (checkpoint).

Raises:
Returns:

The model iteration path.

Return type:

str
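This helper is used internally by load_policy_and_env. A minimal sketch of a direct call, assuming a hypothetical run directory (the exact folder layout it resolves to is an implementation detail):

    from stable_learning_control.utils.test_policy import _retrieve_iter_folder

    # Resolve the folder that holds checkpoint (iteration) 10 of a saved run.
    # The run directory below is a hypothetical example.
    itr_path = _retrieve_iter_folder("data/my_experiment/my_experiment_s0", itr=10)
    print(itr_path)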

stable_learning_control.utils.test_policy._retrieve_model_folder(fpath)[source]

Tries to retrieve the model folder and backend from the given path.

Parameters:

fpath (str) – The path where the model is found.

Raises:
Returns:

tuple containing:

  • model_folder (str): The model folder.

  • backend (str): The inferred backend. Options are tf2 and torch.

Return type:

(tuple)
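This helper is used internally by load_policy_and_env to decide which backend loader to dispatch to. A minimal sketch of a direct call, assuming a hypothetical run directory:

    from stable_learning_control.utils.test_policy import _retrieve_model_folder

    # Infer where the saved model lives and which backend ("tf2" or "torch") produced it.
    # The run directory below is a hypothetical example.
    model_folder, backend = _retrieve_model_folder("data/my_experiment/my_experiment_s0")
    print(f"Found a '{backend}' model in '{model_folder}'")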

stable_learning_control.utils.test_policy.load_policy_and_env(fpath, itr='last')[source]

Load a policy from a save directory, whether it's TF2 or PyTorch, along with the RL environment.

Parameters:
  • fpath (str) – The path where the model is found.

  • itr (str, optional) – The current policy iteration (checkpoint). Defaults to "last".

  • deterministic (bool, optional) – Whether you want the action from the policy to be deterministic. Defaults to False.

Raises:
  • FileNotFoundError – Thrown when the fpath does not exist.

  • EnvLoadError – Thrown when something went wrong trying to load the saved environment.

  • PolicyLoadError – Thrown when something went wrong trying to load the saved policy.

Returns:

tuple containing:

  • env (gym.Env): The gymnasium environment.

  • get_action (func): The policy get_action function.

Return type:

(tuple)
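A minimal usage sketch, assuming a hypothetical output directory produced by a Stable Learning Control training run:

    from stable_learning_control.utils.test_policy import load_policy_and_env

    # Restore the environment and the policy's get_action function from a saved run.
    # itr="last" (the default) selects the final checkpoint.
    env, get_action = load_policy_and_env("data/my_experiment/my_experiment_s0")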

stable_learning_control.utils.test_policy.load_tf_policy(fpath, env, itr='last')[source]

Load a TensorFlow policy saved with the Stable Learning Control Logger.

Parameters:
  • fpath (str) – The path where the model is found.

  • env (gym.Env) – The gymnasium environment in which you want to test the policy.

  • itr (str, optional) – The current policy iteration. Defaults to “last”.

Returns:

The policy.

Return type:

tf.keras.Model
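A minimal usage sketch, assuming a hypothetical TF2 run directory and environment:

    import gymnasium as gym
    from stable_learning_control.utils.test_policy import load_tf_policy

    # The environment and output directory below are hypothetical examples.
    env = gym.make("Pendulum-v1")
    policy = load_tf_policy("data/my_tf2_experiment/my_tf2_experiment_s0", env=env, itr="last")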

stable_learning_control.utils.test_policy.load_pytorch_policy(fpath, env, itr='last')[source]

Load a PyTorch policy saved with the Stable Learning Control Logger.

Parameters:
  • fpath (str) – The path where the model is found.

  • env (gym.Env) – The gymnasium environment in which you want to test the policy.

  • itr (str, optional) – The current policy iteration. Defaults to “last”.

Returns:

The policy.

Return type:

torch.nn.Module
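A minimal usage sketch, assuming a hypothetical PyTorch run directory and environment:

    import gymnasium as gym
    from stable_learning_control.utils.test_policy import load_pytorch_policy

    # The environment and output directory below are hypothetical examples.
    env = gym.make("Pendulum-v1")
    policy = load_pytorch_policy("data/my_torch_experiment/my_torch_experiment_s0", env=env, itr="last")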

stable_learning_control.utils.test_policy.run_policy(env, policy, max_ep_len=None, num_episodes=100, render=True, deterministic=True)[source]

Evaluates a policy inside a given gymnasium environment.

Parameters:
  • env (gym.Env) – The gymnasium environment.

  • policy (Union[tf.keras.Model, torch.nn.Module]) – The policy.

  • max_ep_len (int, optional) – The maximum episode length. Defaults to None.

  • num_episodes (int, optional) – Number of episodes you want to perform in the environment. Defaults to 100.

  • deterministic (bool, optional) – Whether you want the action from the policy to be deterministic. Defaults to True.

  • render (bool, optional) – Whether you want to render the episode to the screen. Defaults to True.
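A minimal evaluation sketch that combines load_pytorch_policy and run_policy, assuming a hypothetical run directory and environment:

    import gymnasium as gym
    from stable_learning_control.utils.test_policy import load_pytorch_policy, run_policy

    # The environment and output directory below are hypothetical examples.
    env = gym.make("Pendulum-v1")
    policy = load_pytorch_policy("data/my_torch_experiment/my_torch_experiment_s0", env=env)

    # Roll the policy out for 10 episodes of at most 800 steps, without on-screen rendering.
    run_policy(env, policy, max_ep_len=800, num_episodes=10, render=False, deterministic=True)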

stable_learning_control.utils.test_policy.parser[source]