Experiment Outputs

In this section, we’ll cover

  • what outputs come from SLC algorithm implementations,

  • what formats they’re stored in and how they’re organised,

  • where they are stored and how you can change that,

  • and how to load and run trained policies.

Algorithm Outputs

Each algorithm is set up to save a training run’s hyperparameter configuration, learning progress, trained agent and value functions, and a copy of the environment if possible (to make it easy to load up the agent and environment simultaneously). The output directory contains the following:

Output Directory Structure

torch_save/

PyTorch implementations only. A directory containing
everything needed to restore the trained agent and value
functions.

tf2_save/

TensorFlow implementations only. A directory containing
everything needed to restore the trained agent and value
functions.

config.json

A dict containing an as-complete-as-possible
description of the args and kwargs you used to launch the
training function. If you passed in something which can’t
be serialised to JSON, it should get handled gracefully by
the logger, and the config file will represent it with a
string. Note: this is meant for record-keeping only.
Launching an experiment from a config file is not currently
supported.
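
Because config.json is plain JSON, you can always inspect it with standard tooling when you need to check which hyperparameters a run used. A minimal sketch (the exact keys depend on the algorithm and the arguments you passed; the run directory shown is hypothetical):

import json

with open("./data/lac/oscillator-v1/runs/run_1614680001/config.json") as f:
    config = json.load(f)

# Print the stored hyperparameters for reference.
for key, value in sorted(config.items()):
    print(f"{key}: {value}")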

progress.(csv/txt)

A (comma/tab)-separated value file containing records of the
metrics recorded by the logger throughout training, e.g.,
Epoch, AverageEpRet, etc.
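
If you want a quick look at the logged metrics without the plotter, the file can be read with pandas. A minimal sketch, assuming the tab-separated progress.txt variant and that the Epoch and AverageEpRet columns were logged for your algorithm (the run directory is hypothetical):

import pandas as pd

progress = pd.read_csv(
    "./data/lac/oscillator-v1/runs/run_1614680001/progress.txt", sep="\t"
)
print(progress[["Epoch", "AverageEpRet"]].tail())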

vars.pkl

A pickle file containing anything about the algorithm state
which should get stored. Currently, all algorithms only use
this to save a copy of the environment.
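
You normally don't need to unpickle vars.pkl by hand; the EpochLogger.load_env() helper used in the examples later in this section restores the saved environment for you. A minimal sketch (the run directory is hypothetical):

from stable_learning_control.utils.log_utils.logx import EpochLogger

env = EpochLogger.load_env("./data/lac/oscillator-v1/runs/run_1614680001")
print(env)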

You Should Know

Sometimes environment-saving fails because the environment can’t be pickled, and vars.pkl is empty. This is known to be a problem for gymnasium Box2D environments in older versions of gymnasium, which can’t be saved in this manner.

You Should Know

The only file in here that you should ever have to use “by hand” is the config.json file. Our agent testing utility loads things from the tf2_save/ or torch_save/ directory, and our plotter interprets the contents of progress.txt; these are the correct tools for interfacing with these outputs. There is no tooling for config.json, however; it’s just there as a reference for the hyperparameters used when you ran the experiment.

PyTorch Save Directory Info

The torch_save directory contains:

torch_save Directory Structure

checkpoints/

A folder that, when the save_checkpoints command-line argument
is set to True, contains the state of both the environment and
the model at multiple checkpoints during training.

model_state.pt

This file contains the state_dict with the saved model
weights. These weights can be used to restore the trained
agent’s state on an initialised instance of the respective
algorithm class.

save_info.json

A file used by the SLC package to ease model
loading. This file is not meant for the user.

TensorFlow Save Directory Info

The tf2_save directory contains:

tf2_save Directory Structure

checkpoints/

A folder that, when the save_checkpoints command-line argument is
set to True, contains the state of both the environment and the
model at multiple checkpoints during training.

variables/

A directory containing outputs from the TensorFlow Saver. See the
TensorFlow SavedModel documentation for more information.

checkpoint

A checkpoint summary file that stores information about the saved
checkpoints.

weights_checkpoint.*

Two checkpoint data files ending with the .data* and
.index file extensions. These are the actual files used by
tf.train.Checkpoint to restore the model.

save_info.json

A file used by the SLC package to ease model loading. This file
is not meant for the user.

saved_model.pb

The full TensorFlow program saved in the SavedModel format.
This file can be used to deploy your model to hardware. See the
TensorFlow SavedModel documentation for more information.

Save Directory Location

Experiment results will, by default, be saved in the same directory as the SLC package, in a folder called data:

stable_learning_control/
    data/
        ...
    docs/
        ...
    stable_learning_control/
        ...
    LICENSE
    setup.py

You can change the default results directory by modifying DEFAULT_DATA_DIR in stable_learning_control/user_config.py.
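
For example, to store results under a fixed absolute path instead, you could edit that file as follows (a sketch; the other contents of user_config.py are left untouched and the path shown is hypothetical):

# In stable_learning_control/user_config.py.
# Default directory used by the logger for saving experiment outputs.
DEFAULT_DATA_DIR = "/home/<user>/slc_experiments"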

Loading and Running Trained Policies

If Environment Saves Successfully

SLC ships with an evaluation utility that can be used to check a trained policy’s performance. In cases where the environment is successfully saved alongside the agent, you can watch the trained agent act in the environment using:

python -m stable_learning_control.run test_policy [path/to/output_directory]
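
For example, for a run saved under the default data directory (the --episodes flag shown here is an assumption based on the test_policy arguments referenced in the traceback below; see the Policy eval utility documentation for the authoritative list of flags):

python -m stable_learning_control.run test_policy data/lac/oscillator-v1/runs/run_1614680001 --episodes 5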

See also

For more information on using this utility, see the Policy eval utility documentation or the API reference.

Environment Not Found Error

If the environment wasn’t saved successfully, you can expect test_policy.py to crash with something that looks like this:

Traceback (most recent call last):
  File "stable_learning_control/utils/test_policy.py", line 153, in <module>
    run_policy(env, get_action, args.len, args.episodes, not(args.norender))
  File "stable_learning_control/utils/test_policy.py", line 114, in run_policy
    "and we can't run the agent in it. :( nn Check out the documentation " +
AssertionError: Environment not found!

It looks like the environment wasn't saved, and we can't run the agent in it. :(

Check out the documentation page on the Test Policy utility for how to handle this situation.

In this case, watching your agent perform is slightly more painful but still possible, provided you can recreate your environment easily. You can try the code below in IPython, or use the steps in the Load PyTorch Policy or Load TensorFlow Policy documentation below to load the policy in a Python script.

>>> import gymnasium as gym
>>> from stable_learning_control.utils.test_policy import load_pytorch_policy, run_policy
>>> import your_env
>>> env = gym.make('<YOUR_ENV_NAME>')
>>> policy = load_pytorch_policy("/path/to/output_directory", env=env)
>>> run_policy(env, policy)
Logging data to /tmp/experiments/1536150702/progress.txt
Episode 0    EpRet -163.830      EpLen 93
Episode 1    EpRet -346.164      EpLen 99
...

If you want to load a TensorFlow agent, replace load_pytorch_policy() with load_tf_policy(), as sketched below. An example script for manually loading policies can be found in the examples folder (i.e., manual_env_policy_inference.py).
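
For example, the TensorFlow counterpart of the snippet above would look like this (a sketch, assuming load_tf_policy accepts the same arguments as load_pytorch_policy):

>>> import gymnasium as gym
>>> from stable_learning_control.utils.test_policy import load_tf_policy, run_policy
>>> import your_env
>>> env = gym.make('<YOUR_ENV_NAME>')
>>> policy = load_tf_policy("/path/to/output_directory", env=env)  # Loads from tf2_save/.
>>> run_policy(env, policy)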

Load PyTorch Policy

PyTorch policies can be loaded using the torch.load method. For more information on how to load PyTorch models, see the PyTorch documentation.

 1 import torch
 2 import os.path as osp
 3
 4 from stable_learning_control.utils.log_utils.logx import EpochLogger
 5
 6 from stable_learning_control.algos.pytorch import LAC
 7
 8 MODEL_LOAD_FOLDER = "./data/lac/oscillator-v1/runs/run_1614680001"
 9 MODEL_PATH = osp.join(MODEL_LOAD_FOLDER, "torch_save/model_state.pt")
10
11 # Restore the model.
12 config = EpochLogger.load_config(
13     MODEL_LOAD_FOLDER
14 )  # Retrieve the experiment configuration.
15 env = EpochLogger.load_env(MODEL_LOAD_FOLDER)
16 model = LAC(env=env, ac_kwargs=config["ac_kwargs"])
17 restored_model_state_dict = torch.load(MODEL_PATH, map_location="cpu")
18 model.load_state_dict(
19     restored_model_state_dict,
20 )
21
22 # Create dummy observations and retrieve the best action.
23 obs = torch.rand(env.observation_space.shape)
24 a = model.get_action(obs)
25 L_value = model.ac.L(obs, torch.from_numpy(a))
26
27 # Print results.
28 print(f"The LAC agent thinks it is a good idea to take action {a}.")
29 print(f"It assigns a Lyapunov Value of {L_value} to this action.")

In this example, observe that

  • On line 6, we import the algorithm we want to load.

  • On lines 12-14, we use the load_config() method to restore the hyperparameters that were used during the experiment. This saves us time in setting up the correct hyperparameters.

  • On line 15, we use the load_env() method to restore the environment used during the experiment. This saves us time in setting up the environment.

  • On line 17, we load the saved model weights from disk with torch.load.

  • On lines 18-19, we load the restored weights onto the algorithm with load_state_dict.

Additionally, each algorithm also contains a restore method, which serves as a wrapper around the torch.load and torch.nn.Module.load_state_dict methods.
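
With it, the manual torch.load and load_state_dict steps above collapse into a single call. A minimal sketch, assuming restore() accepts the path to the saved model state (the exact signature may differ; see the API reference):

import os.path as osp

from stable_learning_control.utils.log_utils.logx import EpochLogger
from stable_learning_control.algos.pytorch import LAC

MODEL_LOAD_FOLDER = "./data/lac/oscillator-v1/runs/run_1614680001"

config = EpochLogger.load_config(MODEL_LOAD_FOLDER)
env = EpochLogger.load_env(MODEL_LOAD_FOLDER)
model = LAC(env=env, ac_kwargs=config["ac_kwargs"])

# Wraps torch.load and torch.nn.Module.load_state_dict.
model.restore(osp.join(MODEL_LOAD_FOLDER, "torch_save/model_state.pt"))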

Load TensorFlow Policy

 1 import tensorflow as tf
 2 import os.path as osp
 3
 4 from stable_learning_control.utils.log_utils.logx import EpochLogger
 5
 6 from stable_learning_control.algos.tf2 import LAC
 7
 8 MODEL_LOAD_FOLDER = "./data/lac/oscillator-v1/runs/run_1614673367"
 9 MODEL_PATH = osp.join(MODEL_LOAD_FOLDER, "tf2_save")
10
11 # Restore the model.
12 config = EpochLogger.load_config(
13     MODEL_LOAD_FOLDER
14 )  # Retrieve the experiment configuration.
15 env = EpochLogger.load_env(MODEL_LOAD_FOLDER)
16 model = LAC(env=env, ac_kwargs=config["ac_kwargs"])
17 weights_checkpoint = tf.train.latest_checkpoint(MODEL_PATH)
18 model.load_weights(
19     weights_checkpoint,
20 )
21
22 # Create dummy observations and retrieve the best action.
23 obs = tf.random.uniform((1, env.observation_space.shape[0]))
24 a = model.get_action(obs)
25 L_value = model.ac.L([obs, tf.expand_dims(a, axis=0)])
26
27 # Print results.
28 print(f"The LAC agent thinks it is a good idea to take action {a}.")
29 print(f"It assigns a Lyapunov Value of {L_value} to this action.")

In this example, observe that

  • On line 6, we import the algorithm we want to load.

  • On lines 12-14, we use the load_config() method to restore the hyperparameters that were used during the experiment. This saves us time in setting up the correct hyperparameters.

  • On line 15, we use the load_env() method to restore the environment used during the experiment. This saves us time in setting up the environment.

  • On line 17, we retrieve the path of the latest checkpoint with tf.train.latest_checkpoint.

  • On lines 18-19, we load the checkpointed weights onto the algorithm with load_weights.

Additionally, each algorithm also contains a restore method which serves as a wrapper around the tf.train.latest_checkpoint and tf.keras.Model.load_weights methods.

Using Trained Value Functions

The test_policy.py tool doesn’t help you look at trained value functions; if you want to use those, you must load the policy manually. Please see the Environment Not Found Error documentation for an example of how to do this.

Deploy the saved result onto hardware

Deploy PyTorch Algorithms

Attention

PyTorch provides multiple ways to deploy trained models to hardware (see the PyTorch serving documentation). Unfortunately, at the time of writing, these methods currently do not support the agents used in the SLC package. For more information, see this issue.

Deploy TensorFlow Algorithms

As stated above, the TensorFlow version of the algorithm also saves the entire model in the SavedModel format. This format is handy for sharing or deploying the model with TFLite, TensorFlow.js, TensorFlow Serving, or TensorFlow Hub. If you want to deploy your trained model onto hardware, you first have to make sure you set the --export command-line argument to True when training the algorithm. This causes the complete TensorFlow program, including trained parameters (i.e., tf.Variables) and computation, to be saved in the tf2_save/saved_model.pb file. This SavedModel can then be loaded onto the hardware using the tf.saved_model.load method.

import os
import tensorflow as tf
from stable_learning_control.utils.log_utils.logx import EpochLogger

model_path = "./data/lac/oscillator-v1/runs/run_1614673367/tf2_save"

# Load model and environment.
loaded_model = tf.saved_model.load(model_path)
loaded_env = EpochLogger.load_env(os.path.dirname(model_path))

# Get action for dummy observation.
obs = tf.random.uniform((1, loaded_env.observation_space.shape[0]))
a = loaded_model.get_action(obs)
print(f"\nThe model thinks it is a good idea to take action: {a.numpy()}")

For more information on deploying TensorFlow models, see the TensorFlow documentation.