Experiment Outputs
In this section, we’ll cover:
what outputs come from SLC algorithm implementations,
what formats they’re stored in and how they’re organised,
where they are stored and how you can change that,
and how to load and run trained policies.
Algorithm Outputs
Each algorithm is set up to save a training run’s hyperparameter configuration, learning progress, trained agent and value functions, and a copy of the environment if possible (to make it easy to load up the agent and environment simultaneously). The output directory contains the following:
Output Directory Structure:

| torch_save/ | PyTorch implementations only. A directory containing everything needed to restore the trained agent and value functions (details for PyTorch saves below). |
| tf2_save/ | TensorFlow implementations only. A directory containing everything needed to restore the trained agent and value functions (details for TensorFlow saves below). |
| config.json | A dict containing an as-complete-as-possible description of the args and kwargs you used to launch the training function. If you passed in something which can’t be serialised to JSON, it should get handled gracefully by the logger, and the config file will represent it with a string. Note: this is meant for record-keeping only. Launching an experiment from a config file is not currently supported. |
| progress.txt | A (comma/tab) separated value file containing records of the metrics recorded by the logger throughout training, e.g. Epoch, AverageEpRet, etc. |
| vars.pkl | A pickle file containing anything about the algorithm state which should get stored. Currently, all algorithms only use this to save a copy of the environment. |
You Should Know
Sometimes environment-saving fails because the environment can’t be pickled, and vars.pkl is empty. This is a known problem for Box2D environments in older versions of gymnasium, which can’t be saved in this manner.
You Should Know
The only file in here that you should ever have to use “by hand” is the config.json file. Our agent testing utility will load things from the tf2_save/ or torch_save/ directory, and our plotter interprets the contents of progress.txt; these are the correct tools for interfacing with these outputs. But there is no tooling for config.json; it’s just there as a reference for the hyperparameters used when you ran the experiment.
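If you want to peek at those hyperparameters programmatically, reading the file with the standard library is enough. A minimal sketch (the run directory is the example path used later on this page; the exact keys depend on what you passed to the training function):

import json
import os.path as osp

run_dir = "./data/lac/oscillator-v1/runs/run_1614680001"  # example run directory
with open(osp.join(run_dir, "config.json")) as f:
    config = json.load(f)

# "ac_kwargs" is one of the recorded keyword arguments (it is also used in the
# loading examples further down this page).
print(config.get("ac_kwargs"))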
PyTorch Save Directory Info
The torch_save directory contains:

Pyt_Save Directory Structure:

| (checkpoints folder) | Folder that, when the save_checkpoints cmd-line argument is set to True, contains the state of both the env and the model at multiple checkpoints during training. |
| model_state.pt | This file contains the state_dict that holds the saved model weights. These weights can be used to restore the trained agent’s state on an initiated instance of the respective Algorithm Class. |
| (loader helper file) | A file used by the SLC package to ease model loading. This file is not meant for the user. |
TensorFlow Save Directory Info
The tf2_save directory contains:

TF2_Save Directory Structure:

| (checkpoints folder) | Folder that, when the save_checkpoints cmd-line argument is set to True, contains the state of both the env and the model at multiple checkpoints during training. |
| (Saver output directory) | A directory containing outputs from the TensorFlow Saver. See the TensorFlow save and load documentation for more info. |
| (checkpoint summary file) | A checkpoint summary file that stores information about the saved checkpoints. |
| (checkpoint data files) | Two checkpoint data files ending with the .data* and .index file extensions. These are the actual files used by the tf.train.Checkpoint method to restore the model. |
| (loader helper file) | A file used by the SLC package to ease model loading. This file is not meant for the user. |
| saved_model.pb | The full TensorFlow program saved in the SavedModel format. This file can be used to deploy your model to hardware. See the hardware deployment documentation for more info. |
Save Directory Location
Experiment results will, by default, be saved in the same directory as the SLC package, in a folder called data:

stable_learning_control/
    data/
        ...
    docs/
        ...
    stable_learning_control/
        ...
    LICENSE
    setup.py
You can change the default results directory by modifying DEFAULT_DATA_DIR in stable_learning_control/user_config.py.
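For example, to store results outside the repository, you could point it at a scratch directory (a minimal sketch; only the DEFAULT_DATA_DIR name is documented here, the path and surrounding contents of user_config.py are illustrative):

# stable_learning_control/user_config.py (excerpt)
DEFAULT_DATA_DIR = "/scratch/slc_experiments"  # hypothetical absolute path; by default this points at the package's data/ folder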
Loading and Running Trained Policies
If Environment Saves Successfully
SLC ships with an evaluation utility that can be used to check a trained policy’s performance. In cases where the environment is successfully saved alongside the agent, you can watch the trained agent act in the environment using:
python -m stable_learning_control.run test_policy [path/to/output_directory]
See also
For more information on using this utility, see the Policy eval utility documentation or the code in the API reference.
Environment Not Found Error
If the environment wasn’t saved successfully, you can expect test_policy.py to crash with something that looks like:

Traceback (most recent call last):
  File "stable_learning_control/utils/test_policy.py", line 153, in <module>
    run_policy(env, get_action, args.len, args.episodes, not(args.norender))
  File "stable_learning_control/utils/test_policy.py", line 114, in run_policy
    "and we can't run the agent in it. :( \n\n Check out the documentation " +
AssertionError: Environment not found!

It looks like the environment wasn't saved, and we can't run the agent in it. :(

Check out the documentation page on the Test Policy utility for how to handle this situation.
In this case, watching your agent perform is slightly more painful but possible if you can recreate your environment easily. You can try the code below in IPython or use the steps in the Load Pytorch Policy or Load TensorFlow Policy documentation below to load the policy in a Python script.
>>> import gymnasium as gym
>>> from stable_learning_control.utils.test_policy import load_pytorch_policy, run_policy
>>> import your_env
>>> env = gym.make('<YOUR_ENV_NAME>')
>>> policy = load_pytorch_policy("/path/to/output_directory", env=env)
>>> run_policy(env, policy)
Logging data to /tmp/experiments/1536150702/progress.txt
Episode 0 EpRet -163.830 EpLen 93
Episode 1 EpRet -346.164 EpLen 99
...
If you want to load a TensorFlow agent, replace load_pytorch_policy() with load_tf_policy(). An example script for manually loading policies can be found in the examples folder (i.e. manual_env_policy_inference.py).
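For example, the IPython snippet above would then look like this (a minimal sketch, assuming load_tf_policy accepts the same arguments as load_pytorch_policy):

>>> import gymnasium as gym
>>> from stable_learning_control.utils.test_policy import load_tf_policy, run_policy
>>> import your_env
>>> env = gym.make('<YOUR_ENV_NAME>')
>>> policy = load_tf_policy("/path/to/output_directory", env=env)  # assumed to mirror load_pytorch_policy
>>> run_policy(env, policy)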
Load PyTorch Policy
PyTorch policies can be loaded using the torch.load method. For more information on how to load PyTorch models, see the PyTorch documentation.
 1  import torch
 2  import os.path as osp
 3
 4  from stable_learning_control.utils.log_utils.logx import EpochLogger
 5
 6  from stable_learning_control.algos.pytorch import LAC
 7
 8  MODEL_LOAD_FOLDER = "./data/lac/oscillator-v1/runs/run_1614680001"
 9  MODEL_PATH = osp.join(MODEL_LOAD_FOLDER, "torch_save/model_state.pt")
10
11  # Restore the model.
12  config = EpochLogger.load_config(
13      MODEL_LOAD_FOLDER
14  )  # Retrieve the experiment configuration.
15  env = EpochLogger.load_env(MODEL_LOAD_FOLDER)
16  model = LAC(env=env, ac_kwargs=config["ac_kwargs"])
17  restored_model_state_dict = torch.load(MODEL_PATH, map_location="cpu")
18  model.load_state_dict(
19      restored_model_state_dict,
20  )
21
22  # Create dummy observations and retrieve the best action.
23  obs = torch.rand(env.observation_space.shape)
24  a = model.get_action(obs)
25  L_value = model.ac.L(obs, torch.from_numpy(a))
26
27  # Print results.
28  print(f"The LAC agent thinks it is a good idea to take action {a}.")
29  print(f"It assigns a Lyapunov Value of {L_value} to this action.")
In this example, observe that:

On line 6, we import the algorithm we want to load.
On lines 12-14, we use the load_config() method to restore the hyperparameters that were used during the experiment. This saves us time in setting up the correct hyperparameters.
On line 15, we use the load_env() method to restore the environment used during the experiment. This saves us time in setting up the environment.
On line 17, we import the model weights.
On lines 18-19, we load the saved weights onto the algorithm.
Additionally, each algorithm also contains a restore method, which serves as a wrapper around the torch.load and torch.nn.Module.load_state_dict methods.
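Using it, lines 17-19 of the example above collapse into a single call (a minimal sketch; the exact signature of restore is not documented here, so it is assumed to take the path to the saved weights file):

model = LAC(env=env, ac_kwargs=config["ac_kwargs"])
model.restore(MODEL_PATH)  # assumed wrapper around torch.load and load_state_dict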
Load TensorFlow Policy
 1  import tensorflow as tf
 2  import os.path as osp
 3
 4  from stable_learning_control.utils.log_utils.logx import EpochLogger
 5
 6  from stable_learning_control.algos.tf2 import LAC
 7
 8  MODEL_LOAD_FOLDER = "./data/lac/oscillator-v1/runs/run_1614673367"
 9  MODEL_PATH = osp.join(MODEL_LOAD_FOLDER, "tf2_save")
10
11  # Restore the model.
12  config = EpochLogger.load_config(
13      MODEL_LOAD_FOLDER
14  )  # Retrieve the experiment configuration.
15  env = EpochLogger.load_env(MODEL_LOAD_FOLDER)
16  model = LAC(env=env, ac_kwargs=config["ac_kwargs"])
17  weights_checkpoint = tf.train.latest_checkpoint(MODEL_PATH)
18  model.load_weights(
19      weights_checkpoint,
20  )
21
22  # Create dummy observations and retrieve the best action.
23  obs = tf.random.uniform((1, env.observation_space.shape[0]))
24  a = model.get_action(obs)
25  L_value = model.ac.L([obs, tf.expand_dims(a, axis=0)])
26
27  # Print results.
28  print(f"The LAC agent thinks it is a good idea to take action {a}.")
29  print(f"It assigns a Lyapunov Value of {L_value} to this action.")
In this example, observe that:

On line 6, we import the algorithm we want to load.
On lines 12-14, we use the load_config() method to restore the hyperparameters that were used during the experiment. This saves us time in setting up the correct hyperparameters.
On line 15, we use the load_env() method to restore the environment used during the experiment. This saves us time in setting up the environment.
On line 17, we retrieve the latest checkpoint containing the saved model weights.
On lines 18-19, we load the saved weights onto the algorithm.
Additionally, each algorithm also contains a restore method, which serves as a wrapper around the tf.train.latest_checkpoint and tf.keras.Model.load_weights methods.
Using Trained Value Functions
The test_policy.py tool doesn’t help you look at trained value functions; if you want to use those, you must load the policy manually. Please see the Environment Not Found Error documentation for an example of how to do this.
Deploy the saved result onto hardware
Deploy PyTorch Algorithms
Attention
PyTorch provides multiple ways to deploy trained models to hardware (see the PyTorch serving documentation). Unfortunately, at the time of writing, these methods currently do not support the agents used in the SLC package. For more information, see this issue.
Deploy TensorFlow Algorithms
As stated above, the TensorFlow version of the algorithm also saves the entire model in the SavedModel format. This format is handy for sharing or deploying with TFLite, TensorFlow.js, TensorFlow Serving, or TensorFlow Hub. If you want to deploy your trained model onto hardware, you first have to make sure you set the --export cmd-line argument to True when training the algorithm. This will cause the complete TensorFlow program, including trained parameters (i.e., tf.Variables) and computation, to be saved in the tf2_save/saved_model.pb file. This SavedModel can be loaded onto the hardware using the tf.saved_model.load method.
import os
import tensorflow as tf
from stable_learning_control.utils.log_utils.logx import EpochLogger
model_path = "./data/lac/oscillator-v1/runs/run_1614673367/tf2_save"
# Load model and environment.
loaded_model = tf.saved_model.load(model_path)
loaded_env = EpochLogger.load_env(os.path.dirname(model_path))
# Get action for dummy observation.
obs = tf.random.uniform((1, loaded_env.observation_space.shape[0]))
a = loaded_model.get_action(obs)
print(f"\nThe model thinks it is a good idea to take action: {a.numpy()}")
For more information on deploying TensorFlow models, see the TensorFlow documentation.