Experiment Outputs
In this section, we’ll cover:
what outputs come from SLC algorithm implementations,
what formats they’re stored in and how they’re organised,
where they are stored and how you can change that,
and how to load and run trained policies.
Algorithm Outputs
Each algorithm is set up to save a training run’s hyperparameter configuration, learning progress, trained agent and value functions, and a copy of the environment if possible (to make it easy to load up the agent and environment simultaneously). The output directory contains the following:
Output Directory Structure:

| torch_save/ | PyTorch implementations only. A directory containing everything needed to restore the trained agent and value functions (details for PyTorch saves below). |
| tf2_save/ | TensorFlow implementations only. A directory containing everything needed to restore the trained agent and value functions (details for TensorFlow saves below). |
| config.json | A dict containing an as-complete-as-possible description of the args and kwargs you used to launch the training function. If you passed in something which can’t be serialised to JSON, it should get handled gracefully by the logger, and the config file will represent it with a string. Note: this is meant for record-keeping only. Launching an experiment from a config file is not currently supported. |
| progress.txt | A (comma/tab) separated value file containing records of the metrics recorded by the logger throughout training, e.g. Epoch, AverageEpRet, etc. |
| vars.pkl | A pickle file containing anything about the algorithm state which should get stored. Currently, all algorithms only use this to save a copy of the environment. |
You Should Know
Sometimes environment-saving fails because the environment can’t be pickled, and vars.pkl is empty. This is a known problem for Box2D environments in older versions of gymnasium, which can’t be saved in this manner.
You Should Know
The only file in here that you should ever have to use “by hand” is the config.json file. Our agent testing utility will load things from the tf2_save/ or torch_save/ directory, and our plotter interprets the contents of progress.txt; these are the correct tools for interfacing with these outputs. But there is no tooling for config.json; it’s just there as a reference for the hyperparameters used when you ran the experiment.
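If you want to peek at those hyperparameters programmatically, reading the file with the standard library is enough. A minimal sketch (the run directory is the example path used later on this page; the exact keys depend on what you passed to the training function):

import json
import os.path as osp

run_dir = "./data/lac/oscillator-v1/runs/run_1614680001"  # example run directory
with open(osp.join(run_dir, "config.json")) as f:
    config = json.load(f)

# "ac_kwargs" is one of the recorded keyword arguments (it is also used in the
# loading examples further down this page).
print(config.get("ac_kwargs"))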
PyTorch Save Directory Info
The torch_save directory contains:

Pyt_Save Directory Structure:

| (checkpoints folder) | Folder that, when the save_checkpoints cmd-line argument is set to True, contains the state of both the env and the model at multiple checkpoints during training. |
| model_state.pt | This file contains the state_dict that holds the saved model weights. These weights can be used to restore the trained agent’s state on an initiated instance of the respective Algorithm Class. |
| (loader helper file) | A file used by the SLC package to ease model loading. This file is not meant for the user. |
TensorFlow Save Directory Info
The tf2_save directory contains:

TF2_Save Directory Structure:

| (checkpoints folder) | Folder that, when the save_checkpoints cmd-line argument is set to True, contains the state of both the env and the model at multiple checkpoints during training. |
| (Saver output directory) | A directory containing outputs from the TensorFlow Saver. See the TensorFlow save and load documentation for more info. |
| (checkpoint summary file) | A checkpoint summary file that stores information about the saved checkpoints. |
| (checkpoint data files) | Two checkpoint data files ending with the .data* and .index file extensions. These are the actual files used by the tf.train.Checkpoint method to restore the model. |
| (loader helper file) | A file used by the SLC package to ease model loading. This file is not meant for the user. |
| saved_model.pb | The full TensorFlow program saved in the SavedModel format. This file can be used to deploy your model to hardware. See the hardware deployment documentation for more info. |
Save Directory Location
Experiment results will, by default, be saved in the same directory as the SLC package, in a folder called data:

stable_learning_control/
    data/
        ...
    docs/
        ...
    stable_learning_control/
        ...
    LICENSE
    setup.py
You can change the default results directory by modifying DEFAULT_DATA_DIR in stable_learning_control/user_config.py.
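For example, to store results outside the repository, you could point it at a scratch directory (a minimal sketch; only the DEFAULT_DATA_DIR name is documented here, the path and surrounding contents of user_config.py are illustrative):

# stable_learning_control/user_config.py (excerpt)
DEFAULT_DATA_DIR = "/scratch/slc_experiments"  # hypothetical absolute path; by default this points at the package's data/ folder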
Loading and Running Trained Policies
If Environment Saves Successfully
SLC ships with an evaluation utility that can be used to check a trained policy’s performance. In cases where the environment is successfully saved alongside the agent, you can watch the trained agent act in the environment using:
python -m stable_learning_control.run test_policy [path/to/output_directory]
See also
For more information on using this utility, see the Policy eval utility documentation or the code in the API reference.
Environment Not Found Error
If the environment wasn’t saved successfully, you can expect test_policy.py to crash with something that looks like:

Traceback (most recent call last):
  File "stable_learning_control/utils/test_policy.py", line 153, in <module>
    run_policy(env, get_action, args.len, args.episodes, not(args.norender))
  File "stable_learning_control/utils/test_policy.py", line 114, in run_policy
    "and we can't run the agent in it. :( \n\n Check out the documentation " +
AssertionError: Environment not found!

It looks like the environment wasn't saved, and we can't run the agent in it. :(

Check out the documentation page on the Test Policy utility for how to handle this situation.
In this case, watching your agent perform is slightly more painful but possible if you can recreate your environment easily. You can try the code below in IPython or use the steps in the Load Pytorch Policy or Load TensorFlow Policy documentation below to load the policy in a Python script.
>>> import gymnasium as gym
>>> from stable_learning_control.utils.test_policy import load_pytorch_policy, run_policy
>>> import your_env
>>> env = gym.make('<YOUR_ENV_NAME>')
>>> policy = load_pytorch_policy("/path/to/output_directory", env=env)
>>> run_policy(env, policy)
Logging data to /tmp/experiments/1536150702/progress.txt
Episode 0 EpRet -163.830 EpLen 93
Episode 1 EpRet -346.164 EpLen 99
...
If you want to load a TensorFlow agent, replace load_pytorch_policy() with load_tf_policy(). An example script for manually loading policies can be found in the examples folder (i.e. manual_env_policy_inference.py).
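For example, the IPython snippet above would then look like this (a minimal sketch, assuming load_tf_policy accepts the same arguments as load_pytorch_policy):

>>> import gymnasium as gym
>>> from stable_learning_control.utils.test_policy import load_tf_policy, run_policy
>>> import your_env
>>> env = gym.make('<YOUR_ENV_NAME>')
>>> policy = load_tf_policy("/path/to/output_directory", env=env)  # assumed to mirror load_pytorch_policy
>>> run_policy(env, policy)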
Load PyTorch Policy
PyTorch policies can be loaded using the torch.load method. For more information on how to load PyTorch models, see the PyTorch documentation.
 1  import torch
 2  import os.path as osp
 3
 4  from stable_learning_control.utils.log_utils.logx import EpochLogger
 5
 6  from stable_learning_control.algos.pytorch import LAC
 7
 8  MODEL_LOAD_FOLDER = "./data/lac/oscillator-v1/runs/run_1614680001"
 9  MODEL_PATH = osp.join(MODEL_LOAD_FOLDER, "torch_save/model_state.pt")
10
11  # Restore the model.
12  config = EpochLogger.load_config(
13      MODEL_LOAD_FOLDER
14  )  # Retrieve the experiment configuration.
15  env = EpochLogger.load_env(MODEL_LOAD_FOLDER)
16  model = LAC(env=env, ac_kwargs=config["ac_kwargs"])
17  restored_model_state_dict = torch.load(MODEL_PATH, map_location="cpu")
18  model.load_state_dict(
19      restored_model_state_dict,
20  )
21
22  # Create dummy observations and retrieve the best action.
23  obs = torch.rand(env.observation_space.shape)
24  a = model.get_action(obs)
25  L_value = model.ac.L(obs, torch.from_numpy(a))
26
27  # Print results.
28  print(f"The LAC agent thinks it is a good idea to take action {a}.")
29  print(f"It assigns a Lyapunov Value of {L_value} to this action.")
In this example, observe that:

On line 6, we import the algorithm we want to load.
On lines 12-14, we use the load_config() method to restore the hyperparameters that were used during the experiment. This saves us time in setting up the correct hyperparameters.
On line 15, we use the load_env() method to restore the environment used during the experiment. This saves us time in setting up the environment.
On line 17, we import the model weights.
On lines 18-19, we load the saved weights onto the algorithm.
Additionally, each algorithm also contains a restore method, which serves as a wrapper around the torch.load and torch.nn.Module.load_state_dict methods.
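Using it, lines 17-19 of the example above collapse into a single call (a minimal sketch; the exact signature of restore is not documented here, so it is assumed to take the path to the saved weights file):

model = LAC(env=env, ac_kwargs=config["ac_kwargs"])
model.restore(MODEL_PATH)  # assumed wrapper around torch.load and load_state_dict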
Load TensorFlow Policy
 1  import tensorflow as tf
 2  import os.path as osp
 3
 4  from stable_learning_control.utils.log_utils.logx import EpochLogger
 5
 6  from stable_learning_control.algos.tf2 import LAC
 7
 8  MODEL_LOAD_FOLDER = "./data/lac/oscillator-v1/runs/run_1614673367"
 9  MODEL_PATH = osp.join(MODEL_LOAD_FOLDER, "tf2_save")
10
11  # Restore the model.
12  config = EpochLogger.load_config(
13      MODEL_LOAD_FOLDER
14  )  # Retrieve the experiment configuration.
15  env = EpochLogger.load_env(MODEL_LOAD_FOLDER)
16  model = LAC(env=env, ac_kwargs=config["ac_kwargs"])
17  weights_checkpoint = tf.train.latest_checkpoint(MODEL_PATH)
18  model.load_weights(
19      weights_checkpoint,
20  )
21
22  # Create dummy observations and retrieve the best action.
23  obs = tf.random.uniform((1, env.observation_space.shape[0]))
24  a = model.get_action(obs)
25  L_value = model.ac.L([obs, tf.expand_dims(a, axis=0)])
26
27  # Print results.
28  print(f"The LAC agent thinks it is a good idea to take action {a}.")
29  print(f"It assigns a Lyapunov Value of {L_value} to this action.")
In this example, observe that:

On line 6, we import the algorithm we want to load.
On lines 12-14, we use the load_config() method to restore the hyperparameters that were used during the experiment. This saves us time in setting up the correct hyperparameters.
On line 15, we use the load_env() method to restore the environment used during the experiment. This saves us time in setting up the environment.
On line 17, we retrieve the latest checkpoint containing the saved model weights.
On lines 18-19, we load the saved weights onto the algorithm.
Additionally, each algorithm also contains a restore method, which serves as a wrapper around the tf.train.latest_checkpoint and tf.keras.Model.load_weights methods.
Using Trained Value Functions
The test_policy.py tool doesn’t help you look at trained value functions; if you want to use those, you must load the policy manually. Please see the Environment Not Found Error documentation for an example of how to do this.
Deploy the saved result onto hardware
Deploy PyTorch Algorithms
Attention
PyTorch provides multiple ways to deploy trained models to hardware (see the PyTorch serving documentation). Unfortunately, at the time of writing, these methods currently do not support the agents used in the SLC package. For more information, see this issue.
Deploy TensorFlow Algorithms
As stated above, the TensorFlow version of the algorithm also saves the entire model in the SavedModel format. This format is handy for sharing or deploying with TFLite, TensorFlow.js, TensorFlow Serving, or TensorFlow Hub. If you want to deploy your trained model onto hardware, you first have to make sure you set the --export cmd-line argument to True when training the algorithm. This will cause the complete TensorFlow program, including trained parameters (i.e., tf.Variables) and computation, to be saved in the tf2_save/saved_model.pb file. This SavedModel can be loaded onto the hardware using the tf.saved_model.load method.
import os
import tensorflow as tf
from stable_learning_control.utils.log_utils.logx import EpochLogger
model_path = "./data/lac/oscillator-v1/runs/run_1614673367/tf2_save"
# Load model and environment.
loaded_model = tf.saved_model.load(model_path)
loaded_env = EpochLogger.load_env(os.path.dirname(model_path))
# Get action for dummy observation.
obs = tf.random.uniform((1, loaded_env.observation_space.shape[0]))
a = loaded_model.get_action(obs)
print(f"\nThe model thinks it is a good idea to take action: {a.numpy()}")
For more information on deploying TensorFlow models, see the TensorFlow documentation.