Hyperparameter Tuning
Hyperparameter tuning is crucial in reinforcement learning (RL) because hyperparameters directly affect the agent's performance and stability. Well-chosen hyperparameters can lead to faster convergence, better task performance and better generalization. For this reason, the SLC package provides several tools to help with hyperparameter tuning.
Use the ExperimentGrid utility
As outlined in the Running Experiments section, the SLC package includes a utility class called ExperimentGrid that
runs multiple experiments sequentially. You can use this utility in two ways: by supplying the CLI with more than one
value for a specific argument (see Running Experiments), or by using the ExperimentGrid class directly (see Launching
Multiple Experiments at Once). Both methods run one experiment per hyperparameter combination, which lets you perform
a grid search to identify the best hyperparameter setting for your task. For instance, to run the LAC algorithm on the
CartPoleCost-v1 environment with several actor and critic learning rates using the CLI, use the following command:
python -m stable_learning_control.run lac --env CartPoleCost-v1 --lr_a 0.001 0.01 0.1 --lr_c 0.001 0.01 0.1
Tip
You can enable TensorBoard and Weights & Biases logging by adding the --use_tensorboard and --use_wandb flags to the
above command. These tools allow you to track the performance of your experiments and compare the results of
different hyperparameter combinations. For more information on how to use these logging utilities, see Loggers.
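To make the size of such a grid search concrete, the following minimal sketch enumerates the combinations implied by the command shown earlier. It only illustrates the Cartesian product of the supplied values; it is not the ExperimentGrid implementation, and the `expand_grid` helper is a hypothetical name introduced for this illustration.

```python
from itertools import product

# Values passed on the command line for each hyperparameter.
grid = {
    "lr_a": [0.001, 0.01, 0.1],
    "lr_c": [0.001, 0.01, 0.1],
}


def expand_grid(grid):
    """Return one dict per combination of the supplied hyperparameter values."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


variants = expand_grid(grid)
print(len(variants))  # 9 (3 values for lr_a times 3 values for lr_c)
print(variants[0])  # {'lr_a': 0.001, 'lr_c': 0.001}
```

Because the number of experiments is the product of the number of values per argument, adding a third argument with three values would already require 27 runs.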
Use the Ray tuning package
The SLC package can also be used with more advanced tuning libraries like Ray Tune, which provides search algorithms
and trial schedulers that typically find good hyperparameters with fewer training runs than an exhaustive grid search.
Examples of how to use SLC with Ray Tune can be found in
stable_learning_control/examples/pytorch/sac_ray_hyper_parameter_tuning.py and
stable_learning_control/examples/tf2/sac_ray_hyper_parameter_tuning.py. The requirements for these examples can be
installed using the following command:
pip install .[tuning]
Consider the example in stable_learning_control/examples/pytorch/sac_ray_hyper_parameter_tuning.py:
  1 import os.path as osp
  2
  3 import gymnasium as gym
  4 from ray import air, tune
  5 from ray.tune.schedulers import ASHAScheduler
  6 from ray.tune.search.hyperopt import HyperOptSearch
  7
  8 # Import the algorithm we want to tune.
  9 from stable_learning_control.algos.pytorch.sac import sac
 10
 11 # Script parameters.
 12 USE_WANDB = False
 13
 14
 15 def train_sac(config):
 16     """Wrapper function that unpacks the config provided by the Ray tuner and converts
 17     it into the format the learning algorithm expects.
 18
 19     Args:
 20         config (dict): The Ray tuning configuration dictionary.
 21     """
 22     # Unpack trainable arguments.
 23     env_name = config.pop("env_name")
 24
 25     # Run algorithm training.
 26     sac(
 27         lambda: gym.make(env_name),
 28         **config,
 29     )
 30
 31
 32 if __name__ == "__main__":
 33     # NOTE: Uncomment if you want to debug the code.
 34     # import ray
 35     # ray.init(local_mode=True)
 36
 37     # Set up Weights & Biases logging.
 38     ray_callbacks = []
 39     if USE_WANDB:
 40         from ray.air.integrations.wandb import WandbLoggerCallback
 41
 42         ray_callbacks.append(
 43             WandbLoggerCallback(
 44                 job_type="tune",
 45                 project="stable-learning-control",
 46                 name="sac_ray_hyper_parameter_tuning_example",
 47             )
 48         )
 49
 50     # Set up the logging dir.
 51     dirname = osp.dirname(__file__)
 52     log_path = osp.abspath(osp.join(dirname, "../../data/ray_results"))
 53
 54     # Set up the hyperparameter search starting point.
 55     initial_params = [{"gamma": 0.995, "lr_a": 1e-4, "alpha": 0.99}]
 56
 57     # Set up the parameter space for your hyperparameter search.
 58     search_space = {
 59         "env_name": "stable_gym:Oscillator-v1",
 60         "opt_type": "minimize",
 61         "gamma": tune.uniform(0.9, 0.999),
 62         "lr_a": tune.loguniform(1e-6, 1e-3),
 63         "alpha": tune.uniform(0.9, 1.0),
 64         "epochs": 2,
 65     }
 66
 67     # Initialize the hyperparameter tuning job.
 68     # NOTE: The available metrics are listed in the `progress.csv` written by the SLC CLI.
 69     trainable_with_cpu_gpu = tune.with_resources(
 70         train_sac,
 71         {"cpu": 12, "gpu": 1},
 72     )
 73     tuner = tune.Tuner(
 74         trainable_with_cpu_gpu,
 75         param_space=search_space,
 76         tune_config=tune.TuneConfig(
 77             metric="AverageEpRet",
 78             mode="min",  # NOTE: Should be equal to the 'opt_type'
 79             search_alg=HyperOptSearch(
 80                 points_to_evaluate=initial_params,
 81             ),
 82             scheduler=ASHAScheduler(
 83                 time_attr="epoch",
 84                 max_t=200,
 85                 grace_period=10,
 86             ),
 87             num_samples=20,
 88         ),
 89         run_config=air.RunConfig(
 90             storage_path=log_path,
 91             name=f"tune_sac_{search_space['env_name'].replace(':', '_')}",
 92             callbacks=ray_callbacks,
 93         ),
 94     )
 95
 96     # Start the hyperparameter tuning job.
 97     results = tuner.fit()
 98
 99     # Print the best trial (`Tuner.fit()` returns a `ResultGrid`).
100     best_trial = results.get_best_result(
101         metric="AverageEpRet", mode="min", scope="all"
102     )
103     best_path = best_trial.path
104     best_config = best_trial.config
105     best_result = best_trial.metrics_dataframe["AverageEpRet"].min()
106     print("The hyperparameter tuning job has finished.")
107     print(f"Best trial: {best_trial}")
108     print(f"Best result: {best_result}")
109     print(f"Path: {best_path}")
110     print(f"Best hyperparameters found were: {best_config}")
In this example, the boolean on line 12 enables Weights & Biases logging. On lines 15-29, we first create a small
wrapper function that converts the configuration supplied by the Ray tuner into the arguments the SLC algorithm
expects. Lines 38-48 then set up a Weights & Biases callback if the USE_WANDB constant is set to True. On line 55, we
set the starting point for several of the hyperparameters used in the search. Next, we define the hyperparameter
search space on lines 58-65 and initialise the Ray Tuner instance on lines 69-94. Lastly, we start the hyperparameter
search by calling the Tuner's fit method on line 97.
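The interplay between the search space and the wrapper function can be sketched without Ray or SLC installed. The snippet below is a simplified illustration, not Ray Tune's sampling code: the `sample_uniform`, `sample_loguniform` and `sample_config` helpers are hypothetical stand-ins for what `tune.uniform` and `tune.loguniform` describe, and `fake_algo` is a hypothetical placeholder for an SLC algorithm such as `sac`.

```python
import math
import random


def sample_uniform(rng, low, high):
    """Draw a value uniformly from the interval between low and high."""
    return rng.uniform(low, high)


def sample_loguniform(rng, low, high):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_config(rng):
    """Sample one trial configuration from a space shaped like the one above."""
    return {
        "env_name": "stable_gym:Oscillator-v1",  # Constant, not searched.
        "gamma": sample_uniform(rng, 0.9, 0.999),
        "lr_a": sample_loguniform(rng, 1e-6, 1e-3),
        "alpha": sample_uniform(rng, 0.9, 1.0),
        "epochs": 2,  # Constant, not searched.
    }


def fake_algo(env_fn, **kwargs):
    """Hypothetical stand-in for an SLC algorithm; returns what it received."""
    return env_fn(), kwargs


def train(config):
    """Split off the wrapper-only key and forward the rest as keyword arguments."""
    config = dict(config)  # Copy so the caller's dictionary is not modified.
    env_name = config.pop("env_name")
    # A real wrapper would build the environment here; this sketch returns the name.
    return fake_algo(lambda: env_name, **config)


rng = random.Random(0)
config = sample_config(rng)
env, algo_kwargs = train(config)
print(env)  # stable_gym:Oscillator-v1
print(sorted(algo_kwargs))  # ['alpha', 'epochs', 'gamma', 'lr_a']
```

Sampling the learning rate on a logarithmic scale spreads the trials evenly across orders of magnitude, which a plain uniform draw between 1e-6 and 1e-3 would not do. Note that this sketch copies the configuration before popping `env_name`, whereas the example script pops the key from the dictionary it receives.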
When you run the script, the Ray tuner searches for the best hyperparameter combination. While doing so, it writes
the results to stdout, a TensorBoard log file and, if enabled, the Weights & Biases portal. You can inspect the
TensorBoard logs using the tensorboard --logdir ./data/ray_results command and the Weights & Biases results on the
Weights & Biases website. For more information on how Ray Tune works, see the Ray Tune documentation.
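As one illustration of what happens during the search, the ASHA scheduler configured on lines 82-86 stops poorly performing trials early at a series of checkpoints ("rungs"). The sketch below is a simplified, synchronous approximation of that idea and not Ray's implementation: the real scheduler is asynchronous and decides per trial as results arrive. The reduction factor of 4 is an assumption, since the example does not set one explicitly, and the `asha_rungs` and `survivors` helpers are hypothetical names.

```python
def asha_rungs(grace_period, max_t, reduction_factor=4):
    """Return the time-attribute values at which trials are compared."""
    rungs = []
    milestone = grace_period
    while milestone <= max_t:
        rungs.append(milestone)
        milestone *= reduction_factor
    return rungs


def survivors(scores, reduction_factor=4, mode="min"):
    """Keep the best 1/reduction_factor of the trials (at least one)."""
    keep = max(1, len(scores) // reduction_factor)
    ranked = sorted(scores, key=scores.get, reverse=(mode == "max"))
    return ranked[:keep]


# Rungs for grace_period=10 and max_t=200, as in the example script.
print(asha_rungs(10, 200))  # [10, 40, 160]

# Twenty trials with made-up scores; lower is better because mode="min".
scores = {f"trial_{i}": float(i) for i in range(20)}
print(survivors(scores))  # ['trial_0', 'trial_1', 'trial_2', 'trial_3', 'trial_4']
```

Under these assumptions, only about a quarter of the trials continue past each rung, so most of the compute budget goes to the most promising configurations.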
Note
An equivalent TensorFlow example is available in stable_learning_control/examples/tf2/sac_ray_hyper_parameter_tuning.py.