Hyperparameter Tuning

Hyperparameter tuning is crucial in reinforcement learning (RL) because it directly affects the agent’s performance and stability. Well-chosen hyperparameters can lead to faster convergence, better task performance, and better generalization. For this reason, the SLC package provides several tools to help with hyperparameter tuning.

Use the ExperimentGrid utility

As outlined in the Running Experiments section, the SLC package includes a utility class called ExperimentGrid, which runs multiple experiments sequentially. You can use it in two ways: by passing more than one value for an argument to the CLI (see Running Experiments), or by using the ExperimentGrid class directly (see Launching Multiple Experiments at Once). Both methods run one experiment per hyperparameter combination, which lets you perform a grid search to find the best parameter setting for your task. For instance, to run the LAC algorithm on the CartPoleCost-v1 environment with several values for the actor and critic learning rates using the CLI, use the following command:

python -m stable_learning_control.run lac --env CartPoleCost-v1 --lr_a 0.001 0.01 0.1 --lr_c 0.001 0.01 0.1
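
The command above passes three values each for --lr_a and --lr_c. Assuming the CLI expands these into a full grid, as the grid-search description above suggests, this gives 3 x 3 = 9 runs. The following is a minimal plain-Python sketch of that expansion; it is not the ExperimentGrid implementation, and the helper name expand_grid is hypothetical:

```python
from itertools import product

# Values copied from the CLI command above.
grid = {
    "lr_a": [0.001, 0.01, 0.1],
    "lr_c": [0.001, 0.01, 0.1],
}


def expand_grid(grid):
    """Return one dict per hyperparameter combination (hypothetical helper)."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*grid.values())]


variants = expand_grid(grid)
print(f"Number of experiments: {len(variants)}")
for variant in variants:
    print(variant)
```

Because the number of runs is the product of the number of values per argument, adding a third argument with three values would already require 27 experiments.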

Tip

You can enable TensorBoard and Weights & Biases logging by adding the --use_tensorboard and --use_wandb flags to the above command. These tools let you track the performance of your experiments and compare the results of different hyperparameter combinations. For more information on how to use these logging utilities, see Loggers.

Use the Ray tuning package

The SLC package can also be used with more advanced tuning libraries such as Ray Tune, which offers search algorithms and trial schedulers that can find good hyperparameters in fewer trials than an exhaustive grid search. Examples of how to use SLC with Ray Tune can be found in stable_learning_control/examples/pytorch/sac_ray_hyper_parameter_tuning.py and stable_learning_control/examples/tf2/sac_ray_hyper_parameter_tuning.py. The requirements for these examples can be installed by running the following command from the repository root:

pip install .[tuning]

Consider the example in stable_learning_control/examples/pytorch/sac_ray_hyper_parameter_tuning.py:

  1import os.path as osp
  2
  3import gymnasium as gym
  4from ray import air, tune
  5from ray.tune.schedulers import ASHAScheduler
  6from ray.tune.search.hyperopt import HyperOptSearch
  7
  8# Import the algorithm we want to tune.
  9from stable_learning_control.algos.pytorch.sac import sac
 10
 11# Script parameters.
 12USE_WANDB = False
 13
 14
 15def train_sac(config):
 16    """Wrapper function that unpacks the config provided by the Ray tuner and converts
 17    it into the format the learning algorithm expects.
 18
 19    Args:
 20        config (dict): The Ray tuning configuration dictionary.
 21    """
 22    # Unpack trainable arguments.
 23    env_name = config.pop("env_name")
 24
 25    # Run algorithm training.
 26    sac(
 27        lambda: gym.make(env_name),
 28        **config,
 29    )
 30
 31
 32if __name__ == "__main__":
 33    # NOTE: Uncomment if you want to debug the code.
 34    # import ray
 35    # ray.init(local_mode=True)
 36
 37    # Setup Weights & Biases logging.
 38    ray_callbacks = []
 39    if USE_WANDB:
 40        from ray.air.integrations.wandb import WandbLoggerCallback
 41
 42        ray_callbacks.append(
 43            WandbLoggerCallback(
 44                job_type="tune",
 45                project="stable-learning-control",
 46                name="sac_ray_hyper_parameter_tuning_example",
 47            )
 48        )
 49
 50    # Setup the logging dir.
 51    dirname = osp.dirname(__file__)
 52    log_path = osp.abspath(osp.join(dirname, "../../data/ray_results"))
 53
 54    # Setup hyperparameter search starting point.
 55    initial_params = [{"gamma": 0.995, "lr_a": 1e-4, "alpha": 0.99}]
 56
 57    # Setup the parameter space for your hyperparameter search.
 58    search_space = {
 59        "env_name": "stable_gym:Oscillator-v1",
 60        "opt_type": "minimize",
 61        "gamma": tune.uniform(0.9, 0.999),
 62        "lr_a": tune.loguniform(1e-6, 1e-3),
 63        "alpha": tune.uniform(0.9, 1.0),
 64        "epochs": 2,
 65    }
 66
 67    # Initialize the hyperparameter tuning job.
 68    # NOTE: Available algorithm metrics are found in the `progress.csv` created by the SLC CLI.
 69    trainable_with_cpu_gpu = tune.with_resources(
 70        train_sac,
 71        {"cpu": 12, "gpu": 1},
 72    )
 73    tuner = tune.Tuner(
 74        trainable_with_cpu_gpu,
 75        param_space=search_space,
 76        tune_config=tune.TuneConfig(
 77            metric="AverageEpRet",
 78            mode="min",  # NOTE: Should be equal to the 'opt_type'
 79            search_alg=HyperOptSearch(
 80                points_to_evaluate=initial_params,
 81            ),
 82            scheduler=ASHAScheduler(
 83                time_attr="epoch",
 84                max_t=200,
 85                grace_period=10,
 86            ),
 87            num_samples=20,
 88        ),
 89        run_config=air.RunConfig(
 90            storage_path=log_path,
 91            name=f"tune_sac_{search_space['env_name'].replace(':', '_')}",
 92            callbacks=ray_callbacks,
 93        ),
 94    )
 95
 96    # Start the hyperparameter tuning job.
 97    results = tuner.fit()
 98
 99    # Print the best trial.
100    best_trial = results.get_best_result(
101        metric="AverageEpRet", mode="min", scope="all"
102    )
103    best_path = best_trial.path
104    best_config = best_trial.config
105    best_result = best_trial.metrics_dataframe["AverageEpRet"].min()
106    print("The hyperparameter tuning job has finished.")
107    print(f"Best trial: {best_trial}")
108    print(f"Best result: {best_result}")
109    print(f"Path: {best_path}")
110    print(f"Best hyperparameters found were: {best_config}")

In this example, the boolean on line 12 enables Weights & Biases logging. Lines 15-29 define a small wrapper function that converts the configuration supplied by Ray Tune into the arguments the SLC algorithm expects. Lines 38-48 set up a Weights & Biases callback if the USE_WANDB constant is set to True. Line 55 sets the starting point for the hyperparameter search, lines 58-65 define the search space, and lines 69-94 initialise the Ray Tuner instance. Lastly, line 97 starts the hyperparameter search by calling the Tuner's fit method.
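
The search space on lines 58-65 mixes uniform and log-uniform ranges. The sketch below illustrates the difference using only the Python standard library. It is not how Ray Tune samples internally, and the search algorithm (HyperOpt in the example) decides which points are actually evaluated; the helper names are hypothetical:

```python
import math
import random

rng = random.Random(0)  # Fixed seed so the sketch is reproducible.


def sample_uniform(low, high):
    """Draw a value uniformly between low and high."""
    return rng.uniform(low, high)


def sample_loguniform(low, high):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_config():
    """Mimic one random draw from the example's search space (illustrative only)."""
    return {
        "gamma": sample_uniform(0.9, 0.999),
        "lr_a": sample_loguniform(1e-6, 1e-3),
        "alpha": sample_uniform(0.9, 1.0),
    }


samples = [sample_config() for _ in range(1000)]

# With a log-uniform range, each decade (1e-6 to 1e-5, 1e-5 to 1e-4, ...)
# receives roughly the same share of the samples.
below_1e5 = sum(s["lr_a"] < 1e-5 for s in samples)
print(f"Share of lr_a samples below 1e-5: {below_1e5 / len(samples):.2f}")
```

A log-uniform range suits parameters such as learning rates, where the order of magnitude matters more than the absolute value. A plain uniform range over the same interval would place almost all samples in the top decade.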

When you run the script, Ray Tune searches for the best hyperparameter combination. While doing so, it prints the results to stdout, writes them to a TensorBoard log file, and, if enabled, sends them to the Weights & Biases portal. You can check the TensorBoard logs using the tensorboard --logdir ./data/ray_results command and the Weights & Biases results on the Weights & Biases website. For more information on how Ray Tune works, see the Ray Tune documentation.

Note

An equivalent TensorFlow example is available in stable_learning_control/examples/tf2/sac_ray_hyper_parameter_tuning.py.