Hyperparameter Tuning

Hyperparameter tuning is crucial in RL because it directly affects the agent’s performance and stability. Well-chosen hyperparameters can lead to faster convergence, better task performance, and better generalisation. The SLC package therefore provides several tools to help with hyperparameter tuning.

Use the ExperimentGrid utility

As outlined in the Running Experiments section, the SLC package includes a utility class called ExperimentGrid that runs multiple experiments sequentially. You can use it in two ways: by passing more than one value for an argument on the CLI (see Running Experiments), or by using the ExperimentGrid class directly (see Launching Multiple Experiments at Once). Both methods run one experiment per hyperparameter combination, which amounts to a grid search for the best parameter setting for your task. For instance, to run the LAC algorithm on the CartPoleCost-v1 environment with several actor and critic learning rates using the CLI, use the following command:

python -m stable_learning_control.run lac --env CartPoleCost-v1 --lr_a 0.001 0.01 0.1 --lr_c 0.001 0.01 0.1

Tip

You can enable TensorBoard and Weights & Biases logging by adding the --use_tensorboard and --use_wandb flags to the above command. These tools let you track the performance of your experiments and compare the results of different hyperparameter combinations. For more information on how to use these logging utilities, see Loggers.
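
To make the size of such a grid search concrete, the following minimal sketch enumerates the combinations implied by the CLI command in this section. It uses only the Python standard library and is not the ExperimentGrid implementation; the expand_grid helper is a hypothetical name introduced for illustration.

```python
from itertools import product

# Values passed to --lr_a and --lr_c in the CLI command of this section.
lr_a_values = [0.001, 0.01, 0.1]
lr_c_values = [0.001, 0.01, 0.1]


def expand_grid(**param_values):
    """Return one dict per combination of the supplied parameter values.

    NOTE: Hypothetical helper for illustration, not part of the SLC package.
    """
    names = list(param_values)
    return [dict(zip(names, combo)) for combo in product(*param_values.values())]


variants = expand_grid(lr_a=lr_a_values, lr_c=lr_c_values)
print(f"{len(variants)} experiments")
for variant in variants:
    print(variant)
```

Three values for each of the two learning rates give 3 x 3 = 9 experiments. Because the number of runs is multiplied by the number of values of every additional argument, it helps to keep the value lists short.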

Use the Ray tuning package

The SLC package can also be used with more advanced tuning libraries like Ray Tune, which provides search algorithms and trial schedulers that can find good hyperparameters with fewer trials than an exhaustive grid search. An example of how to use SLC with Ray Tune can be found in stable_learning_control/examples/pytorch/sac_ray_hyper_parameter_tuning.py and stable_learning_control/examples/tf2/sac_ray_hyper_parameter_tuning.py. The requirements for this example can be installed using the following command:

pip install .[tuning]

Consider the example in stable_learning_control/examples/pytorch/sac_ray_hyper_parameter_tuning.py:

import os.path as osp

import gymnasium as gym
from ray import air, tune
from ray.tune.schedulers import ASHAScheduler
from ray.tune.search.hyperopt import HyperOptSearch

# Import the algorithm we want to tune.
from stable_learning_control.algos.pytorch.sac import sac

# Script parameters.
USE_WANDB = False


def train_sac(config):
    """Wrapper function that unpacks the config provided by the Ray Tuner and converts
    it into the format the learning algorithm expects.

    Args:
        config (dict): The Ray tuning configuration dictionary.
    """
    # Unpack trainable arguments.
    env_name = config.pop("env_name")

    # Run algorithm training.
    sac(
        lambda: gym.make(env_name),
        **config,
    )


if __name__ == "__main__":
    # NOTE: Uncomment if you want to debug the code.
    # import ray
    # ray.init(local_mode=True)

    # Setup Weights & Biases logging.
    ray_callbacks = []
    if USE_WANDB:
        from ray.air.integrations.wandb import WandbLoggerCallback

        ray_callbacks.append(
            WandbLoggerCallback(
                job_type="tune",
                project="stable-learning-control",
                name="sac_ray_hyper_parameter_tuning_example",
            )
        )

    # Setup the logging dir.
    dirname = osp.dirname(__file__)
    log_path = osp.abspath(osp.join(dirname, "../../data/ray_results"))

    # Setup hyperparameter search starting point.
    initial_params = [{"gamma": 0.995, "lr_a": 1e-4, "alpha": 0.99}]

    # Set up the parameter space for your hyperparameter search.
    search_space = {
        "env_name": "stable_gym:Oscillator-v1",
        "opt_type": "minimize",
        "gamma": tune.uniform(0.9, 0.999),
        "lr_a": tune.loguniform(1e-6, 1e-3),
        "alpha": tune.uniform(0.9, 1.0),
        "epochs": 2,
    }

    # Initialize the hyperparameter tuning job.
    # NOTE: Available algorithm metrics are listed in the `progress.csv` of an SLC run.
    trainable_with_cpu_gpu = tune.with_resources(
        train_sac,
        {"cpu": 12, "gpu": 1},
    )
    tuner = tune.Tuner(
        trainable_with_cpu_gpu,
        param_space=search_space,
        tune_config=tune.TuneConfig(
            metric="AverageEpRet",
            mode="min",  # NOTE: Should be equal to the 'opt_type'
            search_alg=HyperOptSearch(
                points_to_evaluate=initial_params,
            ),
            scheduler=ASHAScheduler(
                time_attr="epoch",
                max_t=200,
                grace_period=10,
            ),
            num_samples=20,
        ),
        run_config=air.RunConfig(
            storage_path=log_path,
            name=f"tune_sac_{search_space['env_name'].replace(':', '_')}",
            callbacks=ray_callbacks,
        ),
    )

    # Start the hyperparameter tuning job.
    results = tuner.fit()

    # Print the best trial.
    # NOTE: 'tuner.fit()' returns a 'ResultGrid', which exposes 'get_best_result'.
    best_result = results.get_best_result(
        metric="AverageEpRet", mode="min", scope="all"
    )
    # Lowest value of the tuned metric over all iterations of the best trial.
    best_value = best_result.metrics_dataframe["AverageEpRet"].min()
    print("The hyperparameter tuning job has finished.")
    print(f"Best result: {best_value}")
    print(f"Path: {best_result.path}")
    print(f"Best hyperparameters found were: {best_result.config}")

In this example, the boolean on line 12 enables Weights & Biases logging. On lines 15-29, we create a small wrapper function that passes the hyperparameters supplied by the Ray Tuner to the SLC algorithm in the format it expects. Lines 38-48 then set up a Weights & Biases callback if the USE_WANDB constant is set to True. On line 55, we set the starting point for several hyperparameters used in the search. Next, we define the hyperparameter search space on lines 58-65 and initialise the Ray Tuner instance on lines 69-94. Lastly, we start the hyperparameter search by calling the Tuner’s fit method on line 97.
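
The search space samples gamma and alpha uniformly but samples the learning rate lr_a log-uniformly. The following standard-library sketch illustrates the difference. It is not Ray Tune's implementation, and the sample_loguniform helper is a hypothetical name introduced for illustration.

```python
import math
import random


def sample_loguniform(low, high, rng):
    """Draw a value whose logarithm is uniformly distributed between low and high.

    NOTE: Hypothetical helper for illustration, not the Ray Tune implementation.
    """
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # Fixed seed so that the sketch is reproducible.

# Same ranges as 'lr_a' and 'gamma' in the search space of the example.
lr_samples = [sample_loguniform(1e-6, 1e-3, rng) for _ in range(1000)]
gamma_samples = [rng.uniform(0.9, 0.999) for _ in range(1000)]

# With log-uniform sampling, each decade receives a similar share of the samples.
below_1e5 = sum(s < 1e-5 for s in lr_samples) / len(lr_samples)
print(f"Share of lr_a samples below 1e-5: {below_1e5:.2f}")
```

Sampling 1e-6 to 1e-3 uniformly would place almost all samples above 1e-5, so the two smallest decades would hardly be explored. Log-uniform sampling spreads the trials evenly across orders of magnitude, which is usually what you want for learning rates.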

When you run the script, the Ray Tuner searches for the best hyperparameter combination. While doing so, it prints the results to stdout, writes them to a TensorBoard log file and, if USE_WANDB is set to True, sends them to the Weights & Biases portal. You can inspect the TensorBoard logs using the tensorboard --logdir ./data/ray_results command and the Weights & Biases results on the Weights & Biases website. For more information on how Ray Tune works, see the Ray Tune documentation.
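
The metric passed to the Tuner has to be one that the algorithm reports. The sketch below shows one way to list the metric names in a progress.csv file and to find the lowest value of a metric. The file excerpt is made up for illustration, and the comma-separated layout and the Epoch column are assumptions; only the AverageEpRet and AverageEpLen names are taken from the example above.

```python
import csv
import io

# Hypothetical excerpt of a 'progress.csv' file (values are made up).
# ASSUMPTION: The file is comma separated and has one row per epoch.
example_progress = io.StringIO(
    "Epoch,AverageEpRet,AverageEpLen\n"
    "1,152.3,400\n"
    "2,98.7,400\n"
    "3,101.2,400\n"
)

rows = list(csv.DictReader(example_progress))

# The column names are the metrics that can be used for tuning.
metrics = list(rows[0].keys())
print(f"Available metrics: {metrics}")

# Lowest value of the tuned metric, matching mode="min" in the example.
best = min(float(row["AverageEpRet"]) for row in rows)
print(f"Lowest AverageEpRet: {best}")
```

To inspect a real run, replace the in-memory excerpt with open(path_to_progress_csv, newline=""), where path_to_progress_csv points to the progress file in the run's output directory.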

Note

An equivalent TensorFlow example is available in stable_learning_control/examples/tf2/sac_ray_hyper_parameter_tuning.py.