Lyapunov Actor-Critic (LAC)

See also

This document assumes you are familiar with the Soft Actor-Critic (SAC) algorithm. It is not meant to be a comprehensive guide but mainly highlights the differences between the SAC and Lyapunov Actor-Critic (LAC) algorithms. For more information, readers are referred to the original papers of Haarnoja et al., 2019 (SAC) and Han et al., 2020 (LAC).

Important

The LAC algorithm only guarantees stability in mean cost when trained on environments with a positive definite cost function (i.e. environments in which the cost is minimized). The opt_type argument can be set to maximize when training in environments where the reward is maximized. However, because Lyapunov's stability conditions are not satisfied in that case, the LAC algorithm no longer guarantees stability in mean cost.

Background

The Lyapunov Actor-Critic (LAC) algorithm can be seen as a direct successor of the SAC algorithm. Although the SAC algorithm achieves impressive performance in various robotic control tasks, it does not guarantee that its actions are stable. From a control-theoretic perspective, stability is the most critical property of any control system since it is closely related to the safety, robustness, and reliability of robotic systems. Using Lyapunov's method, the LAC algorithm addresses this issue by proposing a data-based stability theorem that guarantees the system stays stable in mean cost.

Lyapunov critic function

The concept of Lyapunov stability is a useful and general approach for studying the stability of robotic systems. In Lyapunov's (direct) method, a scalar "energy-like" function, called a Lyapunov function, is constructed to analyse a system's stability. According to Lyapunov's stability conditions, an autonomous dynamic system

\dot{x} = X(x), \quad \textrm{where} \quad X(0) = 0

\textrm{with} \quad x^{*}(t) = 0, t \geq t_0;

is said to be asymptotically stable if an "energy" function V(x) exists such that, in some neighbourhood \mathcal{V}^{*} around the equilibrium point x = 0 (\left\|x\right\| < k),

  1. V(x) and its partial derivatives are continuous.

  2. V(x) is positive definite.

  3. \dot{V}(x) is negative semi-definite.
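
As a simple illustration of these conditions (an example added here for clarity; it is not taken from Han et al., 2020), consider the scalar system

\dot{x} = -x, \quad V(x) = x^{2}.

V and its derivative are continuous, V is positive definite, and along trajectories

\dot{V}(x) = 2x\dot{x} = -2x^{2} \leq 0,

so all three conditions hold and the origin is asymptotically stable.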

In classical control theory, this concept is often used to design controllers that ensure that the difference of a Lyapunov function along the state trajectory is always negative definite. This results in stable closed-loop dynamics, as the state is guaranteed to decrease the Lyapunov function's value and eventually converge to the equilibrium. The biggest challenge with this approach is that finding such a function is difficult and quickly becomes impractical. In learning-based methods, for example, since we do not have complete information about the system, finding such a Lyapunov function would require checking all possible consecutive data pairs in the state space, i.e., verifying infinitely many inequalities L_{t+1}-L_{t} < 0. The LAC algorithm solves this by taking a data-based approach in which the controller/policy and a Lyapunov critic function, both parameterised by deep neural networks, are jointly learned. In this way, the actor learns to control the system while only choosing actions that are guaranteed to be stable in mean cost. This inherent stability makes the LAC algorithm well suited for stabilisation and tracking tasks in robotic systems.

Differences with the SAC algorithm

Like its predecessor, the LAC algorithm uses entropy regularisation to increase exploration, together with a Gaussian actor and a value critic to determine the best action. The main difference lies in how the critic network and the policy function are defined.

Critic network definition

The LAC algorithm uses a single Lyapunov critic instead of the double Q-Critic used in the SAC algorithm. This Lyapunov critic is similar to the Q-Critic, but it uses a square output activation function instead of an identity output activation function. This ensures that the network output is non-negative, such that condition (2) of Lyapunov's stability conditions holds.

L_{c}(s,a) = f_{\phi}(s,a)^{T}f_{\phi}(s,a)
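
A minimal sketch of such a critic in PyTorch is shown below (illustrative only; the class name, layer sizes, and forward interface are assumptions and do not mirror the exact SLC LyapunovActorCritic implementation):

    import torch
    import torch.nn as nn

    class LyapunovCritic(nn.Module):
        """Illustrative Lyapunov critic: L_c(s, a) = f(s, a)^T f(s, a) >= 0."""

        def __init__(self, obs_dim, act_dim, hidden_sizes=(256, 256)):
            super().__init__()
            layers, in_dim = [], obs_dim + act_dim
            for size in hidden_sizes:
                layers += [nn.Linear(in_dim, size), nn.ReLU()]
                in_dim = size
            self.f = nn.Sequential(*layers)  # f_phi(s, a)

        def forward(self, obs, act):
            feat = self.f(torch.cat([obs, act], dim=-1))
            # Square output activation: the inner product f^T f is non-negative,
            # so condition (2) of Lyapunov's stability conditions holds.
            return (feat * feat).sum(dim=-1)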

As in SAC, during training L_{c} is updated by mean-squared Bellman error (MSBE) minimisation, using the following objective function:

J(L_{c}) = E_{\mathcal{D}}\left[\frac{1}{2}(L_{c}(s,a)-L_{target}(s,a))^2\right]

where L_{target} is the approximation target derived from the infinite-horizon discounted return value function

\begin{gather*}
 L(s) = E_{a\sim \pi}L_{target}(s,a) \\
 \textrm{with} \\
 L_{target}(s,a) = c + \max_{a'}\gamma L_{c}^{'}(s^{'}, a^{'})
 \end{gather*}

and \mathcal{D} the set of collected transition pairs.
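
In code, this update could look roughly as follows (a sketch, not the SLC implementation; it assumes a target critic l_targ, a stochastic policy pi returning (action, log-probability), and a replay batch of (s, a, c, s') tensors, with the next action drawn from the current policy as an approximation of the maximisation over a'):

    import torch
    import torch.nn.functional as F

    def lyapunov_critic_loss(l_c, l_targ, pi, batch, gamma=0.99):
        """Mean-squared Bellman error for the Lyapunov critic (illustrative)."""
        s, a = batch["obs"], batch["act"]
        c, s_next = batch["cost"], batch["obs_next"]
        with torch.no_grad():
            a_next, _ = pi(s_next)                        # a' drawn from pi(.|s')
            l_target = c + gamma * l_targ(s_next, a_next)
        return 0.5 * F.mse_loss(l_c(s, a), l_target)      # J(L_c)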

Important

As explained by Han et al., 2020, the sum of costs over a finite time horizon can also be used as the approximation target (see eq. (9) of Han et al., 2020):

L_{target}(s,a) = \mathbb{E}\left[\sum_{t'=t}^{t+N} c_{t'}\right]

To use this Lyapunov candidate, supply the LAC algorithm with the horizon_length=N argument, where N is the length of the time horizon you want to use.
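
For example (a hedged usage sketch; the environment id is a placeholder):

    import gymnasium as gym
    from stable_learning_control.algos.pytorch.lac import lac

    # Use a finite-horizon (N = 5) sum-of-costs Lyapunov candidate as the target.
    lac(lambda: gym.make("Oscillator-v1"), horizon_length=5)  # placeholder environment id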

See also

The SLC package also contains a LAC implementation using a double Q-Critic (i.e., Lyapunov Twin Critic). For more information about this version, see the LAC Twin Critic documentation. This version can be used by specifying the latc algorithm in the CLI, by supplying the lac() function with the actor_critic=LyapunovActorTwinCritic argument or by directly calling the latc() function.
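
For example (a sketch; the import path of LyapunovActorTwinCritic below is an assumption, and the environment id is again a placeholder):

    import gymnasium as gym
    from stable_learning_control.algos.pytorch.lac import lac
    # Assumed import location for the twin-critic actor-critic class:
    from stable_learning_control.algos.pytorch.policies import LyapunovActorTwinCritic

    lac(lambda: gym.make("Oscillator-v1"), actor_critic=LyapunovActorTwinCritic)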

Policy function definition

In the LAC algorithm, the policy is optimised according to

\min_{\theta} \underE{s \sim \mathcal{D} \\ \xi \sim \mathcal{N}}{\lambda(L_{c}(s^{'}, f_{\theta}(\xi, s^{'}))-L_{c}(s, a) + \alpha_{3}c) + \alpha\log \pi_{\theta}(f_{\theta}(\xi, s)|s) + \mathcal{H}_{t}}

Here, f_{\theta}(\xi, s), also written \tilde{a}_{\theta}(s, \xi), represents the squashed Gaussian policy

\tilde{a}_{\theta}(s, \xi) = \tanh\left( \mu_{\theta}(s) + \sigma_{\theta}(s) \odot \xi \right), \;\;\;\;\; \xi \sim \mathcal{N}(0, I).

and \mathcal{H}_{t} is the desired minimum expected entropy. When comparing this objective with the policy loss used in the SAC algorithm,

\max_{\theta} \underE{s \sim \mathcal{D} \\ \xi \sim \mathcal{N}}{Q_{\phi_1}(s,\tilde{a}_{\theta}(s,\xi)) - \alpha \log \pi_{\theta}(\tilde{a}_{\theta}(s,\xi)|s) + \mathcal{H}_{t}},

several differences stand out. First, the objective is minimised instead of maximised in the LAC algorithm. With the LAC algorithm, the objective is to stabilise a system or track a given reference. In these cases, instead of achieving a high return, we want to reduce the difference between the current position and a reference or equilibrium position. This leads to the second difference: the term in the SAC objective that represents the Q-values,

Q_{\phi_1}(s, f_{\theta}(\xi, s))

is replaced in the LAC algorithm by

\lambda(L_{c}(s^{'}, f_{\theta}(\xi, s^{'})) - L_{c}(s, a)  + \alpha_{3}c)

As a result, in the LAC algorithm, the loss function increases the probability of actions that bring the system closer to the equilibrium or reference value while decreasing the likelihood of actions that drive the system away from these values. The \alpha_{3}c regularisation term ensures that the mean cost is positive. The \lambda term represents the Lyapunov Lagrange multiplier and weights the relative importance of the stability condition. Similar to the entropy Lagrange multiplier \alpha used in the SAC algorithm, this term is updated by

\lambda \leftarrow \max(0, \lambda + \delta \nabla_{\lambda}J(\lambda))

where \delta is the learning rate. This is done to constrain the average Lyapunov value during training.
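
A rough sketch of how the policy and Lagrange-multiplier updates fit together is given below (illustrative PyTorch-style code, not the SLC implementation; the batch layout, the pi interface, and treating \alpha as a fixed coefficient are assumptions):

    import torch

    def lac_actor_and_lambda_losses(pi, l_c, batch, labda, alpha=0.99, alpha3=0.2):
        """Illustrative LAC actor and Lagrange-multiplier losses.

        `labda` is a scalar tensor with requires_grad=True; after its optimiser
        step it is clamped to stay non-negative (lambda <- max(0, lambda)).
        """
        s, a = batch["obs"], batch["act"]
        c, s_next = batch["cost"], batch["obs_next"]

        # Reparameterised squashed-Gaussian actions and log-probabilities.
        a_next_pi, _ = pi(s_next)
        _, log_pi = pi(s)

        # Lyapunov (stability) term: L_c(s', f(xi, s')) - L_c(s, a) + alpha3 * c.
        lyap_term = l_c(s_next, a_next_pi) - l_c(s, a) + alpha3 * c

        # Policy objective (minimised): weighted stability term + entropy term.
        actor_loss = (labda.detach() * lyap_term + alpha * log_pi).mean()

        # Lambda is updated by gradient ascent on lambda * E[stability term],
        # i.e. by minimising its negative.
        labda_loss = -(labda * lyap_term.detach().mean())
        return actor_loss, labda_loss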

Quick Fact

  • LAC is an off-policy algorithm.

  • It is guaranteed to be stable in mean cost.

  • The version of LAC implemented here can only be used for environments with continuous action spaces.

  • An alternate version of LAC, which slightly changes the policy update rule, can be implemented to handle discrete action spaces.

  • The SLC implementation of LAC does not support parallelisation.

Further Reading

For more information on the LAC algorithm, please check out the original paper of Han et al., 2020.

Pseudocode

\begin{algorithm}[H]
    \caption{Lyapunov-based Actor-Critic (LAC)}
    \label{alg1}
\begin{algorithmic}[1]
    \REQUIRE Maximum episode length $N$ and maximum update steps $M$
    \REPEAT
        \STATE Sample $s_{0}$ according to $\rho$
        \FOR{$t=1$ to $N$}
            \STATE Sample $a$ from $\pi(a|s)$ and step forward
            \STATE Observe $s'$, $c$ and store ($s$, $a$, $c$, $s'$) in $\mathcal{D}$
        \ENDFOR
        \FOR{$i=1$ to $M$}
            \STATE Sample mini-batches of transitions from $\mathcal{D}$ and update $L_{c}$, $\pi$ and the Lagrange multipliers with eq. (7) and (14) of Han et al., 2020
        \ENDFOR
    \UNTIL{eq. 11 of Han et al., 2020 is satisfied}
\end{algorithmic}
\end{algorithm}
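
Expressed as a runnable Python-style skeleton, the same procedure looks roughly as follows (purely illustrative; the environment, the random action choice, and the empty update step are stand-ins, not SLC objects):

    import random
    from collections import deque
    import gymnasium as gym

    env = gym.make("Pendulum-v1")                 # placeholder environment
    replay_buffer = deque(maxlen=100_000)
    max_episodes, N, M, batch_size = 10, 200, 50, 64

    for episode in range(max_episodes):           # outer REPEAT loop
        s, _ = env.reset()                        # sample s_0 according to rho
        for t in range(N):                        # rollout of at most N steps
            a = env.action_space.sample()         # stand-in for a ~ pi(a|s)
            s_next, reward, terminated, truncated, _ = env.step(a)
            cost = -reward                        # stand-in cost signal
            replay_buffer.append((s, a, cost, s_next))
            s = s_next
            if terminated or truncated:
                break
        for _ in range(M):                        # M gradient updates
            if len(replay_buffer) >= batch_size:
                batch = random.sample(replay_buffer, batch_size)
                # Update L_c, pi and the Lagrange multipliers here
                # (eq. (7) and (14) of Han et al., 2020).
        # Stop once the convergence criterion (eq. 11 of Han et al., 2020) is met.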

Implementation

You Should Know

In what follows, we give documentation for the PyTorch and TensorFlow implementations of LAC in SLC. They have nearly identical function calls and docstrings, except for details relating to model construction. However, we include both full docstrings for completeness.

Algorithm: PyTorch Version

stable_learning_control.algos.pytorch.lac.lac(env_fn, actor_critic=None, ac_kwargs={'activation': <class 'torch.nn.modules.activation.ReLU'>, 'hidden_sizes': {'actor': [256, 256], 'critic': [256, 256]}, 'output_activation': <class 'torch.nn.modules.activation.ReLU'>}, opt_type='minimize', max_ep_len=None, epochs=100, steps_per_epoch=2048, start_steps=0, update_every=100, update_after=1000, steps_per_update=100, num_test_episodes=10, alpha=0.99, alpha3=0.2, labda=0.99, gamma=0.99, polyak=0.995, target_entropy=None, adaptive_temperature=True, lr_a=0.0001, lr_c=0.0003, lr_alpha=0.0001, lr_labda=0.0003, lr_a_final=1e-10, lr_c_final=1e-10, lr_alpha_final=1e-10, lr_labda_final=1e-10, lr_decay_type='linear', lr_a_decay_type=None, lr_c_decay_type=None, lr_alpha_decay_type=None, lr_labda_decay_type=None, lr_decay_ref='epoch', batch_size=256, replay_size=1000000, horizon_length=0, seed=None, device='cpu', logger_kwargs={}, save_freq=1, start_policy=None, export=False)[source]

Trains the LAC algorithm in a given environment.

Parameters:
  • env_fn – A function which creates a copy of the environment. The environment must satisfy the gymnasium API.

  • actor_critic (torch.nn.Module, optional) –

    The constructor method for a Torch Module with an act method, a pi module and several Q or L modules. The act method and pi module should accept batches of observations as inputs, and the Q* and L modules should accept a batch of observations and a batch of actions as inputs. When called, these modules should return:

    Call    Output Shape        Description
    act     (batch, act_dim)    Numpy array of actions for each observation.
    Q*/L    (batch,)            Tensor containing one current estimate of Q*/L for the
                                provided observations and actions. (Critical: make sure
                                to flatten this!)

    Calling pi should return:

    Symbol    Shape               Description
    a         (batch, act_dim)    Tensor containing actions from policy given observations.
    logp_pi   (batch,)            Tensor containing log probabilities of actions in a.
                                  Importantly: gradients should be able to flow back into a.

    Defaults to LyapunovActorCritic

  • ac_kwargs (dict, optional) –

    Any kwargs appropriate for the ActorCritic object you provided to LAC. Defaults to:

    Kwarg                  Value
    hidden_sizes_actor     256 x 2
    hidden_sizes_critic    256 x 2
    activation             torch.nn.ReLU
    output_activation      torch.nn.ReLU

  • opt_type (str, optional) – The optimization type you want to use. Options are maximize and minimize. Defaults to minimize.

  • max_ep_len (int, optional) – Maximum length of trajectory / episode / rollout. Defaults to the environment maximum.

  • epochs (int, optional) – Number of epochs to run and train agent. Defaults to 100.

  • steps_per_epoch (int, optional) – Number of steps of interaction (state-action pairs) for the agent and the environment in each epoch. Defaults to 2048.

  • start_steps (int, optional) – Number of steps for uniform-random action selection, before running real policy. Helps exploration. Defaults to 0.

  • update_every (int, optional) – Number of env interactions that should elapse between gradient descent updates. Defaults to 100.

  • update_after (int, optional) – Number of env interactions to collect before starting to do gradient descent updates. Ensures replay buffer is full enough for useful updates. Defaults to 1000.

  • steps_per_update (int, optional) – Number of gradient descent steps that are performed for each gradient descent update. This determines the ratio of env steps to gradient steps (i.e. update_every/ steps_per_update). Defaults to 100.

  • num_test_episodes (int, optional) – Number of episodes used to test the deterministic policy at the end of each epoch. This is used for logging the performance. Defaults to 10.

  • alpha (float, optional) – Entropy regularization coefficient (Equivalent to inverse of reward scale in the original SAC paper). Defaults to 0.99.

  • alpha3 (float, optional) – The Lyapunov constraint error boundary. Defaults to 0.2.

  • labda (float, optional) – The Lyapunov Lagrange multiplier. Defaults to 0.99.

  • gamma (float, optional) – Discount factor. (Always between 0 and 1.). Defaults to 0.99.

  • polyak (float, optional) –

    Interpolation factor in polyak averaging for target networks. Target networks are updated towards main networks according to:

    \theta_{\text{targ}} \leftarrow
\rho \theta_{\text{targ}} + (1-\rho) \theta

    where \rho is polyak (Always between 0 and 1, usually close to 1.). In some papers \rho is defined as (1 - \tau) where \tau is the soft replacement factor. Defaults to 0.995.

  • target_entropy (float, optional) –

    Initial target entropy used while learning the entropy temperature (alpha). Defaults to the maximum information (bits) contained in the action space. This can be calculated according to:

    -\prod_{i=0}^{n} \text{action\_dim}_{i}

  • adaptive_temperature (bool, optional) – Enables automatic entropy adjustment for maximum entropy RL learning. Defaults to True.

  • lr_a (float, optional) – Learning rate used for the actor. Defaults to 1e-4.

  • lr_c (float, optional) – Learning rate used for the (Lyapunov) critic. Defaults to 3e-4.

  • lr_alpha (float, optional) – Learning rate used for the entropy temperature. Defaults to 1e-4.

  • lr_labda (float, optional) – Learning rate used for the Lyapunov Lagrange multiplier. Defaults to 3e-4.

  • lr_a_final (float, optional) – The final actor learning rate that is achieved at the end of the training. Defaults to 1e-10.

  • lr_c_final (float, optional) – The final critic learning rate that is achieved at the end of the training. Defaults to 1e-10.

  • lr_alpha_final (float, optional) – The final alpha learning rate that is achieved at the end of the training. Defaults to 1e-10.

  • lr_labda_final (float, optional) – The final labda learning rate that is achieved at the end of the training. Defaults to 1e-10.

  • lr_decay_type (str, optional) – The learning rate decay type that is used (options are: linear and exponential and constant). Defaults to linear. Can be overridden by the specific learning rate decay types.

  • lr_a_decay_type (str, optional) – The learning rate decay type that is used for the actor learning rate (options are: linear and exponential and constant). If not specified, the general learning rate decay type is used.

  • lr_c_decay_type (str, optional) – The learning rate decay type that is used for the critic learning rate (options are: linear and exponential and constant). If not specified, the general learning rate decay type is used.

  • lr_alpha_decay_type (str, optional) – The learning rate decay type that is used for the alpha learning rate (options are: linear and exponential and constant). If not specified, the general learning rate decay type is used.

  • lr_labda_decay_type (str, optional) – The learning rate decay type that is used for the labda learning rate (options are: linear and exponential and constant). If not specified, the general learning rate decay type is used.

  • lr_decay_ref (str, optional) – The reference variable that is used for decaying the learning rate (options: epoch and step). Defaults to epoch.

  • batch_size (int, optional) – Minibatch size for SGD. Defaults to 256.

  • replay_size (int, optional) – Maximum length of replay buffer. Defaults to 1e6.

  • horizon_length (int, optional) – The length of the finite horizon used for the Lyapunov critic target. Defaults to 0, meaning the infinite-horizon Bellman backup is used.

  • seed (int) – Seed for random number generators. Defaults to None.

  • device (str, optional) – The device the networks are placed on (options: cpu, gpu, gpu:0, gpu:1, etc.). Defaults to cpu.

  • logger_kwargs (dict, optional) – Keyword args for EpochLogger.

  • save_freq (int, optional) – How often (in terms of gap between epochs) to save the current policy and value function.

  • start_policy (str) – Path of an already trained policy to use as the starting point for the training. By default a new policy is created.

  • export (bool) – Whether you want to export the model as a TorchScript such that it can be deployed on hardware. By default False.

Returns:

tuple containing:

Return type:

(tuple)
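
A minimal training call could look as follows (a usage sketch; the environment id and the chosen keyword values are placeholders, not recommended settings):

    import gymnasium as gym
    from stable_learning_control.algos.pytorch.lac import lac

    lac(
        lambda: gym.make("Oscillator-v1"),   # placeholder environment id
        opt_type="minimize",                 # LAC expects a cost that is minimised
        epochs=50,
        gamma=0.99,
        device="cpu",
    )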

Saved Model Contents: PyTorch Version

The PyTorch version of the LAC algorithm is implemented by subclassing the torch.nn.Module class. As a result, the model weights are saved using the model_state dictionary (state_dict). These saved weights can be found in the torch_save/model_state.pt file. For an example of how to load a model using this file, see Experiment Outputs or the PyTorch documentation.
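
For instance (a sketch; make_lyapunov_actor_critic is a hypothetical helper that rebuilds the same network architecture used during training, and the run directory prefix is a placeholder):

    import torch

    model = make_lyapunov_actor_critic()  # hypothetical constructor matching the trained architecture
    state_dict = torch.load("path/to/run/torch_save/model_state.pt")
    # Depending on the run, the file may store the state_dict directly or nested under a key.
    model.load_state_dict(state_dict)
    model.eval()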

Algorithm: TensorFlow Version

Attention

The TensorFlow version is still experimental. It is not guaranteed to work, and it is not guaranteed to be up-to-date with the PyTorch version.

stable_learning_control.algos.tf2.lac.lac(env_fn, actor_critic=None, ac_kwargs={'activation': <function relu>, 'hidden_sizes': {'actor': [256, 256], 'critic': [256, 256]}, 'output_activation': <function relu>}, opt_type='minimize', max_ep_len=None, epochs=100, steps_per_epoch=2048, start_steps=0, update_every=100, update_after=1000, steps_per_update=100, num_test_episodes=10, alpha=0.99, alpha3=0.2, labda=0.99, gamma=0.99, polyak=0.995, target_entropy=None, adaptive_temperature=True, lr_a=0.0001, lr_c=0.0003, lr_alpha=0.0001, lr_labda=0.0003, lr_a_final=1e-10, lr_c_final=1e-10, lr_alpha_final=1e-10, lr_labda_final=1e-10, lr_decay_type='linear', lr_a_decay_type=None, lr_c_decay_type=None, lr_alpha_decay_type=None, lr_labda_decay_type=None, lr_decay_ref='epoch', batch_size=256, replay_size=1000000, seed=None, horizon_length=0, device='cpu', logger_kwargs={}, save_freq=1, start_policy=None, export=False)[source]

Trains the LAC algorithm in a given environment.

Parameters:
  • env_fn – A function which creates a copy of the environment. The environment must satisfy the gymnasium API.

  • actor_critic (tf.Module, optional) –

    The constructor method for a TensorFlow Module with an act method, a pi module and several Q or L modules. The act method and pi module should accept batches of observations as inputs, and the Q* and L modules should accept a batch of observations and a batch of actions as inputs. When called, these modules should return:

    Call    Output Shape        Description
    act     (batch, act_dim)    Numpy array of actions for each observation.
    Q*/L    (batch,)            Tensor containing one current estimate of Q*/L for the
                                provided observations and actions. (Critical: make sure
                                to flatten this!)

    Calling pi should return:

    Symbol    Shape               Description
    a         (batch, act_dim)    Tensor containing actions from policy given observations.
    logp_pi   (batch,)            Tensor containing log probabilities of actions in a.
                                  Importantly: gradients should be able to flow back into a.

    Defaults to LyapunovActorCritic

  • ac_kwargs (dict, optional) –

    Any kwargs appropriate for the ActorCritic object you provided to LAC. Defaults to:

    Kwarg                  Value
    hidden_sizes_actor     256 x 2
    hidden_sizes_critic    256 x 2
    activation             tf.nn.relu
    output_activation      tf.nn.relu

  • opt_type (str, optional) – The optimization type you want to use. Options are maximize and minimize. Defaults to minimize.

  • max_ep_len (int, optional) – Maximum length of trajectory / episode / rollout. Defaults to the environment maximum.

  • epochs (int, optional) – Number of epochs to run and train agent. Defaults to 100.

  • steps_per_epoch (int, optional) – Number of steps of interaction (state-action pairs) for the agent and the environment in each epoch. Defaults to 2048.

  • start_steps (int, optional) – Number of steps for uniform-random action selection, before running real policy. Helps exploration. Defaults to 0.

  • update_every (int, optional) – Number of env interactions that should elapse between gradient descent updates. Defaults to 100.

  • update_after (int, optional) – Number of env interactions to collect before starting to do gradient descent updates. Ensures replay buffer is full enough for useful updates. Defaults to 1000.

  • steps_per_update (int, optional) – Number of gradient descent steps that are performed for each gradient descent update. This determines the ratio of env steps to gradient steps (i.e. update_every/ steps_per_update). Defaults to 100.

  • num_test_episodes (int, optional) – Number of episodes used to test the deterministic policy at the end of each epoch. This is used for logging the performance. Defaults to 10.

  • alpha (float, optional) – Entropy regularization coefficient (Equivalent to inverse of reward scale in the original SAC paper). Defaults to 0.99.

  • alpha3 (float, optional) – The Lyapunov constraint error boundary. Defaults to 0.2.

  • labda (float, optional) – The Lyapunov Lagrange multiplier. Defaults to 0.99.

  • gamma (float, optional) – Discount factor. (Always between 0 and 1.). Defaults to 0.99.

  • polyak (float, optional) –

    Interpolation factor in polyak averaging for target networks. Target networks are updated towards main networks according to:

    \theta_{\text{targ}} \leftarrow
\rho \theta_{\text{targ}} + (1-\rho) \theta

    where \rho is polyak (Always between 0 and 1, usually close to 1.). In some papers \rho is defined as (1 - \tau) where \tau is the soft replacement factor. Defaults to 0.995.

  • target_entropy (float, optional) –

    Initial target entropy used while learning the entropy temperature (alpha). Defaults to the maximum information (bits) contained in the action space. This can be calculated according to:

    -\prod_{i=0}^{n} \text{action\_dim}_{i}

  • adaptive_temperature (bool, optional) – Enables automatic entropy adjustment for maximum entropy RL learning. Defaults to True.

  • lr_a (float, optional) – Learning rate used for the actor. Defaults to 1e-4.

  • lr_c (float, optional) – Learning rate used for the (Lyapunov) critic. Defaults to 3e-4.

  • lr_alpha (float, optional) – Learning rate used for the entropy temperature. Defaults to 1e-4.

  • lr_labda (float, optional) – Learning rate used for the Lyapunov Lagrange multiplier. Defaults to 3e-4.

  • lr_a_final (float, optional) – The final actor learning rate that is achieved at the end of the training. Defaults to 1e-10.

  • lr_c_final (float, optional) – The final critic learning rate that is achieved at the end of the training. Defaults to 1e-10.

  • lr_alpha_final (float, optional) – The final alpha learning rate that is achieved at the end of the training. Defaults to 1e-10.

  • lr_labda_final (float, optional) – The final labda learning rate that is achieved at the end of the training. Defaults to 1e-10.

  • lr_decay_type (str, optional) – The learning rate decay type that is used (options are: linear and exponential and constant). Defaults to linear. Can be overridden by the specific learning rate decay types.

  • lr_a_decay_type (str, optional) – The learning rate decay type that is used for the actor learning rate (options are: linear and exponential and constant). If not specified, the general learning rate decay type is used.

  • lr_c_decay_type (str, optional) – The learning rate decay type that is used for the critic learning rate (options are: linear and exponential and constant). If not specified, the general learning rate decay type is used.

  • lr_alpha_decay_type (str, optional) – The learning rate decay type that is used for the alpha learning rate (options are: linear and exponential and constant). If not specified, the general learning rate decay type is used.

  • lr_labda_decay_type (str, optional) – The learning rate decay type that is used for the labda learning rate (options are: linear and exponential and constant). If not specified, the general learning rate decay type is used.

  • lr_decay_ref (str, optional) – The reference variable that is used for decaying the learning rate (options: epoch and step). Defaults to epoch.

  • batch_size (int, optional) – Minibatch size for SGD. Defaults to 256.

  • replay_size (int, optional) – Maximum length of replay buffer. Defaults to 1e6.

  • horizon_length (int, optional) – The length of the finite horizon used for the Lyapunov critic target. Defaults to 0, meaning the infinite-horizon Bellman backup is used.

  • seed (int) – Seed for random number generators. Defaults to None.

  • device (str, optional) – The device the networks are placed on (options: cpu, gpu, gpu:0, gpu:1, etc.). Defaults to cpu.

  • logger_kwargs (dict, optional) – Keyword args for EpochLogger.

  • save_freq (int, optional) – How often (in terms of gap between epochs) to save the current policy and value function.

  • start_policy (str) – Path of an already trained policy to use as the starting point for the training. By default a new policy is created.

  • export (bool) – Whether you want to export the model in the SavedModel format such that it can be deployed to hardware. By default False.

Returns:

tuple containing:

Return type:

(tuple)

Saved Model Contents: TensorFlow Version

The TensorFlow version of the LAC algorithm is implemented by subclassing the tf.keras.Model class. As a result, both the full model and the current model weights are saved. The complete model can be found in the saved_model.pb file, while the current weight checkpoints are found in the tf_safe/weights_checkpoint* file. For an example of using these two methods, see Experiment Outputs or the TensorFlow documentation.
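
A comparable loading sketch for the TensorFlow version (assumptions: the run directory prefix is a placeholder, and the exported model is loaded with the standard tf.saved_model.load call on the directory that contains saved_model.pb):

    import tensorflow as tf

    # Load the complete exported model (the directory containing saved_model.pb).
    model = tf.saved_model.load("path/to/run")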

References

Relevant Papers

Acknowledgements

  • Parts of this documentation are directly copied, with the authors' consent, from the original paper of Han et al., 2020.