stable_learning_control.algos.pytorch.common.buffers

Contains several replay buffers used in the PyTorch algorithms.

Module Contents

Classes

ReplayBuffer

Wrapper around the general FIFO ReplayBuffer that returns sampled experiences as torch.Tensor objects.

FiniteHorizonReplayBuffer

Wrapper around the general FIFO FiniteHorizonReplayBuffer that returns sampled experiences as torch.Tensor objects.

TrajectoryBuffer

Wrapper around the general TrajectoryBuffer that returns the retrieved data as torch.Tensor objects.

class stable_learning_control.algos.pytorch.common.buffers.ReplayBuffer(device='cpu', *args, **kwargs)[source]

Bases: stable_learning_control.algos.common.buffers.ReplayBuffer

Wrapper around the general FIFO ReplayBuffer that makes sure sampled experiences are returned as torch.Tensor objects.

device

The device the experiences are placed on (options: cpu, gpu, gpu:0, gpu:1, etc.).

Type:

str

Initialise the ReplayBuffer object.

Parameters:
  • device (str, optional) – The computational device to put the sampled experiences on (options: cpu, gpu, gpu:0, gpu:1, etc.). Defaults to cpu.

  • *args – All args to pass to the ReplayBuffer parent class.

  • **kwargs – All kwargs to pass to the ReplayBuffer parent class.
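The device options listed above use gpu where PyTorch itself uses cuda. As an illustration only, the sketch below shows one way such an option string could be translated into a name that torch.device accepts. The helper name to_torch_device_name and its error handling are assumptions made for this example, not the package's actual implementation.

```python
def to_torch_device_name(device: str = "cpu") -> str:
    """Translate 'cpu', 'gpu' or 'gpu:N' into a PyTorch-style device name.

    Hypothetical helper for illustration; not part of the documented API.
    """
    name = device.strip().lower()
    if name == "cpu":
        return "cpu"

    # Split 'gpu:1' into ('gpu', ':', '1'); a bare 'gpu' yields an empty index.
    kind, _, index = name.partition(":")
    if kind != "gpu" or (index and not index.isdigit()):
        raise ValueError(f"Unsupported device option: {device!r}")

    # PyTorch names GPU devices 'cuda' or 'cuda:N'.
    return f"cuda:{index}" if index else "cuda"
```

For example, to_torch_device_name("gpu:1") gives "cuda:1", while an unknown option such as "tpu" raises a ValueError.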

sample_batch(*args, **kwargs)[source]

Retrieve a batch of experiences from buffer.

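Parameters:
  • *args – All args to pass to the sample_batch() parent method.

  • **kwargs – All kwargs to pass to the sample_batch() parent method.

Returns: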

A batch of experiences.

Return type:

dict
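The wrapper pattern described for this class can be sketched as follows: a parent buffer samples NumPy arrays, and the subclass converts every entry of the returned dict to a tensor on the requested device. NumpyReplayBuffer, TorchReplayBuffer, the obs and rew keys, and the constructor arguments obs_dim and size are hypothetical stand-ins chosen for this example; the real parent class may store different fields and take different arguments.

```python
import numpy as np
import torch


class NumpyReplayBuffer:
    """Hypothetical stand-in for the framework-agnostic parent buffer."""

    def __init__(self, obs_dim, size):
        self.obs_buf = np.zeros((size, obs_dim), dtype=np.float32)
        self.rew_buf = np.zeros(size, dtype=np.float32)
        self.ptr, self.size, self.max_size = 0, 0, size

    def store(self, obs, rew):
        # FIFO behaviour: once the buffer is full, the oldest entry is overwritten.
        self.obs_buf[self.ptr] = obs
        self.rew_buf[self.ptr] = rew
        self.ptr = (self.ptr + 1) % self.max_size
        self.size = min(self.size + 1, self.max_size)

    def sample_batch(self, batch_size=32):
        idxs = np.random.randint(0, self.size, size=batch_size)
        return {"obs": self.obs_buf[idxs], "rew": self.rew_buf[idxs]}


class TorchReplayBuffer(NumpyReplayBuffer):
    """Adds a device attribute and converts sampled arrays to tensors."""

    def __init__(self, device="cpu", *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.device = device

    def sample_batch(self, *args, **kwargs):
        batch = super().sample_batch(*args, **kwargs)
        return {
            key: torch.as_tensor(val, dtype=torch.float32, device=self.device)
            for key, val in batch.items()
        }


buffer = TorchReplayBuffer(device="cpu", obs_dim=3, size=10)
for step in range(12):  # two more than the capacity, so steps 0 and 1 are overwritten
    buffer.store(obs=np.full(3, step), rew=float(step))
batch = buffer.sample_batch(batch_size=4)  # dict of torch.Tensor objects
```

Keeping the conversion in the subclass leaves the parent buffer free of any PyTorch dependency, which is presumably why the package separates the two layers.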

class stable_learning_control.algos.pytorch.common.buffers.FiniteHorizonReplayBuffer(device='cpu', *args, **kwargs)[source]

Bases: stable_learning_control.algos.common.buffers.FiniteHorizonReplayBuffer

Wrapper around the general FIFO FiniteHorizonReplayBuffer that makes sure sampled experiences are returned as torch.Tensor objects.

device

The device the experiences are placed on (options: cpu, gpu, gpu:0, gpu:1, etc.).

Type:

str

Initialise the FiniteHorizonReplayBuffer object.

Parameters:
  • device (str, optional) – The computational device to put the sampled experiences on (options: cpu, gpu, gpu:0, gpu:1, etc.). Defaults to cpu.

  • *args – All args to pass to the FiniteHorizonReplayBuffer parent class.

  • **kwargs – All kwargs to pass to the FiniteHorizonReplayBuffer parent class.

sample_batch(*args, **kwargs)[source]

Retrieve a batch of experiences from buffer.

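Parameters:
  • *args – All args to pass to the sample_batch() parent method.

  • **kwargs – All kwargs to pass to the sample_batch() parent method.

Returns: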

A batch of experiences.

Return type:

dict

class stable_learning_control.algos.pytorch.common.buffers.TrajectoryBuffer(device='cpu', *args, **kwargs)[source]

Bases: stable_learning_control.algos.common.buffers.TrajectoryBuffer

Wrapper around the general TrajectoryBuffer that makes sure the retrieved data is returned as torch.Tensor objects.

device

The device the experiences are placed on (options: cpu, gpu, gpu:0, gpu:1, etc.).

Type:

str

Initialise the TrajectoryBuffer object.

Parameters:
  • device (str, optional) – The computational device to put the sampled experiences on (options: cpu, gpu, gpu:0, gpu:1, etc.). Defaults to cpu.

  • *args – All args to pass to the TrajectoryBuffer parent class.

  • **kwargs – All kwargs to pass to the TrajectoryBuffer parent class.

get(*args, **kwargs)[source]

Retrieve the trajectory buffer.

Call this at the end of an epoch to get all of the data from the buffer. This call also resets some pointers in the buffer.

Parameters:
  • *args – All args to pass to the get() parent method.

  • **kwargs – All kwargs to pass to the get() parent method.

Returns:

The trajectory buffer.

Return type:

dict
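The usage pattern described for get() (fill the buffer during an epoch, retrieve everything once at the end, and start the next epoch from a reset buffer) can be sketched with a toy class. ToyTrajectoryBuffer, its store() method, and the obs and rew keys are hypothetical and heavily simplified; the real TrajectoryBuffer tracks more fields and uses its own storage layout.

```python
import torch


class ToyTrajectoryBuffer:
    """Hypothetical, heavily simplified on-policy buffer (illustration only)."""

    def __init__(self, device="cpu"):
        self.device = device
        self._obs, self._rew = [], []

    def store(self, obs, rew):
        self._obs.append(obs)
        self._rew.append(rew)

    def get(self):
        # Return everything collected so far as tensors on the chosen device.
        data = {
            "obs": torch.as_tensor(self._obs, dtype=torch.float32, device=self.device),
            "rew": torch.as_tensor(self._rew, dtype=torch.float32, device=self.device),
        }
        # Reset the storage so the next epoch starts from an empty buffer.
        self._obs, self._rew = [], []
        return data


buffer = ToyTrajectoryBuffer()
for epoch in range(2):
    for step in range(5):  # collect one epoch of experience
        buffer.store(obs=[float(step), 0.0], rew=1.0)
    data = buffer.get()  # called once, at the end of the epoch
```

Because get() clears the stored data, each call returns only the experience gathered since the previous call.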