stable_gym.envs.classic_control.ex3_ekf

Noisy master slave system (Ex3EKF) gymnasium environment.

Dynamics

The dynamics of the system whose state is to be estimated are given by:

\[ \begin{split} x(k+1) &= A x(k) + w(k) \\ \end{split} \]

in which the state vector \(x(k)\) is given by:

\[ \begin{align*} x_1 &: \text{angle} \\ x_2 &: \text{frequency} \\ x_3 &: \text{amplitude} \end{align*} \]

and the measurement \(y(k)\) is given by:

\[ \begin{split} y(k) &= x_3(k) \cdot \sin(x_1(k)) + v(k) \\ A &= \begin{bmatrix} 1 & dt & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \\ x(0) &\sim \mathcal{N}\left(\begin{bmatrix}0 \\ 10 \\ 1\end{bmatrix}, \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}\right) \\ w(k) &\sim \mathcal{N}\left(\begin{bmatrix}0 \\ 0 \\ 0\end{bmatrix}, \begin{bmatrix} \frac{1}{3}dt^3 q_1 & \frac{1}{2}dt^2 q_1 & 0 \\ \frac{1}{2}dt^2 q_1 & dt q_1 & 0 \\ 0 & 0 & dt q_2 \end{bmatrix}\right) \\ v(k) &\sim \mathcal{N}(0, 1) \end{split} \]
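The model above can be simulated in a few lines of NumPy. The following is a minimal sketch, not the environment's own implementation: the function names `make_model` and `simulate` are hypothetical, and the values chosen for `dt`, `q1` and `q2` are assumptions made only for illustration, since this page does not state them.

```python
import numpy as np


def make_model(dt, q1, q2):
    """Build the transition matrix A and the process-noise covariance Q."""
    A = np.array([[1.0, dt, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
    Q = np.array([[dt**3 * q1 / 3.0, dt**2 * q1 / 2.0, 0.0],
                  [dt**2 * q1 / 2.0, dt * q1, 0.0],
                  [0.0, 0.0, dt * q2]])
    return A, Q


def simulate(n_steps, dt=0.1, q1=0.01, q2=0.01, seed=0):
    """Roll the noisy system forward and record states and measurements.

    dt, q1 and q2 are illustrative assumptions, not documented defaults.
    """
    rng = np.random.default_rng(seed)
    A, Q = make_model(dt, q1, q2)
    # x(0) ~ N([0, 10, 1], 3 * I)
    x = rng.multivariate_normal(mean=[0.0, 10.0, 1.0], cov=3.0 * np.eye(3))
    xs, ys = [], []
    for _ in range(n_steps):
        # y(k) = x_3(k) * sin(x_1(k)) + v(k), with v(k) ~ N(0, 1)
        y = x[2] * np.sin(x[0]) + rng.normal(0.0, 1.0)
        xs.append(x)
        ys.append(y)
        # x(k+1) = A x(k) + w(k), with w(k) ~ N(0, Q)
        w = rng.multivariate_normal(mean=np.zeros(3), cov=Q)
        x = A @ x + w
    return np.array(xs), np.array(ys)
```

Calling `simulate(100)` returns an array of 100 states (one row per step) and the matching array of 100 scalar measurements.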

Estimator design:

\[ \hat{x}(k+1) = A \hat{x}(k) + u \]

where \(u = [u_1, u_2, u_3]'\) and \(u = l(\hat{x}(k), y(k))\) comes from the policy network \(l(\cdot,\cdot)\).
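One estimator update can be sketched as follows. This is an illustration under stated assumptions: `estimator_step` and `toy_policy` are hypothetical names, and `toy_policy` is a hand-written stand-in for the learned policy network, using a fixed gain on the measurement residual purely so the example runs.

```python
import numpy as np


def estimator_step(x_hat, y, policy, A):
    """Apply x_hat(k+1) = A x_hat(k) + u, with u = policy(x_hat(k), y(k))."""
    x_hat = np.asarray(x_hat, dtype=float)
    u = np.asarray(policy(x_hat, y), dtype=float)
    return A @ x_hat + u


def toy_policy(x_hat, y, gain=0.1):
    """Hypothetical stand-in for the policy network l(., .).

    Corrects every state component by a fixed fraction of the residual
    between the measurement and the predicted measurement.
    """
    residual = y - x_hat[2] * np.sin(x_hat[0])
    return gain * residual * np.ones(3)


dt = 0.1  # assumed step size, for illustration only
A = np.array([[1.0, dt, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
x_next = estimator_step([0.0, 10.0, 1.0], 0.0, toy_policy, A)
```

In the environment the correction `u` is supplied by the agent's action rather than by a fixed-gain rule like the one above.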

Submodules

Package Contents

Classes

Ex3EKF

Noisy master slave system

class stable_gym.envs.classic_control.ex3_ekf.Ex3EKF(render_mode=None, clipped_action=True)[source]

Bases: gymnasium.Env

Noisy master slave system

Description:

The goal of the agent in the Ex3EKF environment is to act in such a way that the estimator perfectly estimates the state of the original noisy system. In doing so, the agent serves as an RL-based stationary Kalman filter. First presented by Wu et al. 2023.

Observation:

Type: Box(4)

Num  Observation              Min         Max
0    The estimated angle      -10000 rad  10000 rad
1    The estimated frequency  -10000 Hz   10000 Hz
2    Actual angle             -10000 rad  10000 rad
3    Actual frequency         -10000 Hz   10000 Hz

Actions:

Type: Box(2)

Num  Action
0    First action coming from the RL Kalman filter
1    Second action coming from the RL Kalman filter

Cost:

A cost, computed as the sum of the squared differences between the estimated and the actual states:

\[C = {(\hat{x}_1 - x_1)}^2 + {(\hat{x}_2 - x_2)}^2\]
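The cost formula can be written out directly. This is a minimal sketch with a hypothetical helper name, `step_cost`; it compares only the first two state components (angle and frequency), as in the expression above.

```python
import numpy as np


def step_cost(x_hat, x):
    """Sum of squared errors over the first two state components."""
    x_hat = np.asarray(x_hat, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum((x_hat[:2] - x[:2]) ** 2))


# Angle error of 1 and frequency error of 2; the third component is ignored.
example_cost = step_cost([1.0, 2.0, 5.0], [0.0, 0.0, 0.0])
```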

Starting State:

All observations are assigned a uniform random value in [-0.05..0.05]

Episode Termination:
  • When the step cost is higher than 100.

Solved Requirements:

Considered solved when the average cost is lower than 300.

state

The current system state.

Type:

numpy.ndarray

t

The current time step.

Type:

float

dt

The environment step size. Also available as tau.

Type:

float

sigma

The variance of the system noise.

Type:

float

Initialise new Ex3EKF environment instance.

Parameters:
  • render_mode (str, optional) – The render mode you want to use. Defaults to None. Not used in this environment.

  • clipped_action (bool, optional) – Whether the actions should be clipped if they exceed the set action limit. Defaults to True.
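The effect of `clipped_action` can be pictured with `numpy.clip`. This sketch is an assumption about the general behaviour, not the environment's code: `clip_action` is a hypothetical helper, and the bounds used in the example are placeholders because the action limits are not stated on this page.

```python
import numpy as np


def clip_action(action, low, high, clipped_action=True):
    """Clip the action to [low, high] when clipping is enabled."""
    action = np.asarray(action, dtype=float)
    if clipped_action:
        return np.clip(action, low, high)
    return action


# Placeholder bounds of -1 and 1, chosen only for illustration.
clipped = clip_action([5.0, -5.0], -1.0, 1.0)
unclipped = clip_action([5.0, -5.0], -1.0, 1.0, clipped_action=False)
```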

property tau

Alias for the environment step size. Done for compatibility with the other gymnasium environments.

property physics_time

Returns the physics time. Alias for t.

step(action)[source]

Take a step in the environment.

Parameters:

action (numpy.ndarray) – The action we want to perform in the environment.

Returns:

tuple containing:

  • obs (np.ndarray): Environment observation.

  • cost (float): Cost of the action.

  • terminated (bool): Whether the episode is terminated.

  • truncated (bool): Whether the episode was truncated. This value is set by wrappers when, for example, a time limit is reached or the agent goes out of bounds.

  • info (dict): Additional information about the environment.

Return type:

(tuple)
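A typical interaction loop unpacks the five-element tuple returned by `step` and stops when either flag is set. The sketch below uses a hypothetical `rollout` helper that relies only on the `reset` and `step` signatures documented on this page, so it works with any object that follows them.

```python
def rollout(env, policy, max_steps=100, seed=None):
    """Run one episode and return the accumulated cost and step count."""
    obs, info = env.reset(seed=seed)
    total_cost = 0.0
    steps = 0
    for _ in range(max_steps):
        action = policy(obs)
        obs, cost, terminated, truncated, info = env.step(action)
        total_cost += float(cost)
        steps += 1
        # Stop on termination or on truncation by a wrapper.
        if terminated or truncated:
            break
    return total_cost, steps
```

Assuming the package is installed, an instance created with `Ex3EKF()` from `stable_gym.envs.classic_control.ex3_ekf` could be passed as `env`, with a random policy such as `lambda obs: env.action_space.sample()`.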

reset(seed=None, options=None)[source]

Reset gymnasium environment.

Parameters:
  • seed (int, optional) – A random seed for the environment. By default None.

  • options (dict, optional) – A dictionary containing additional options for resetting the environment. By default None. Not used in this environment.

Returns:

tuple containing:

  • obs (numpy.ndarray): Initial environment observation.

  • info (dict): Dictionary containing additional information.

Return type:

(tuple)

reference(x)[source]

Returns the current value of the periodic reference signal that is tracked by the Synthetic oscillatory network.

Parameters:

x (float) – The reference value.

Returns:

The current reference value.

Return type:

float

abstract render(mode='human')[source]

Render one frame of the environment.

Parameters:

mode (str, optional) – Gym rendering mode. The default mode will do something human friendly, such as pop up a window.

Raises:

NotImplementedError – Raised because the render method has not yet been implemented.

Note

This method is not yet implemented.