stable_gym.envs.classic_control.ex3_ekf

Noisy master slave system (Ex3EKF) gymnasium environment.

Dynamics

The dynamics of the system whose state is to be estimated are given by:

\[ \begin{split} x(k+1) &= A x(k) + w(k) \\ \end{split} \]

In which the state vector \(x(k)\) is given by:

\[ \begin{align*} x_1 &: \text{angle} \\ x_2 &: \text{frequency} \\ x_3 &: \text{amplitude} \end{align*} \]

The measurement \(y(k)\), the transition matrix \(A\), the initial state distribution, and the noise terms \(w(k)\) and \(v(k)\) are given by:

\[ \begin{split} y(k) &= x_3(k) \cdot \sin(x_1(k)) + v(k) \\ A &= \begin{bmatrix} 1 & dt & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \\ x(0) &\sim \mathcal{N}\left(\begin{bmatrix}0 \\ 10 \\ 1\end{bmatrix}, \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}\right) \\ w(k) &\sim \mathcal{N}\left(\begin{bmatrix}0 \\ 0 \\ 0\end{bmatrix}, \begin{bmatrix} \frac{1}{3}dt^3 q_1 & \frac{1}{2}dt^2 q_1 & 0 \\ \frac{1}{2}dt^2 q_1 & dt q_1 & 0 \\ 0 & 0 & dt q_2 \end{bmatrix}\right) \\ v(k) &\sim \mathcal{N}(0, 1) \end{split} \]
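As a rough illustration of the model above, the following sketch builds \(A\) and the covariance of \(w(k)\) in plain Python and evaluates one noise-free transition and measurement. The function names are made up for this example, and the value q2 = 0.01 is an assumption; only dt = 0.1 and q1 = 0.01 appear among the class attributes listed further down.

```python
import math


def transition_matrix(dt):
    """Return A for the state [angle, frequency, amplitude]."""
    return [
        [1.0, dt, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
    ]


def process_noise_cov(dt, q1, q2):
    """Return the covariance of w(k) as written in the dynamics."""
    return [
        [dt**3 * q1 / 3.0, dt**2 * q1 / 2.0, 0.0],
        [dt**2 * q1 / 2.0, dt * q1, 0.0],
        [0.0, 0.0, dt * q2],
    ]


def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def measure(x, v=0.0):
    """y(k) = x_3(k) * sin(x_1(k)) + v(k)."""
    return x[2] * math.sin(x[0]) + v


dt, q1 = 0.1, 0.01
q2 = 0.01  # assumed value, not taken from the documentation
A = transition_matrix(dt)
Q = process_noise_cov(dt, q1, q2)

x = [0.0, 10.0, 1.0]     # mean of x(0)
x_next = mat_vec(A, x)   # noise-free propagation, w(k) = 0
y_next = measure(x_next) # noise-free measurement, v(k) = 0
```

Starting from the mean initial state, the angle advances by dt times the frequency while the frequency and amplitude stay constant, which is the constant-frequency sinusoid the estimator has to track.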

Estimator design:

\[ \hat{x}(k+1) = A \hat{x}(k) + u \]

where \(u = [u_1, u_2, u_3]^\top = l(\hat{x}(k), y(k))\) comes from the policy network \(l(\cdot,\cdot)\).
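The estimator update can be sketched in a few lines of Python. This is only an illustration of the equation above: `estimator_step` and `zero_policy` are hypothetical names, and the zero policy is a placeholder for the learned network \(l\), not the policy used by the authors.

```python
def estimator_step(x_hat, u, dt):
    """Compute x_hat(k+1) = A x_hat(k) + u with the A given in the dynamics."""
    angle, freq, amp = x_hat
    return [
        angle + dt * freq + u[0],
        freq + u[1],
        amp + u[2],
    ]


def zero_policy(x_hat, y):
    """Hypothetical stand-in for l(x_hat, y): applies no correction."""
    return [0.0, 0.0, 0.0]


x_hat = [0.0, 10.0, 1.0]
u = zero_policy(x_hat, y=0.5)
x_hat_next = estimator_step(x_hat, u, dt=0.1)
```

With a zero correction the estimate simply follows the noise-free model; a trained policy would use the measurement to pull the estimate towards the true state.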

Submodules

stable_gym.envs.classic_control.ex3_ekf.ex3_ekf

Classes

Ex3EKF

Noisy master slave system

Package Contents

class stable_gym.envs.classic_control.ex3_ekf.Ex3EKF(render_mode=None, clipped_action=True)[source]

Bases: gymnasium.Env

Noisy master slave system

Description:

The goal of the agent in the Ex3EKF environment is to act in such a way that the estimator perfectly estimates the state of the original noisy system. In doing so, it serves as an RL-based stationary Kalman filter. First presented by Wu et al. 2023.

Observation:

Type: Box(4)

Num	Observation	Min	Max
0	The estimated angle	-10000 rad	10000 rad
1	The estimated frequency	-10000 Hz	10000 Hz
2	Actual angle	-10000 rad	10000 rad
3	Actual frequency	-10000 Hz	10000 Hz

Actions:

Type: Box(2)

Num	Action
0	First action coming from the RL Kalman filter
1	Second action coming from the RL Kalman filter

Cost:

A cost, computed as the sum of the squared differences between the estimated and the actual states:

\[C = {(\hat{x}_1 - x_1)}^2 + {(\hat{x}_2 - x_2)}^2\]

Starting State:

All observations are assigned a uniform random value in the range [-0.05, 0.05].

Episode Termination:

When the step cost is higher than 100.
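The cost and termination rule described above can be written as a minimal sketch. The function names and the `COST_LIMIT` constant are chosen for this example and are not taken from the environment's source code.

```python
def step_cost(x_hat, x):
    """C = (x_hat_1 - x_1)^2 + (x_hat_2 - x_2)^2."""
    return (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2


COST_LIMIT = 100.0  # termination threshold stated in the text


def is_terminated(cost, limit=COST_LIMIT):
    """An episode ends once the step cost is higher than the limit."""
    return cost > limit


cost = step_cost([1.0, 2.0], [0.0, 0.0])  # estimated vs. actual state
done = is_terminated(cost)
```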

Solved Requirements:

Considered solved when the average cost is lower than 300.

state

The current system state.

Type:: numpy.ndarray

t

The current time step.

Type:: float

dt

The environment step size. Also available as tau.

Type:: float

sigma

The variance of the system noise.

Type:: float

Initialise new Ex3EKF environment instance.

Parameters:

render_mode (str, optional) – The render mode you want to use. Defaults to None. Not used in this environment.
clipped_action (bool, optional) – Whether the actions should be clipped if they exceed the set action limit. Defaults to True.
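To illustrate what clipping an action to its limits means, here is a minimal sketch. The helper name `clip_action` and the bounds used are hypothetical; the documentation does not state the environment's actual action limits or how the clipping is implemented.

```python
def clip_action(action, low, high):
    """Clamp each action component to the interval [low_i, high_i]."""
    return [min(max(a, lo), hi) for a, lo, hi in zip(action, low, high)]


# Hypothetical bounds for a two-dimensional action space.
low, high = [-10.0, -10.0], [10.0, 10.0]
clipped = clip_action([12.5, -3.0], low, high)
```

Components inside the bounds pass through unchanged; components outside are replaced by the nearest bound.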

_action_clip_warning = False

t = 0.0

dt = 0.1

q1 = 0.01

g = 9.81

l_net = 1.0

mean1 = [0, 0]

cov1

mean2 = 0

cov2 = 0.01

missing_rate = 0

sigma = 0

high

action_space

observation_space

reward_range = (0.0, 100.0)

_clipped_action

viewer = None

state = None

output = None

steps_beyond_done = None

step(action)[source]

Take a step in the environment.

Parameters:

action (numpy.ndarray) – The action we want to perform in the environment.

Returns:

tuple containing:

obs (np.ndarray): Environment observation.

cost (float): Cost of the action.

terminated (bool): Whether the episode is terminated.

truncated (bool): Whether the episode was truncated. This value is set by wrappers when, for example, a time limit is reached or the agent goes out of bounds.

info (dict): Additional information about the environment.

Return type:

(tuple)

reset(seed=None, options=None)[source]

Reset gymnasium environment.

Parameters:

seed (int, optional) – A random seed for the environment. By default None.
options (dict, optional) – A dictionary containing additional options for resetting the environment. By default None. Not used in this environment.

Returns:

tuple containing:

obs (numpy.ndarray): Initial environment observation.

info (dict): Dictionary containing additional information.

Return type:

(tuple)
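The return values of step and reset documented above fit the usual gymnasium interaction loop. The sketch below is a generic rollout helper; `run_episode`, `max_steps`, and the random policy in the usage comment are assumptions made for this example, and the commented usage additionally assumes that the stable_gym package is installed.

```python
def run_episode(env, policy, max_steps=200, seed=None):
    """Roll out one episode and return the list of step costs."""
    obs, info = env.reset(seed=seed)
    costs = []
    for _ in range(max_steps):
        action = policy(obs)
        obs, cost, terminated, truncated, info = env.step(action)
        costs.append(float(cost))
        if terminated or truncated:
            break
    return costs


# Possible usage, assuming stable_gym is installed:
#   from stable_gym.envs.classic_control.ex3_ekf import Ex3EKF
#   env = Ex3EKF()
#   costs = run_episode(env, lambda obs: env.action_space.sample())
#   print(sum(costs) / len(costs))  # average step cost of the episode
```

Because the helper only relies on the reset and step signatures, it works with any environment that follows the same interface.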

reference(x)[source]

Returns the current value of the periodic reference signal that is tracked by the Synthetic oscillatory network.

Parameters:: x (float) – The reference value.
Returns:: The current reference value.
Return type:: float

abstract render(mode='human')[source]

Render one frame of the environment.

Parameters:: mode (str, optional) – Gym rendering mode. The default mode will do something human friendly, such as pop up a window.
Raises:: NotImplementedError – Raised because the render method has not yet been implemented.

Note

This method is not yet implemented.

property tau
Alias for the environment step size. Done for compatibility with the
other gymnasium environments.

property physics_time
Returns the physics time. Alias for t.