stable_gym.envs.classic_control.ex3_ekf
Noisy master slave system (Ex3EKF) gymnasium environment.
Dynamics
The dynamics of the system whose state is to be estimated are given by:
In which the state vector \(x(k)\) is given by:
and the measurement vector \(y(k)\) is given by:
Estimator design:
Submodules
Classes
Ex3EKF | Noisy master slave system
Package Contents
- class stable_gym.envs.classic_control.ex3_ekf.Ex3EKF(render_mode=None, clipped_action=True)[source]
Bases:
gymnasium.Env
Noisy master slave system
- Description:
The goal of the agent in the Ex3EKF environment is to act in such a way that the estimator perfectly estimates the original noisy system. In doing so, the agent serves as an RL-based stationary Kalman filter. First presented by Wu et al. 2023.
- Observation:
Type: Box(4)
Num | Observation             | Min        | Max
0   | The estimated angle     | -10000 rad | 10000 rad
1   | The estimated frequency | -10000 Hz  | 10000 Hz
2   | Actual angle            | -10000 rad | 10000 rad
3   | Actual frequency        | -10000 Hz  | 10000 Hz
- Actions:
Type: Box(2)
Num | Action
0   | First action coming from the RL Kalman filter
1   | Second action coming from the RL Kalman filter
- Cost:
A cost, computed as the sum of the squared differences between the estimated and the actual states:
\[C = {(\hat{x}_1 - x_1)}^2 + {(\hat{x}_2 - x_2)}^2\]
- Starting State:
All observations are assigned a uniform random value in the range [-0.05, 0.05].
- Episode Termination:
When the step cost is higher than 100.
- Solved Requirements:
Considered solved when the average cost is lower than 300.
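The cost, starting-state, and termination rules described above can be illustrated with a minimal, standard-library sketch. It is not the environment's own code: the function names are hypothetical, and it assumes that \(\hat{x}_1, \hat{x}_2\) correspond to observation entries 0 and 1 and \(x_1, x_2\) to entries 2 and 3, following the order of the observation table.

```python
import random

# Threshold taken from the "Episode Termination" entry above.
COST_THRESHOLD = 100.0


def step_cost(x_hat, x):
    """Sum of squared differences between estimated and actual states."""
    return (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2


def is_terminated(cost, threshold=COST_THRESHOLD):
    """The episode ends once the step cost is higher than the threshold."""
    return cost > threshold


def sample_initial_observation(rng, low=-0.05, high=0.05, size=4):
    """Draw every observation entry uniformly from [low, high]."""
    return [rng.uniform(low, high) for _ in range(size)]


rng = random.Random(0)  # seeded only to make the sketch reproducible
obs = sample_initial_observation(rng)

# Assumed layout: [estimated angle, estimated frequency, actual angle, actual frequency].
x_hat, x = obs[:2], obs[2:]
cost = step_cost(x_hat, x)
terminated = is_terminated(cost)
```

Because every initial entry lies within 0.05 of zero, the initial cost computed this way stays far below the termination threshold.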
- state
The current system state.
- Type:
Initialise a new Ex3EKF environment instance.
- Parameters:
- _action_clip_warning = False
- t = 0.0
- dt = 0.1
- q1 = 0.01
- g = 9.81
- l_net = 1.0
- mean1 = [0, 0]
- cov1
- mean2 = 0
- cov2 = 0.01
- missing_rate = 0
- sigma = 0
- high
- action_space
- observation_space
- reward_range = (0.0, 100.0)
- _clipped_action
- viewer = None
- state = None
- output = None
- steps_beyond_done = None
- step(action)[source]
Take a step in the environment.
- Parameters:
action (numpy.ndarray) – The action we want to perform in the environment.
- Returns:
tuple containing:
- obs (np.ndarray): Environment observation.
- cost (float): Cost of the action.
- terminated (bool): Whether the episode is terminated.
- truncated (bool): Whether the episode was truncated. This value is set by wrappers when, for example, a time limit is reached or the agent goes out of bounds.
- info (dict): Additional information about the environment.
- Return type:
(tuple)
- reset(seed=None, options=None)[source]
Reset gymnasium environment.
- Parameters:
- Returns:
tuple containing:
- obs (numpy.ndarray): Initial environment observation.
- info (dict): Dictionary containing additional information.
- Return type:
(tuple)
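The return values of reset and step documented above suggest the usual rollout loop. The sketch below is an illustration under assumptions, not code from the package: `StandInEnv` is a hypothetical placeholder that only mimics the documented signatures and returns a made-up cost, so the example runs without stable_gym installed. `run_episode` is likewise a hypothetical helper.

```python
class StandInEnv:
    """Hypothetical stand-in exposing the documented reset/step signatures."""

    def __init__(self):
        self._t = 0

    def reset(self, seed=None, options=None):
        self._t = 0
        return [0.0, 0.0, 0.0, 0.0], {}  # (obs, info)

    def step(self, action):
        self._t += 1
        cost = float(self._t)      # made-up cost, for illustration only
        terminated = cost > 3.0    # made-up termination rule
        # (obs, cost, terminated, truncated, info)
        return [0.0, 0.0, 0.0, 0.0], cost, terminated, False, {}


def run_episode(env, policy, max_steps=100, seed=None):
    """Roll out one episode and return (accumulated cost, number of steps)."""
    obs, info = env.reset(seed=seed)
    total_cost, steps = 0.0, 0
    for _ in range(max_steps):
        obs, cost, terminated, truncated, info = env.step(policy(obs))
        total_cost += cost
        steps += 1
        if terminated or truncated:
            break
    return total_cost, steps


# A constant two-dimensional action, matching the Box(2) action space.
total_cost, steps = run_episode(StandInEnv(), lambda obs: [0.0, 0.0])
```

With the package installed, an instance of stable_gym.envs.classic_control.ex3_ekf.Ex3EKF could presumably be passed in place of `StandInEnv()`, and the policy would return an array-like action of length 2.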
- reference(x)[source]
Returns the current value of the periodic reference signal that is tracked by the Synthetic oscillatory network.
- abstract render(mode='human')[source]
Render one frame of the environment.
- Parameters:
mode (str, optional) – Gym rendering mode. The default mode will do something human friendly, such as pop up a window.
- Raises:
NotImplementedError – Raised because the render method has not been implemented yet.
Note
This method is not yet implemented.
- property tau
Alias for the environment step size. Done for compatibility with the other gymnasium environments.
- property physics_time
Returns the physics time. Alias for t.
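The two alias properties can be pictured with a small sketch. `TimedEnv` and its `advance` method are hypothetical and show only the read-only alias pattern, assuming tau mirrors dt and physics_time mirrors t as described above. The defaults follow the attribute list (t = 0.0, dt = 0.1).

```python
class TimedEnv:
    """Hypothetical class illustrating the alias pattern only."""

    def __init__(self, dt=0.1):
        self.t = 0.0   # elapsed simulation time
        self.dt = dt   # environment step size

    @property
    def tau(self):
        """Alias for the step size dt."""
        return self.dt

    @property
    def physics_time(self):
        """Alias for the elapsed time t."""
        return self.t

    def advance(self):
        """Move the simulation clock forward by one step."""
        self.t += self.dt


env = TimedEnv()
env.advance()
```

Exposing the aliases as properties keeps them in sync with the underlying attributes without storing a second copy.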