stable_gym.envs.biological

Stable Gym gymnasium environments that are based on Biological systems.

Subpackages

Classes

Oscillator

Synthetic oscillatory network environment.

OscillatorComplicated

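Challenging (i.e. complicated) oscillatory network environment. This environment class is based on the Oscillator environment class but has an additional protein, mRNA transcription and light input.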

Package Contents

class stable_gym.envs.biological.Oscillator(render_mode=None, max_cost=100.0, reference_target_position=8.0, reference_amplitude=7.0, reference_frequency=1 / 200, reference_phase_shift=0.0, clip_action=True, exclude_reference_from_observation=False, exclude_reference_error_from_observation=False, action_space_dtype=np.float64, observation_space_dtype=np.float64)[source]

Bases: gymnasium.Env

Synthetic oscillatory network environment.

Note

Can also be used in a vectorized manner. See the gym.vector documentation.

Description:

The goal of the agent in the oscillator environment is to act in such a way that one of the proteins of the synthetic oscillatory network follows a supplied reference signal.

Source:

This environment corresponds to the Oscillator environment used in the paper Han et al. 2020. In our implementation several additional features were added to the environment to make it more flexible and easier to use:

  • Environment arguments now allow for modification of the reference signal parameters.

  • System parameters can now be individually adjusted for each protein, rather than applying the same parameters across all proteins.

  • The reference can be omitted from the observation.

  • Reference error can be included in the info dictionary.

  • The observation space was expanded to accurately reproduce the plots presented in Han et al. 2020, which was not possible with the original code’s observation space.

  • Added an adjustable max_cost threshold for episode termination, defaulting to 100 to match the original environment.

Observation:

Type: Box(7) or Box(8) depending on the exclude_reference_error_from_observation argument.

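===  ================================================================================  ===========  ==========
Num  Observation                                                                       Min          Max
===  ================================================================================  ===========  ==========
0    lacI mRNA transcripts concentration                                               0            \(\infty\)
1    tetR mRNA transcripts concentration                                               0            \(\infty\)
2    CI mRNA transcripts concentration                                                 0            \(\infty\)
3    lacI (repressor) protein concentration (inhibits transcription of the tetR gene)  0            \(\infty\)
4    tetR (repressor) protein concentration (inhibits transcription of the CI gene)    0            \(\infty\)
5    CI (repressor) protein concentration (inhibits transcription of the lacI gene)    0            \(\infty\)
6    The reference we want to follow                                                   0            \(\infty\)
(7)  Optional: the error between the current value of protein 1 and the reference      \(-\infty\)  \(\infty\)
===  ================================================================================  ===========  ==========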
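The relation between the table above and the Box size can be sketched as a small counting helper. The name observation_dim and its n_proteins argument are hypothetical and not part of stable_gym; the sketch assumes one mRNA and one protein concentration per gene, followed by the optional reference and reference error entries.

```python
def observation_dim(
    n_proteins=3,
    exclude_reference=False,
    exclude_reference_error=False,
):
    """Count observation entries (hypothetical helper, not a stable_gym API)."""
    dim = 2 * n_proteins  # one mRNA and one protein concentration per gene
    if not exclude_reference:
        dim += 1  # the reference value
    if not exclude_reference_error:
        dim += 1  # the error between protein 1 and the reference
    return dim


print(observation_dim())                              # 8
print(observation_dim(exclude_reference_error=True))  # 7
```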

Actions:

Type: Box(3)

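===  ========================================================================================  ===  ===
Num  Action                                                                                    Min  Max
===  ========================================================================================  ===  ===
0    Relative intensity of the light signal that induces the expression of the lacI mRNA gene  0    1
1    Relative intensity of the light signal that induces the expression of the tetR mRNA gene  0    1
2    Relative intensity of the light signal that induces the expression of the CI mRNA gene    0    1
===  ========================================================================================  ===  ===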

Cost:

The cost is the squared difference between the concentration of protein 1 (\(p_1\)) and the reference (\(r_1\)):

\[C = (p_1 - r_1)^2\]
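As a minimal sketch, reading the formula as the squared difference between the protein 1 concentration and the reference, the cost can be computed as follows. The function name tracking_cost is an assumption made for illustration; it is not taken from the stable_gym source.

```python
def tracking_cost(p1, r1):
    """Squared tracking error between protein 1 (p1) and the reference (r1)."""
    return (p1 - r1) ** 2


print(tracking_cost(5.0, 8.0))  # 9.0
```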
Starting State:

All observations are assigned a uniform random value in the range [0, 5].
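The uniform initialisation can be illustrated with the standard library alone. This is only a sketch: the helper sample_initial_state is hypothetical, it assumes six state variables (three mRNA and three protein concentrations), and the real environment may use a different random number generator.

```python
import random


def sample_initial_state(n_states=6, low=0.0, high=5.0, seed=None):
    """Draw each state variable uniformly from [low, high] (illustrative only)."""
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return [rng.uniform(low, high) for _ in range(n_states)]


state = sample_initial_state(seed=0)
print(state)
```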

Episode Termination:
  • An episode is terminated when the maximum step limit is reached.

  • The cost of a step exceeds a threshold (default is 100). This threshold can be adjusted using the max_cost environment argument.
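The cost-based termination rule can be sketched as a simple comparison. The helper is_terminated is hypothetical, and the strict inequality is an assumption; the source only states that termination happens once the threshold is exceeded.

```python
import math


def is_terminated(cost, max_cost=100.0):
    """Return True when the step cost exceeds the threshold (assumed strict)."""
    return cost > max_cost


print(is_terminated(99.0))                     # False
print(is_terminated(100.5))                    # True
print(is_terminated(1e9, max_cost=math.inf))   # False, an infinite threshold never triggers
```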

Solved Requirements:

Considered solved when the average cost is lower than 300.

How to use:
import stable_gym
import gymnasium as gym
env = gym.make("stable_gym:Oscillator-v1")

On reset, the options parameter allows the user to change the bounds used to determine the new random state when random=True.
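A typical interaction loop built on the snippet above might look like the sketch below. The helpers rollout and run_random_episode are hypothetical names introduced here; the loop relies only on the reset and step return values documented on this page, and run_random_episode assumes that gymnasium and stable_gym are installed.

```python
def rollout(env, policy, max_steps=200, seed=None):
    """Run one episode and return the summed cost and the number of steps."""
    obs, info = env.reset(seed=seed)
    total_cost, steps = 0.0, 0
    for _ in range(max_steps):
        action = policy(obs)
        obs, cost, terminated, truncated, info = env.step(action)
        total_cost += float(cost)
        steps += 1
        if terminated or truncated:
            break
    return total_cost, steps


def run_random_episode(env_id="stable_gym:Oscillator-v1"):
    """Roll out a random policy; requires gymnasium and stable_gym."""
    import gymnasium as gym
    import stable_gym  # noqa: F401  (imported as in the usage snippet)

    env = gym.make(env_id)
    try:
        return rollout(env, lambda obs: env.action_space.sample(), seed=0)
    finally:
        env.close()
```

Calling run_random_episode() returns the accumulated cost and episode length for one randomly controlled episode.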

Attributes:
  • state (numpy.ndarray) – The current system state.

  • t (float) – The current time step.

  • dt (float) – The environment step size. Also available as tau.

  • sigma (float) – The variance of the system noise.

  • max_cost (float) – The maximum cost allowed before the episode is terminated.

Initialise a new Oscillator environment instance.

Parameters:
  • render_mode (str, optional) – The render mode you want to use. Defaults to None. Not used in this environment.

  • max_cost (float, optional) – The maximum cost allowed before the episode is terminated. Defaults to 100.0.

  • reference_target_position (float, optional) – The reference target position (i.e. the mean of the reference signal). Defaults to 8.0.

  • reference_amplitude (float, optional) – The reference amplitude. Defaults to 7.0.

  • reference_frequency (float, optional) – The reference frequency. Defaults to 0.005.

  • reference_phase_shift (float, optional) – The reference phase shift. Defaults to 0.0.

  • clip_action (bool, optional) – Whether the actions should be clipped if they are greater than the set action limit. Defaults to True.

  • exclude_reference_from_observation (bool, optional) – Whether the reference should be excluded from the observation. Defaults to False.

  • exclude_reference_error_from_observation (bool, optional) – Whether the error should be excluded from the observation. Defaults to False.

  • action_space_dtype (union[numpy.dtype, str], optional) – The data type of the action space. Defaults to np.float64.

  • observation_space_dtype (union[numpy.dtype, str], optional) – The data type of the observation space. Defaults to np.float64.
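What clip_action implies can be sketched with a plain list. The helper below is hypothetical and assumes the [0, 1] bounds from the Actions table; clipping at the lower bound as well as the upper bound is also an assumption, and the environment itself may clip differently.

```python
def clip_action(action, low=0.0, high=1.0):
    """Clamp every action component to [low, high] (illustrative helper)."""
    return [min(max(a, low), high) for a in action]


print(clip_action([-0.2, 0.5, 1.3]))  # [0.0, 0.5, 1.0]
```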

max_cost
_action_clip_warning = False
_clip_action
_exclude_reference_from_observation
_exclude_reference_error_from_observation
_action_space_dtype
_observation_space_dtype
_action_dtype_conversion_warning = False
t = 0.0
dt = 1.0
_init_state
_init_state_range
K1 = 1.0
K2 = 1.0
K3 = 1.0
a1 = 1.6
a2 = 1.6
a3 = 1.6
gamma1 = 0.16
gamma2 = 0.16
gamma3 = 0.16
beta1 = 0.16
beta2 = 0.16
beta3 = 0.16
c1 = 0.06
c2 = 0.06
c3 = 0.06
b1 = 5.0
b2 = 5.0
b3 = 5.0
delta1 = 0.0
delta2 = 0.0
delta3 = 0.0
delta4 = 0.0
delta5 = 0.0
delta6 = 0.0
obs_low
obs_high
action_space
observation_space
reward_range
viewer = None
state = None
steps_beyond_done = None
reference_target_pos
reference_amplitude
reference_frequency
phase_shift
step(action)[source]

Take a step in the environment.

Parameters:

action (numpy.ndarray) – The action we want to perform in the environment.

Returns:

tuple containing:

  • obs (np.ndarray): Environment observation.

  • cost (float): Cost of the action.

  • terminated (bool): Whether the episode is terminated.

  • truncated (bool): Whether the episode was truncated. This value is set by wrappers when, for example, a time limit is reached or the agent goes out of bounds.

  • info (dict): Additional information about the environment.

Return type:

(tuple)

reset(seed=None, options=None, random=True)[source]

Reset gymnasium environment.

Parameters:
  • seed (int, optional) – A random seed for the environment. By default None.

  • options (dict, optional) – A dictionary containing additional options for resetting the environment. By default None. Not used in this environment.

  • random (bool, optional) – Whether we want to randomly initialise the environment. By default True.

Returns:

tuple containing:

  • obs (numpy.ndarray): Initial environment observation.

  • info (dict): Dictionary containing additional information.

Return type:

(tuple)

reference(t)[source]

Returns the current value of the periodic reference signal that is tracked by the synthetic oscillatory network.

Parameters:

t (float) – The current time step.

Returns:

The current reference value.

Return type:

float

Note

This uses the general form of a periodic signal:

\[\begin{split}y(t) = A \sin(\omega t + \phi) + C \\ y(t) = A \sin(2 \pi f t + \phi) + C \\ y(t) = A \sin(\frac{2 \pi}{T} t + \phi) + C\end{split}\]

Where:

  • \(t\) is the time.

  • \(A\) is the amplitude of the signal.

  • \(\omega\) is the angular frequency of the signal (\(\omega = 2 \pi f\)).

  • \(f\) is the frequency of the signal.

  • \(T\) is the period of the signal.

  • \(\phi\) is the phase of the signal.

  • \(C\) is the offset of the signal.
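The second form of the signal can be evaluated directly. The sketch below is not the library's reference method: the function name reference_signal is made up for illustration, and it assumes that \(A\), \(f\), \(\phi\) and \(C\) correspond to the documented defaults of reference_amplitude, reference_frequency, reference_phase_shift and reference_target_position.

```python
import math


def reference_signal(t, amplitude=7.0, frequency=1 / 200, phase_shift=0.0, offset=8.0):
    """Evaluate y(t) = A * sin(2 * pi * f * t + phi) + C."""
    return amplitude * math.sin(2 * math.pi * frequency * t + phase_shift) + offset


# With the defaults the period is T = 1 / f = 200 time units.
for t in (0, 50, 100, 150):
    print(t, round(reference_signal(t), 6))
```

With these defaults the signal oscillates between 1 and 15 around its mean of 8, so it stays within the non-negative range given for the reference observation.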

abstract render(mode='human')[source]

Render one frame of the environment.

Parameters:

mode (str, optional) – Gym rendering mode. The default mode will do something human friendly, such as pop up a window.

Raises:

NotImplementedError – Raised because the render method has not yet been implemented.

Note

This method is not yet implemented.

property tau
Alias for the environment step size (dt). Provided for compatibility with other gymnasium environments.
property physics_time
Returns the physics time. Alias for t.
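Read-only aliases of this kind are commonly written as properties. The class below is a stand-alone sketch with a made-up name (SteppedSystem), not the environment's actual code; it only shows how tau and physics_time can mirror dt and t.

```python
class SteppedSystem:
    """Toy class showing property aliases for a step size and a time value."""

    def __init__(self, dt=1.0):
        self.t = 0.0
        self.dt = dt

    @property
    def tau(self):
        """Alias for the step size dt."""
        return self.dt

    @property
    def physics_time(self):
        """Alias for the current time t."""
        return self.t


system = SteppedSystem(dt=0.5)
print(system.tau, system.physics_time)  # 0.5 0.0
```

Because the aliases are computed on access, they stay in sync when dt or t change.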
class stable_gym.envs.biological.OscillatorComplicated(render_mode=None, max_cost=np.inf, reference_target_position=8.0, reference_amplitude=7.0, reference_frequency=1 / 200, reference_phase_shift=0.0, clip_action=True, exclude_reference_from_observation=False, exclude_reference_error_from_observation=False, action_space_dtype=np.float64, observation_space_dtype=np.float64)[source]

Bases: gymnasium.Env

Challenging (i.e. complicated) oscillatory network environment. This environment class is based on the Oscillator environment class but has an additional protein, mRNA transcription and light input.

Note

Can also be used in a vectorized manner. See the gym.vector documentation.

Description:

The goal of the agent in the oscillator environment is to act in such a way that one of the proteins of the synthetic oscillatory network follows a supplied reference signal.

Source:

This environment corresponds to the Oscillator environment used in the paper Han et al. 2020. In our implementation several additional features were added to the environment to make it more flexible and easier to use:

  • Environment arguments now allow for modification of the reference signal parameters.

  • System parameters can now be individually adjusted for each protein, rather than applying the same parameters across all proteins.

  • The reference can be omitted from the observation.

  • Reference error can be included in the info dictionary.

  • The observation space was expanded to accurately reproduce the plots presented in Han et al. 2020, which was not possible with the original code’s observation space.

  • Added an adjustable max_cost threshold for episode termination, defaulting to \(\infty\) to match the original environment.

Observation:

Type: Box(9) or Box(10) depending on the exclude_reference_error_from_observation argument.

===  =======================================================================================  ===========  ==========
Num  Observation                                                                              Min          Max
===  =======================================================================================  ===========  ==========
0    lacI mRNA transcripts concentration                                                      0            \(\infty\)
1    tetR mRNA transcripts concentration                                                      0            \(\infty\)
2    CI mRNA transcripts concentration                                                        0            \(\infty\)
3    Extra protein mRNA transcripts concentration                                             0            \(\infty\)
4    lacI (repressor) protein concentration (inhibits transcription of the tetR gene)         0            \(\infty\)
5    tetR (repressor) protein concentration (inhibits transcription of the CI gene)           0            \(\infty\)
6    CI (repressor) protein concentration (inhibits transcription of the extra protein gene)  0            \(\infty\)
7    Extra (repressor) protein concentration (inhibits transcription of the lacI gene)        0            \(\infty\)
8    The reference we want to follow                                                          0            \(\infty\)
(9)  Optional: the error between the current value of protein 1 and the reference             \(-\infty\)  \(\infty\)
===  =======================================================================================  ===========  ==========
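To make the layout of this observation vector easier to work with, it can be split into named parts. The sketch below is an illustration only: OscillatorObservation and split_observation are hypothetical names, and the ordering (mRNA concentrations, protein concentrations, then the optional reference and reference error) is taken from the table above.

```python
from typing import NamedTuple, Optional, Sequence


class OscillatorObservation(NamedTuple):
    mrna: Sequence[float]
    proteins: Sequence[float]
    reference: Optional[float]
    reference_error: Optional[float]


def split_observation(obs, n_proteins=4, has_reference=True, has_reference_error=True):
    """Split a flat observation into named parts (hypothetical helper)."""
    obs = list(obs)
    mrna = obs[:n_proteins]
    proteins = obs[n_proteins:2 * n_proteins]
    rest = obs[2 * n_proteins:]
    reference = rest.pop(0) if has_reference else None
    reference_error = rest.pop(0) if has_reference_error else None
    return OscillatorObservation(mrna, proteins, reference, reference_error)


parts = split_observation([float(i) for i in range(10)])
print(parts.proteins)   # [4.0, 5.0, 6.0, 7.0]
print(parts.reference)  # 8.0
```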

Actions:

Type: Box(4)

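===  =================================================================================================  ===  ===
Num  Action                                                                                             Min  Max
===  =================================================================================================  ===  ===
0    Relative intensity of the light signal that induces the expression of the lacI mRNA gene           0    1
1    Relative intensity of the light signal that induces the expression of the tetR mRNA gene           0    1
2    Relative intensity of the light signal that induces the expression of the CI mRNA gene             0    1
3    Relative intensity of the light signal that induces the expression of the extra protein mRNA gene  0    1
===  =================================================================================================  ===  ===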

Cost:

The cost is the squared difference between the concentration of protein 1 (\(p_1\)) and the reference (\(r_1\)):

\[C = (p_1 - r_1)^2\]
Starting State:

All observations are assigned a uniform random value in the range [0, 5].

Episode Termination:
  • An episode is terminated when the maximum step limit is reached.

  • The cost of a step exceeds a threshold (default is \(\infty\)). This threshold can be adjusted using the max_cost environment argument.

Solved Requirements:

Considered solved when the average cost is lower than 300.

How to use:
import stable_gym
import gymnasium as gym
env = gym.make("stable_gym:OscillatorComplicated-v1")

On reset, the options parameter allows the user to change the bounds used to determine the new random state when random=True.

Attributes:
  • state (numpy.ndarray) – The current system state.

  • t (float) – The current time step.

  • dt (float) – The environment step size. Also available as tau.

  • sigma (float) – The variance of the system noise.

  • max_cost (float) – The maximum cost allowed before the episode is terminated.

Initialise a new OscillatorComplicated environment instance.

Parameters:
  • render_mode (str, optional) – The render mode you want to use. Defaults to None. Not used in this environment.

  • max_cost (float, optional) – The maximum cost allowed before the episode is terminated. Defaults to np.inf.

  • reference_target_position (float, optional) – The reference target position (i.e. the mean of the reference signal). Defaults to 8.0.

  • reference_amplitude (float, optional) – The reference amplitude. Defaults to 7.0.

  • reference_frequency (float, optional) – The reference frequency. Defaults to 0.005.

  • reference_phase_shift (float, optional) – The reference phase shift. Defaults to 0.0.

  • clip_action (bool, optional) – Whether the actions should be clipped if they are greater than the set action limit. Defaults to True.

  • exclude_reference_from_observation (bool, optional) – Whether the reference should be excluded from the observation. Defaults to False.

  • exclude_reference_error_from_observation (bool, optional) – Whether the error should be excluded from the observation. Defaults to False.

  • action_space_dtype (union[numpy.dtype, str], optional) – The data type of the action space. Defaults to np.float64.

  • observation_space_dtype (union[numpy.dtype, str], optional) – The data type of the observation space. Defaults to np.float64.

max_cost
_action_clip_warning = False
_clip_action
_exclude_reference_from_observation
_exclude_reference_error_from_observation
_action_space_dtype
_observation_space_dtype
_action_dtype_conversion_warning = False
t = 0.0
dt = 1.0
_init_state
_init_state_range
K1 = 1.0
K2 = 1.0
K3 = 1.0
K4 = 1.0
a1 = 1.6
a2 = 1.6
a3 = 1.6
a4 = 1.6
gamma1 = 0.16
gamma2 = 0.16
gamma3 = 0.16
gamma4 = 0.16
beta1 = 0.16
beta2 = 0.16
beta3 = 0.16
beta4 = 0.16
c1 = 0.06
c2 = 0.06
c3 = 0.06
c4 = 0.06
b1 = 5.0
b2 = 5.0
b3 = 5.0
b4 = 5.0
delta1 = 0.0
delta2 = 0.0
delta3 = 0.0
delta4 = 0.0
delta5 = 0.0
delta6 = 0.0
delta7 = 0.0
delta8 = 0.0
obs_low
obs_high
action_space
observation_space
reward_range
viewer = None
state = None
steps_beyond_done = None
reference_target_pos
reference_amplitude
reference_frequency
phase_shift
step(action)[source]

Take a step in the environment.

Parameters:

action (numpy.ndarray) – The action we want to perform in the environment.

Returns:

tuple containing:

  • obs (np.ndarray): Environment observation.

  • cost (float): Cost of the action.

  • terminated (bool): Whether the episode is terminated.

  • truncated (bool): Whether the episode was truncated. This value is set by wrappers when, for example, a time limit is reached or the agent goes out of bounds.

  • info (dict): Additional information about the environment.

Return type:

(tuple)

reset(seed=None, options=None, random=True)[source]

Reset gymnasium environment.

Parameters:
  • seed (int, optional) – A random seed for the environment. By default None.

  • options (dict, optional) – A dictionary containing additional options for resetting the environment. By default None. Not used in this environment.

  • random (bool, optional) – Whether we want to randomly initialise the environment. By default True.

Returns:

tuple containing:

  • obs (numpy.ndarray): Initial environment observation.

  • info (dict): Dictionary containing additional information.

Return type:

(tuple)

reference(t)[source]

Returns the current value of the periodic reference signal that is tracked by the synthetic oscillatory network.

Parameters:

t (float) – The current time step.

Returns:

The current reference value.

Return type:

float

Note

This uses the general form of a periodic signal:

\[\begin{split}y(t) = A \sin(\omega t + \phi) + C \\ y(t) = A \sin(2 \pi f t + \phi) + C \\ y(t) = A \sin(\frac{2 \pi}{T} t + \phi) + C\end{split}\]

Where:

  • \(t\) is the time.

  • \(A\) is the amplitude of the signal.

  • \(\omega\) is the angular frequency of the signal (\(\omega = 2 \pi f\)).

  • \(f\) is the frequency of the signal.

  • \(T\) is the period of the signal.

  • \(\phi\) is the phase of the signal.

  • \(C\) is the offset of the signal.

abstract render(mode='human')[source]

Render one frame of the environment.

Parameters:

mode (str, optional) – Gym rendering mode. The default mode will do something human friendly, such as pop up a window.

Raises:

NotImplementedError – Raised because the render method has not yet been implemented.

Note

This method is not yet implemented.

property tau
Alias for the environment step size (dt). Provided for compatibility with other gymnasium environments.
property physics_time
Returns the physics time. Alias for t.