stable_gym.envs.biological

Stable Gym gymnasium environments that are based on Biological systems.

Classes

`Oscillator`	Synthetic oscillatory network environment.
`OscillatorComplicated`	Challenging (i.e. complicated) oscillatory network environment.

Package Contents

class stable_gym.envs.biological.Oscillator(render_mode=None, max_cost=100.0, reference_target_position=8.0, reference_amplitude=7.0, reference_frequency=1 / 200, reference_phase_shift=0.0, clip_action=True, exclude_reference_from_observation=False, exclude_reference_error_from_observation=False, action_space_dtype=np.float64, observation_space_dtype=np.float64)[source]

Bases: gymnasium.Env

Synthetic oscillatory network environment.

Note

Can also be used in a vectorized manner. See the gym.vector documentation.

Description:

The goal of the agent in the oscillator environment is to act in such a way that one of the proteins of the synthetic oscillatory network follows a supplied reference signal.

Source:

This environment corresponds to the Oscillator environment used in the paper Han et al. 2020. In our implementation several additional features were added to the environment to make it more flexible and easier to use:

Environment arguments now allow for modification of the reference signal parameters.

System parameters can now be individually adjusted for each protein, rather than applying the same parameters across all proteins.

The reference can be omitted from the observation.

Reference error can be included in the info dictionary.

The observation space was expanded to accurately reproduce the plots presented in Han et al. 2020, which was not possible with the original code’s observation space.

Added an adjustable max_cost threshold for episode termination, defaulting to 100 to match the original environment.

Observation:

Type: Box(7) or Box(8) depending on the exclude_reference_error_from_observation argument.

Num	Observation	Min	Max
0	lacI mRNA transcripts concentration	0	\(\infty\)
1	tetR mRNA transcripts concentration	0	\(\infty\)
2	CI mRNA transcripts concentration	0	\(\infty\)
3	lacI (repressor) protein concentration (Inhibits transcription of the tetR gene)	0	\(\infty\)
4	tetR (repressor) protein concentration (Inhibits transcription of CI gene)	0	\(\infty\)
5	CI (repressor) protein concentration (Inhibits transcription of lacI gene)	0	\(\infty\)
6	The reference we want to follow	0	\(\infty\)
	Optional - The error between the current value of protein 1 and the reference	\(-\infty\)	\(\infty\)
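
The relation between the two `exclude_*` arguments and the observation size can be illustrated with a small sketch. The helper `observation_size` is hypothetical and not part of stable_gym; it assumes that each flag removes exactly one entry from the observation vector.

```python
def observation_size(
    n_states: int = 6,
    exclude_reference: bool = False,
    exclude_reference_error: bool = False,
) -> int:
    """Return the expected observation length (hypothetical helper)."""
    size = n_states  # rows 0-5: three mRNA and three protein concentrations
    if not exclude_reference:
        size += 1  # row 6: the reference value
    if not exclude_reference_error:
        size += 1  # last row: the optional reference error
    return size


default_size = observation_size()
without_error = observation_size(exclude_reference_error=True)
```

With the default arguments this gives the Box(8) case, and excluding the reference error gives the Box(7) case.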

Actions:

Type: Box(3)

Num	Action	Max
0	Relative intensity of the light signal that induces the expression of the lacI mRNA gene.	1
1	Relative intensity of the light signal that induces the expression of the tetR mRNA gene.	1
2	Relative intensity of the light signal that induces the expression of the CI mRNA gene.	1

Cost:

A cost, computed as the squared difference between the concentration of protein 1 (\(p_1\)) and the reference (\(r_1\)):

\[C = (p_1 - r_1)^2\]

Starting State:

All observations are assigned a uniform random value in [0..5]
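
As an illustration of this initialisation, the following is a minimal sketch that draws each state variable uniformly from [0, 5]. The function name `sample_initial_state` and its arguments are assumptions made for this example; the environment's own sampling code may differ.

```python
import random


def sample_initial_state(n_states=6, low=0.0, high=5.0, seed=None):
    """Draw every state variable uniformly from [low, high] (illustrative only)."""
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return [rng.uniform(low, high) for _ in range(n_states)]


state = sample_initial_state(seed=0)
```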

Episode Termination:

An episode is terminated when:

The maximum step limit is reached.
The cost exceeds a threshold (default is 100). This threshold can be adjusted using the max_cost environment argument.
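
The cost and the cost-based termination rule can be sketched as follows. Both function names are hypothetical, and the strict `>` comparison is an assumption about how "exceeds" is evaluated.

```python
def tracking_cost(p1: float, r1: float) -> float:
    """Squared difference between protein 1 and the reference."""
    return (p1 - r1) ** 2


def exceeds_max_cost(cost: float, max_cost: float = 100.0) -> bool:
    """True when the cost is above the max_cost threshold (assumed strict)."""
    return cost > max_cost


small = tracking_cost(10.0, 8.0)   # (10 - 8)^2 = 4.0
large = tracking_cost(20.0, 8.0)   # (20 - 8)^2 = 144.0
```

Here `small` stays below the default threshold of 100, while `large` would end the episode.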

Solved Requirements:

Considered solved when the average cost is lower than 300.

How to use:

import stable_gym
import gymnasium as gym
env = gym.make("stable_gym:Oscillator-v1")

On reset, the options parameter allows the user to change the bounds used to determine the new random state when random=True.

state

The current system state.

Type: numpy.ndarray

t

The current time step.

Type: float

dt

The environment step size. Also available as tau.

Type: float

sigma

The variance of the system noise.

Type: float

max_cost

The maximum cost allowed before the episode is terminated.

Type: float

Initialise a new Oscillator environment instance.

Parameters:

render_mode (str, optional) – The render mode you want to use. Defaults to None. Not used in this environment.
max_cost (float, optional) – The maximum cost allowed before the episode is terminated. Defaults to 100.0.
reference_target_position (float, optional) – The reference target position, i.e. the mean of the reference signal. Defaults to 8.0.
reference_amplitude (float, optional) – The reference amplitude. Defaults to 7.0.
reference_frequency (float, optional) – The reference frequency. Defaults to 0.005.
reference_phase_shift (float, optional) – The reference phase shift. Defaults to 0.0.
clip_action (bool, optional) – Whether the actions should be clipped if they are greater than the set action limit. Defaults to True.
exclude_reference_from_observation (bool, optional) – Whether the reference should be excluded from the observation. Defaults to False.
exclude_reference_error_from_observation (bool, optional) – Whether the error should be excluded from the observation. Defaults to False.
action_space_dtype (union[numpy.dtype, str], optional) – The data type of the action space. Defaults to np.float64.
observation_space_dtype (union[numpy.dtype, str], optional) – The data type of the observation space. Defaults to np.float64.

max_cost

_action_clip_warning = False

_clip_action

_exclude_reference_from_observation

_exclude_reference_error_from_observation

_action_space_dtype

_observation_space_dtype

_action_dtype_conversion_warning = False

t = 0.0

dt = 1.0

_init_state

_init_state_range

K1 = 1.0

K2 = 1.0

K3 = 1.0

a1 = 1.6

a2 = 1.6

a3 = 1.6

gamma1 = 0.16

gamma2 = 0.16

gamma3 = 0.16

beta1 = 0.16

beta2 = 0.16

beta3 = 0.16

c1 = 0.06

c2 = 0.06

c3 = 0.06

b1 = 5.0

b2 = 5.0

b3 = 5.0

delta1 = 0.0

delta2 = 0.0

delta3 = 0.0

delta4 = 0.0

delta5 = 0.0

delta6 = 0.0

obs_low

obs_high

action_space

observation_space

reward_range

viewer = None

state = None

steps_beyond_done = None

reference_target_pos

reference_amplitude

reference_frequency

phase_shift

step(action)[source]

Take a step in the environment.

Parameters:

action (numpy.ndarray) – The action we want to perform in the environment.

Returns:

tuple containing:

obs (np.ndarray): Environment observation.

cost (float): Cost of the action.

terminated (bool): Whether the episode is terminated.

truncated (bool): Whether the episode was truncated. This value is set by wrappers when, for example, a time limit is reached or the agent goes out of bounds.

info (dict): Additional information about the environment.

Return type:

(tuple)
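
A rollout loop that consumes this five-element tuple might look like the sketch below. `StandInEnv` is a hypothetical placeholder that only mimics the documented reset and step return values so that the example runs without stable_gym installed; it does not model the oscillator dynamics.

```python
class StandInEnv:
    """Hypothetical stand-in with the documented reset/step return values."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self, seed=None, options=None):
        self.t = 0
        return [0.0], {}

    def step(self, action):
        self.t += 1
        obs = [float(self.t)]
        cost = 1.0  # constant placeholder cost
        terminated = False
        truncated = self.t >= self.horizon  # emulates a time-limit wrapper
        return obs, cost, terminated, truncated, {}


def rollout(env, policy):
    """Run one episode and return the accumulated cost and step count."""
    obs, info = env.reset()
    total_cost, steps, done = 0.0, 0, False
    while not done:
        obs, cost, terminated, truncated, info = env.step(policy(obs))
        total_cost += cost
        steps += 1
        done = terminated or truncated
    return total_cost, steps


total_cost, steps = rollout(StandInEnv(horizon=5), policy=lambda obs: [0.0, 0.0, 0.0])
```

Note that the second element is a cost rather than a reward, so an agent would aim to keep the accumulated value low.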

reset(seed=None, options=None, random=True)[source]

Reset gymnasium environment.

Parameters:

seed (int, optional) – A random seed for the environment. By default None.
options (dict, optional) – A dictionary containing additional options for resetting the environment. By default None. Not used in this environment.
random (bool, optional) – Whether we want to randomly initialise the environment. By default True.

Returns:

tuple containing:

obs (numpy.ndarray): Initial environment observation.

info (dict): Dictionary containing additional information.

Return type:

(tuple)

reference(t)[source]

Returns the current value of the periodic reference signal that is tracked by the synthetic oscillatory network.

Parameters: t (float) – The current time step.
Returns: The current reference value.
Return type: float

Note

This uses the general form of a periodic signal:

\[\begin{split}y(t) = A \sin(\omega t + \phi) + C \\ y(t) = A \sin(2 \pi f t + \phi) + C \\ y(t) = A \sin(\frac{2 \pi}{T} t + \phi) + C\end{split}\]

Where:

\(t\) is the time.
\(A\) is the amplitude of the signal.
\(\omega\) is the angular frequency of the signal.
\(f\) is the frequency of the signal.
\(T\) is the period of the signal.
\(\phi\) is the phase of the signal.
\(C\) is the offset of the signal.
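
The second form of the signal can be evaluated with a few lines of Python. This is a minimal sketch rather than the environment's own method: the function name `reference_signal` is hypothetical, and the default values are taken from the constructor defaults listed above under the assumption that the amplitude, frequency, phase shift and target position arguments correspond to \(A\), \(f\), \(\phi\) and \(C\).

```python
import math


def reference_signal(t, amplitude=7.0, frequency=1 / 200, phase_shift=0.0, offset=8.0):
    """Evaluate y(t) = A * sin(2 * pi * f * t + phi) + C."""
    return amplitude * math.sin(2 * math.pi * frequency * t + phase_shift) + offset


start = reference_signal(0)     # sine term is zero, so the value equals the offset
peak = reference_signal(50)     # a quarter period in, the sine term is at its maximum
trough = reference_signal(150)  # three quarters in, the sine term is at its minimum
```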

abstract render(mode='human')[source]

Render one frame of the environment.

Parameters: mode (str, optional) – Gym rendering mode. The default mode will do something human friendly, such as pop up a window.
Raises: NotImplementedError – Raised because the render method has not yet been implemented.

Note

This method is not yet implemented.

property tau
Alias for the environment step size. Done for compatibility with the
other gymnasium environments.

property physics_time
Returns the physics time. Alias for t.

class stable_gym.envs.biological.OscillatorComplicated(render_mode=None, max_cost=np.inf, reference_target_position=8.0, reference_amplitude=7.0, reference_frequency=1 / 200, reference_phase_shift=0.0, clip_action=True, exclude_reference_from_observation=False, exclude_reference_error_from_observation=False, action_space_dtype=np.float64, observation_space_dtype=np.float64)[source]

Bases: gymnasium.Env

Challenging (i.e. complicated) oscillatory network environment. This environment class is based on the Oscillator environment class but has an additional protein, mRNA transcription and light input.

Note

Can also be used in a vectorized manner. See the gym.vector documentation.

Description:

The goal of the agent in the oscillator environment is to act in such a way that one of the proteins of the synthetic oscillatory network follows a supplied reference signal.

Source:

This environment corresponds to the Oscillator environment used in the paper Han et al. 2020. In our implementation several additional features were added to the environment to make it more flexible and easier to use:

Environment arguments now allow for modification of the reference signal parameters.

System parameters can now be individually adjusted for each protein, rather than applying the same parameters across all proteins.

The reference can be omitted from the observation.

Reference error can be included in the info dictionary.

The observation space was expanded to accurately reproduce the plots presented in Han et al. 2020, which was not possible with the original code’s observation space.

Added an adjustable max_cost threshold for episode termination, defaulting to \(\infty\) to match the original environment.

Observation:

Type: Box(9) or Box(10) depending on the exclude_reference_error_from_observation argument.

Num	Observation	Min	Max
0	lacI mRNA transcripts concentration	0	\(\infty\)
1	tetR mRNA transcripts concentration	0	\(\infty\)
2	CI mRNA transcripts concentration	0	\(\infty\)
3	Extra protein mRNA transcripts concentration	0	\(\infty\)
4	lacI (repressor) protein concentration (Inhibits transcription of the tetR gene)	0	\(\infty\)
5	tetR (repressor) protein concentration (Inhibits transcription of CI gene)	0	\(\infty\)
6	CI (repressor) protein concentration (Inhibits transcription of extra protein gene)	0	\(\infty\)
7	Extra (repressor) protein concentration (Inhibits transcription of lacI gene)	0	\(\infty\)
8	The reference we want to follow	0	\(\infty\)
	Optional - The error between the current value of protein 1 and the reference	\(-\infty\)	\(\infty\)

Actions:

Type: Box(4)

Num	Action	Max
0	Relative intensity of the light signal that induces the expression of the lacI mRNA gene.	1
1	Relative intensity of the light signal that induces the expression of the tetR mRNA gene.	1
2	Relative intensity of the light signal that induces the expression of the CI mRNA gene.	1
3	Relative intensity of the light signal that induces the expression of the extra protein mRNA gene.	1

Cost:

A cost, computed as the squared difference between the concentration of protein 1 (\(p_1\)) and the reference (\(r_1\)):

\[C = (p_1 - r_1)^2\]

Starting State:

All observations are assigned a uniform random value in [0..5]

Episode Termination:

An episode is terminated when:

The maximum step limit is reached.
The cost exceeds a threshold (default is \(\infty\)). This threshold can be adjusted using the max_cost environment argument.

Solved Requirements:

Considered solved when the average cost is lower than 300.

How to use:

import stable_gym
import gymnasium as gym
env = gym.make("stable_gym:OscillatorComplicated-v1")

On reset, the options parameter allows the user to change the bounds used to determine the new random state when random=True.

state

The current system state.

Type: numpy.ndarray

t

The current time step.

Type: float

dt

The environment step size. Also available as tau.

Type: float

sigma

The variance of the system noise.

Type: float

max_cost

The maximum cost allowed before the episode is terminated.

Type: float

Initialise a new OscillatorComplicated environment instance.

Parameters:

render_mode (str, optional) – The render mode you want to use. Defaults to None. Not used in this environment.
max_cost (float, optional) – The maximum cost allowed before the episode is terminated. Defaults to np.inf.
reference_target_position (float, optional) – The reference target position, i.e. the mean of the reference signal. Defaults to 8.0.
reference_amplitude (float, optional) – The reference amplitude. Defaults to 7.0.
reference_frequency (float, optional) – The reference frequency. Defaults to 0.005.
reference_phase_shift (float, optional) – The reference phase shift. Defaults to 0.0.
clip_action (bool, optional) – Whether the actions should be clipped if they are greater than the set action limit. Defaults to True.
exclude_reference_from_observation (bool, optional) – Whether the reference should be excluded from the observation. Defaults to False.
exclude_reference_error_from_observation (bool, optional) – Whether the error should be excluded from the observation. Defaults to False.
action_space_dtype (union[numpy.dtype, str], optional) – The data type of the action space. Defaults to np.float64.
observation_space_dtype (union[numpy.dtype, str], optional) – The data type of the observation space. Defaults to np.float64.

max_cost

_action_clip_warning = False

_clip_action

_exclude_reference_from_observation

_exclude_reference_error_from_observation

_action_space_dtype

_observation_space_dtype

_action_dtype_conversion_warning = False

t = 0.0

dt = 1.0

_init_state

_init_state_range

K1 = 1.0

K2 = 1.0

K3 = 1.0

K4 = 1.0

a1 = 1.6

a2 = 1.6

a3 = 1.6

a4 = 1.6

gamma1 = 0.16

gamma2 = 0.16

gamma3 = 0.16

gamma4 = 0.16

beta1 = 0.16

beta2 = 0.16

beta3 = 0.16

beta4 = 0.16

c1 = 0.06

c2 = 0.06

c3 = 0.06

c4 = 0.06

b1 = 5.0

b2 = 5.0

b3 = 5.0

b4 = 5.0

delta1 = 0.0

delta2 = 0.0

delta3 = 0.0

delta4 = 0.0

delta5 = 0.0

delta6 = 0.0

delta7 = 0.0

delta8 = 0.0

obs_low

obs_high

action_space

observation_space

reward_range

viewer = None

state = None

steps_beyond_done = None

reference_target_pos

reference_amplitude

reference_frequency

phase_shift

step(action)[source]

Take a step in the environment.

Parameters:

action (numpy.ndarray) – The action we want to perform in the environment.

Returns:

tuple containing:

obs (np.ndarray): Environment observation.

cost (float): Cost of the action.

terminated (bool): Whether the episode is terminated.

truncated (bool): Whether the episode was truncated. This value is set by wrappers when, for example, a time limit is reached or the agent goes out of bounds.

info (dict): Additional information about the environment.

Return type:

(tuple)

reset(seed=None, options=None, random=True)[source]

Reset gymnasium environment.

Parameters:

seed (int, optional) – A random seed for the environment. By default None.
options (dict, optional) – A dictionary containing additional options for resetting the environment. By default None. Not used in this environment.
random (bool, optional) – Whether we want to randomly initialise the environment. By default True.

Returns:

tuple containing:

obs (numpy.ndarray): Initial environment observation.

info (dict): Dictionary containing additional information.

Return type:

(tuple)

reference(t)[source]

Returns the current value of the periodic reference signal that is tracked by the synthetic oscillatory network.

Parameters: t (float) – The current time step.
Returns: The current reference value.
Return type: float

Note

This uses the general form of a periodic signal:

\[\begin{split}y(t) = A \sin(\omega t + \phi) + C \\ y(t) = A \sin(2 \pi f t + \phi) + C \\ y(t) = A \sin(\frac{2 \pi}{T} t + \phi) + C\end{split}\]

Where:

\(t\) is the time.
\(A\) is the amplitude of the signal.
\(\omega\) is the angular frequency of the signal.
\(f\) is the frequency of the signal.
\(T\) is the period of the signal.
\(\phi\) is the phase of the signal.
\(C\) is the offset of the signal.
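
As a worked example, assume the constructor defaults map onto these symbols as \(A = 7\) (reference_amplitude), \(f = 1/200\) (reference_frequency), \(\phi = 0\) (reference_phase_shift) and \(C = 8\) (reference_target_position). Under that assumption the signal becomes:

\[\begin{split}y(t) = 7 \sin\left(\frac{2 \pi}{200} t\right) + 8 \\ T = \frac{1}{f} = 200 \\ C - A = 1 \le y(t) \le 15 = C + A\end{split}\]

The reference would then complete one oscillation every 200 time units and stay between 1 and 15.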

abstract render(mode='human')[source]

Render one frame of the environment.

Parameters: mode (str, optional) – Gym rendering mode. The default mode will do something human friendly, such as pop up a window.
Raises: NotImplementedError – Raised because the render method has not yet been implemented.

Note

This method is not yet implemented.

property tau
Alias for the environment step size. Done for compatibility with the
other gymnasium environments.

property physics_time
Returns the physics time. Alias for t.