tianshou.core.random

adapted from keras-rl

class tianshou.core.random.GaussianWhiteNoiseProcess(mu=0.0, sigma=1.0, sigma_min=None, n_steps_annealing=1000, size=1)

Bases: tianshou.core.random.AnnealedGaussianProcess

Class for Gaussian white noise. At each timestep, it draws a sample from an exact Gaussian distribution. The std of the Gaussian can be annealed over time, but samples at different timesteps are independent of one another.

Parameters:
  • mu – A float defaulting to 0. The mean of the Gaussian distribution.
  • sigma – A float defaulting to 1. The std of the Gaussian distribution. When annealing is enabled, this is the initial std.
  • sigma_min – Optional. A float. The minimum std, at which annealing stops. It defaults to None, in which case no annealing takes place.
  • n_steps_annealing – Optional. An int. The total number of steps over which annealing happens. Only effective when sigma_min is not None.
  • size – An int or tuple of ints. The shape of each sample, which corresponds to the shape of the environment's action.
sample()

Draws one sample from the random process.

Returns: A numpy array. The drawn sample.
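The annealing and sampling behaviour described above can be illustrated with a minimal scalar sketch. It is not the library's implementation: the names annealed_sigma and WhiteNoiseSketch are hypothetical, the linear shape of the schedule is an assumption (the text above does not state how the std decays), and the standard-library random module stands in for NumPy so that the block runs without extra dependencies.

```python
import random


def annealed_sigma(step, sigma=1.0, sigma_min=None, n_steps_annealing=1000):
    """Std at a given step, assuming a linear annealing schedule."""
    if sigma_min is None:
        return sigma  # no annealing takes place
    slope = -(sigma - sigma_min) / n_steps_annealing
    # Decay linearly from sigma, but never go below sigma_min.
    return max(sigma_min, sigma + slope * step)


class WhiteNoiseSketch:
    """Hypothetical scalar stand-in for GaussianWhiteNoiseProcess."""

    def __init__(self, mu=0.0, sigma=1.0, sigma_min=None,
                 n_steps_annealing=1000, seed=None):
        self.mu = mu
        self.sigma = sigma
        self.sigma_min = sigma_min
        self.n_steps_annealing = n_steps_annealing
        self.n_steps = 0  # number of samples drawn so far
        self.rng = random.Random(seed)

    def sample(self):
        current = annealed_sigma(self.n_steps, self.sigma,
                                 self.sigma_min, self.n_steps_annealing)
        self.n_steps += 1
        # Each draw is independent of all previous draws.
        return self.rng.gauss(self.mu, current)


noise = WhiteNoiseSketch(sigma=1.0, sigma_min=0.2, n_steps_annealing=1000, seed=0)
samples = [noise.sample() for _ in range(5)]
```

Under this assumed schedule the std falls from sigma to sigma_min over n_steps_annealing calls to sample() and stays at sigma_min afterwards.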
class tianshou.core.random.OrnsteinUhlenbeckProcess(theta, mu=0.0, sigma=1.0, dt=0.01, x0=None, size=1, sigma_min=None, n_steps_annealing=1000)

Bases: tianshou.core.random.AnnealedGaussianProcess

Class for the Ornstein-Uhlenbeck process, as used for exploration in DDPG. The implementation is based on http://math.stackexchange.com/questions/1287634/implementing-ornstein-uhlenbeck-in-matlab . It is a temporally correlated Gaussian process: the distribution at the current timestep depends on the sample from the previous timestep. The samples are therefore not independent Gaussian draws, although they still resemble Gaussian noise.

Parameters:
  • theta – A float. The rate at which the process is pulled back toward mu.
  • mu – A float. The level toward which the process reverts. It is not exactly the mean of the distribution at a given timestep.
  • sigma – A float. The scale of the random perturbation added at each step. It acts like the std of the Gaussian-like distribution to some extent.
  • dt – A float. The time interval used to simulate the process in discrete steps, since the process is mathematically defined in continuous time.
  • x0 – Optional. A float. The initial value used as the sample from the previous timestep when drawing the first sample. It defaults to None, in which case zero is used.
  • size – An int or tuple of ints. The shape of each sample, which corresponds to the shape of the environment's action.
  • sigma_min – Optional. A float. The minimum std, at which annealing stops. It defaults to None, in which case no annealing takes place.
  • n_steps_annealing – An int. The total number of steps over which annealing happens.
reset_states()

Resets self.x_prev to self.x0.

sample()

Draws one sample from the random process.

Returns: A numpy array. The drawn sample.
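To make the roles of theta, mu, sigma, dt and x0 concrete, the following scalar sketch simulates the process with a simple Euler-type discretization. It is an illustration under assumptions, not the library's code: the class name OUSketch and its seed argument are hypothetical, the update rule is assumed to be the standard discretization x = x_prev + theta * (mu - x_prev) * dt + sigma * sqrt(dt) * N(0, 1), annealing of sigma is left out for brevity, and the standard-library random module is used in place of NumPy.

```python
import math
import random


class OUSketch:
    """Hypothetical scalar stand-in for OrnsteinUhlenbeckProcess."""

    def __init__(self, theta, mu=0.0, sigma=1.0, dt=0.01, x0=None, seed=None):
        self.theta = theta
        self.mu = mu
        self.sigma = sigma
        self.dt = dt
        self.x0 = x0
        self.rng = random.Random(seed)
        self.reset_states()

    def reset_states(self):
        # Start from x0, or from zero when x0 is None.
        self.x_prev = self.x0 if self.x0 is not None else 0.0

    def sample(self):
        # Drift pulls the state toward mu at a rate set by theta.
        drift = self.theta * (self.mu - self.x_prev) * self.dt
        # Diffusion adds Gaussian noise scaled by sigma and sqrt(dt).
        diffusion = self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
        x = self.x_prev + drift + diffusion
        self.x_prev = x  # the next sample depends on this one
        return x


process = OUSketch(theta=0.15, mu=0.0, sigma=0.3, seed=0)
trajectory = [process.sample() for _ in range(5)]
process.reset_states()  # e.g. at the start of a new episode
```

Because each sample starts from the previous one, consecutive values are correlated, which is the property the description above refers to.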