tianshou.data.data_collector

class tianshou.data.data_collector.DataCollector(env, policy, data_buffer, process_functions, managed_networks)[source]

Bases: object

A utility class to manage the data flow during the interaction between the policy and the environment. It stores data into data_buffer, processes the reward signals and returns the feed_dict for tf graph running.

collect(num_timesteps=0, num_episodes=0, my_feed_dict={}, auto_clear=True, episode_cutoff=None)[source]

Collect data in the environment using self.policy.

Parameters:
  • num_timesteps – An int specifying the number of timesteps to act. Defaults to 0. Exactly one of num_timesteps and num_episodes may be set, not both.
  • num_episodes – An int specifying the number of episodes to act. Defaults to 0. Exactly one of num_timesteps and num_episodes may be set, not both.
  • my_feed_dict – Optional. A dict defaulting to empty. Specifies values for additional placeholders, such as those for dropout or batch normalization, other than observation and action.
  • auto_clear – Optional. A bool defaulting to True. If True, this method clears self.data_buffer, provided self.data_buffer is an instance of tianshou.data.data_buffer.BatchSet; otherwise it does nothing. If False, this auto-clearing behavior is disabled.
  • episode_cutoff – Optional. An int giving the maximum number of timesteps in one episode. This is useful when the environment has no terminal states or a single episode could be prohibitively long. If set, every episode is forced to stop after this number of timesteps.
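The stopping logic described above can be sketched in plain Python. MockEnv and the standalone collect function below are hypothetical stand-ins for illustration only, not the actual tianshou classes, which run the policy's tf graph internally:

```python
class MockEnv:
    """Hypothetical gym-style environment with fixed-length episodes."""
    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_length
        return 0.0, 1.0, done, {}

def collect(env, num_timesteps=0, num_episodes=0, episode_cutoff=None):
    # Exactly one of num_timesteps / num_episodes may be nonzero.
    assert (num_timesteps > 0) != (num_episodes > 0)
    steps = episodes = ep_len = 0
    env.reset()
    while True:
        _, _, done, _ = env.step(action=0)
        steps += 1
        ep_len += 1
        if episode_cutoff is not None and ep_len >= episode_cutoff:
            done = True  # force-terminate overly long episodes
        if done:
            episodes += 1
            env.reset()
            ep_len = 0
        if num_timesteps and steps >= num_timesteps:
            break
        if num_episodes and episodes >= num_episodes:
            break
    return steps, episodes
```

For example, collecting 2 episodes in a 5-step environment runs 10 timesteps, while episode_cutoff=3 caps a long episode at 3 steps.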
denoise_action(feed_dict, my_feed_dict={})[source]

Recompute the actions of deterministic policies without exploration noise, hence "denoising". It modifies feed_dict in place and has no return value. This is useful in, e.g., DDPG, since the action stored in self.data_buffer is the sampled action with additional exploration noise.

Parameters:
  • feed_dict – A dict. It must be the dict returned by this class's next_batch().
  • my_feed_dict – Optional. A dict defaulting to empty. Specifies placeholders such as dropout and batch_norm except observation and action.
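The in-place update can be sketched as follows. The function, the key names, and the deterministic_policy callable are all hypothetical illustrations; the real method recomputes actions by running the policy's tf graph:

```python
import numpy as np

def denoise_action(feed_dict, deterministic_policy,
                   obs_key="observation", action_key="action"):
    """Replace stored (noisy) actions with noise-free recomputed ones.

    Modifies feed_dict in place and returns nothing, mirroring the
    documented behavior. `deterministic_policy` is a hypothetical
    callable mapping observations to deterministic actions.
    """
    feed_dict[action_key] = deterministic_policy(feed_dict[obs_key])

# Usage: a linear "actor" standing in for, e.g., a DDPG policy.
obs = np.array([[1.0], [2.0]])
noisy = obs * 0.5 + np.array([[0.1], [-0.1]])  # sampled actions with noise
fd = {"observation": obs, "action": noisy}
denoise_action(fd, lambda o: o * 0.5)  # fd["action"] is now obs * 0.5
```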
next_batch(batch_size, standardize_advantage=True)[source]

Constructs and returns the feed_dict of data to be used with sess.run.

Parameters:
  • batch_size – An int. The size of one minibatch.
  • standardize_advantage – Optional. A bool defaulting to True. If True, this method standardizes the advantages when an advantage is required by the networks. If False, this method never standardizes the advantages.
Returns:

A dict in the format of a conventional tf feed_dict, with placeholders as keys and numpy arrays as values.
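Advantage standardization here plausibly means the usual zero-mean, unit-variance normalization. The function below is an assumed sketch of that formula, not the library's exact implementation; the eps constant guarding against division by zero is likewise an assumption:

```python
import numpy as np

def standardize_advantage(advantage, eps=1e-8):
    """Normalize advantages to zero mean and unit standard deviation.

    This is the conventional trick applied when standardize_advantage=True;
    it reduces gradient variance without changing the policy-gradient
    direction in expectation.
    """
    return (advantage - advantage.mean()) / (advantage.std() + eps)

adv = np.array([1.0, 2.0, 3.0, 4.0])
std_adv = standardize_advantage(adv)  # mean ~0, std ~1
```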