tianshou.data.data_collector
class tianshou.data.data_collector.DataCollector(env, policy, data_buffer, process_functions, managed_networks)

Bases: object

A utility class to manage the data flow during the interaction between the policy and the environment. It stores data into data_buffer, processes the reward signals and returns the feed_dict for tf graph running.

Parameters:
- env – An environment.
- policy – A tianshou.core.policy.
- data_buffer – A tianshou.data.data_buffer.
- process_functions – A list of callables in tianshou.data.advantage_estimation to process rewards.
- managed_networks – A list of networks of tianshou.core.policy and/or tianshou.core.value_function: the networks you want this class to manage. This class will automatically generate the feed_dict for all the placeholders in the managed_placeholders of all networks in this list.
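A minimal construction sketch. Only the DataCollector signature is documented on this page; my_policy and full_return below are hypothetical stand-ins for a tianshou.core.policy instance and a reward-processing callable from tianshou.data.advantage_estimation, both built elsewhere:

    # Construction sketch; `my_policy` and `full_return` are hypothetical
    # placeholders, and the BatchSet import path follows the name used above.
    import gym
    from tianshou.data.data_collector import DataCollector
    from tianshou.data.data_buffer import BatchSet  # assumed import path

    env = gym.make('CartPole-v0')
    buffer = BatchSet()  # assumed zero-argument constructor

    collector = DataCollector(
        env=env,
        policy=my_policy,                 # a tianshou.core.policy instance
        data_buffer=buffer,
        process_functions=[full_return],  # callables that process rewards
        managed_networks=[my_policy],     # placeholders of these networks get fed
    )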
collect(num_timesteps=0, num_episodes=0, my_feed_dict={}, auto_clear=True, episode_cutoff=None)

Collect data in the environment using self.policy.

Parameters:
- num_timesteps – An int specifying the number of timesteps to act. It defaults to 0. Either num_timesteps or num_episodes may be set, but not both.
- num_episodes – An int specifying the number of episodes to act. It defaults to 0. Either num_timesteps or num_episodes may be set, but not both.
- my_feed_dict – Optional. A dict defaulting to empty. Specifies placeholders, such as dropout and batch_norm, other than observation and action.
- auto_clear – Optional. A bool defaulting to True. If True, this method clears self.data_buffer if self.data_buffer is an instance of tianshou.data.data_buffer.BatchSet, and does nothing otherwise. If set to False, this auto-clearing behavior is disabled.
- episode_cutoff – Optional. An int. The maximum number of timesteps in one episode. This is useful when the environment has no terminal state or a single episode could be prohibitively long. If set, all episodes are forced to stop after this number of timesteps.
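For example, assuming collector is the DataCollector built above:

    # Collect by timesteps, or by episodes; the two counts must not both be set.
    collector.collect(num_timesteps=512)

    # Ten episodes, each force-stopped after at most 200 timesteps.
    collector.collect(num_episodes=10, episode_cutoff=200)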
denoise_action(feed_dict, my_feed_dict={})

Recompute the actions of deterministic policies without exploration noise, hence "denoising". It modifies feed_dict in place and has no return value. This is useful in, e.g., DDPG, since the action stored in self.data_buffer is the sampled action with additional exploration noise.

Parameters:
- feed_dict – A dict. It has to be the dict returned by next_batch() of this class.
- my_feed_dict – Optional. A dict defaulting to empty. Specifies placeholders, such as dropout and batch_norm, other than observation and action.
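A usage sketch for the DDPG-style case mentioned above, again assuming collector was built as in the earlier example:

    # Fetch a minibatch, then replace the stored noisy actions in the
    # feed_dict with the policy's deterministic (noise-free) actions.
    feed_dict = collector.next_batch(batch_size=64)
    collector.denoise_action(feed_dict)  # in place; returns nothing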
next_batch(batch_size, standardize_advantage=True)

Constructs and returns the feed_dict of data to be used with sess.run.

Parameters:
- batch_size – An int. The size of one minibatch.
- standardize_advantage – Optional. A bool defaulting to True. If True, this method standardizes advantages whenever advantage is required by the managed networks; if False, it never standardizes advantages.

Returns: A dict in the format of a conventional TensorFlow feed_dict, with the placeholders as keys and numpy arrays as values.
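A sketch of one training step using the returned feed_dict; sess and train_op are assumed to come from the surrounding TensorFlow graph setup:

    # Build the feed_dict for a 64-sample minibatch and run one update.
    feed_dict = collector.next_batch(batch_size=64, standardize_advantage=True)
    sess.run(train_op, feed_dict=feed_dict)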