tianshou.data.data_collector
class tianshou.data.data_collector.DataCollector(env, policy, data_buffer, process_functions, managed_networks)

Bases: object

A utility class to manage the data flow during the interaction between the policy and the environment. It stores data into data_buffer, processes the reward signals, and returns the feed_dict for running the TensorFlow graph. A construction sketch follows the parameter list.

Parameters:

- env – An environment.
- policy – A tianshou.core.policy.
- data_buffer – A tianshou.data.data_buffer.
- process_functions – A list of callables in tianshou.data.advantage_estimation used to process rewards.
- managed_networks – A list of networks from tianshou.core.policy and/or tianshou.core.value_function that this class should manage. This class automatically generates the feed_dict for all the placeholders in the managed_placeholders of every network in this list.
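Below is a minimal wiring sketch for constructing a DataCollector. Only the argument names of DataCollector itself come from the signature above; the gym environment, the no-argument BatchSet constructor, the full_return callable, and the my_policy object are illustrative assumptions rather than verified library calls.

```python
# Hedged construction sketch; constructor signatures of BatchSet and the
# advantage-estimation callable are assumptions, not verified library calls.
import gym
import tianshou as ts

env = gym.make('CartPole-v0')                      # any gym-style environment

# `my_policy` stands for a tianshou.core.policy instance built elsewhere,
# wrapping a TensorFlow network and its placeholders.
data_buffer = ts.data.data_buffer.BatchSet()       # BatchSet is the buffer type named in collect() below
process_functions = [ts.data.advantage_estimation.full_return]   # hypothetical reward-processing callable

collector = ts.data.data_collector.DataCollector(
    env=env,
    policy=my_policy,
    data_buffer=data_buffer,
    process_functions=process_functions,
    managed_networks=[my_policy],                  # placeholders of these networks are fed automatically
)
```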
collect(num_timesteps=0, num_episodes=0, my_feed_dict={}, auto_clear=True, episode_cutoff=None)

Collect data in the environment using self.policy. A usage sketch follows the parameter list.

Parameters:

- num_timesteps – An int specifying the number of timesteps to act. It defaults to 0. Either num_timesteps or num_episodes may be set, but not both.
- num_episodes – An int specifying the number of episodes to act. It defaults to 0. Either num_timesteps or num_episodes may be set, but not both.
- my_feed_dict – Optional. A dict defaulting to empty. Specifies placeholders such as dropout and batch_norm, other than observation and action.
- auto_clear – Optional. A bool defaulting to True. If True, this method clears self.data_buffer when self.data_buffer is an instance of tianshou.data.data_buffer.BatchSet, and does nothing otherwise. If False, this auto-clearing behavior is disabled.
- episode_cutoff – Optional. An int. The maximum number of timesteps in one episode. This is useful when the environment has no terminal states or a single episode could be prohibitively long. If set, all episodes are forced to stop after this number of timesteps.
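A usage sketch for collect(), assuming a collector built as in the construction sketch above. Exactly one of num_timesteps and num_episodes is given; the numeric values are arbitrary examples.

```python
# Roll out 10 full episodes, each cut off after at most 200 timesteps.
collector.collect(num_episodes=10, episode_cutoff=200)

# Alternatively, act for a fixed number of timesteps instead of episodes:
# collector.collect(num_timesteps=1000)
```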
denoise_action(feed_dict, my_feed_dict={})

Recompute the actions of deterministic policies without exploration noise, hence "denoise". It modifies feed_dict in place and has no return value. This is useful in, e.g., DDPG, since the action stored in self.data_buffer is the sampled action with additional exploration noise. A usage sketch follows the parameter list.

Parameters:

- feed_dict – A dict. It has to be the dict returned by next_batch() of this class.
- my_feed_dict – Optional. A dict defaulting to empty. Specifies placeholders such as dropout and batch_norm, other than observation and action.
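A sketch of the pattern described above for a DDPG-style deterministic policy: build the feed_dict with next_batch(), then overwrite the stored noisy actions in place. The batch size is an arbitrary example value.

```python
feed_dict = collector.next_batch(batch_size=64)   # dict mapping placeholders to numpy arrays
collector.denoise_action(feed_dict)               # recomputes actions without exploration noise, in place
```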
next_batch(batch_size, standardize_advantage=True)

Constructs and returns the feed_dict of data to be used with sess.run. A training-loop sketch follows.

Parameters:

- batch_size – An int. The size of one minibatch.
- standardize_advantage – Optional. A bool defaulting to True. If True, this method standardizes advantages whenever advantages are required by the networks. If False, it never standardizes advantages.

Returns: A dict in the format of a conventional TensorFlow feed_dict, with placeholders as keys and numpy arrays as values.
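Putting the pieces together, a hedged outline of one training loop. Here sess and train_op stand for the TensorFlow session and an optimizer op built over the managed networks; neither is provided by DataCollector, and the loop bounds and batch size are arbitrary examples.

```python
for iteration in range(100):
    collector.collect(num_episodes=10)                # gather fresh data with the current policy
    feed_dict = collector.next_batch(batch_size=64)   # feed_dict covering all managed placeholders
    sess.run(train_op, feed_dict=feed_dict)           # ordinary TensorFlow graph execution
```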