tianshou.data.tester

tianshou.data.tester.test_policy_in_env(policy, env, num_timesteps=0, num_episodes=0, discount_factor=0.99, seed=0, episode_cutoff=None)

Tests the policy in the environment, then records and prints its performance. This is useful when the policy is trained with an off-policy algorithm, in which case the rewards in the data buffer do not reflect the performance of the current policy.

Parameters:

policy – A tianshou.core.policy. The current policy being optimized.
env – An environment.
num_timesteps – An int specifying the number of timesteps for which to test the policy. Defaults to 0. Either num_timesteps or num_episodes may be set, but not both.
num_episodes – An int specifying the number of episodes for which to test the policy. Defaults to 0. Either num_timesteps or num_episodes may be set, but not both.
discount_factor – Optional. A float in the range \([0, 1]\), defaulting to 0.99. The discount factor used to compute discounted returns.
seed – A non-negative int. The seed passed to the environment as env.seed(seed).
episode_cutoff – Optional. An int. The maximum number of timesteps in one episode. This is useful when the environment has no terminal states or a single episode could be prohibitively long. If set, every episode is forced to stop once it reaches this number of timesteps.
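
The following is a minimal sketch of how these parameters could interact in an evaluation loop. It is not Tianshou's implementation: `ToyEnv` and `evaluate` are hypothetical names, the environment is assumed to expose a gym-style `reset()` and a `step()` that returns `(observation, reward, done, info)`, and the policy is replaced by a plain callable `act_fn` because the policy interface is not described here. Counting a partially finished final episode when the timestep budget runs out is also an assumption.

```python
class ToyEnv:
    """Hypothetical stand-in environment: reward 1.0 per step, ends after `horizon` steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def seed(self, seed):
        # Nothing is random in this toy environment, so the seed is ignored.
        pass

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done, {}


def evaluate(act_fn, env, num_timesteps=0, num_episodes=0,
             discount_factor=0.99, seed=0, episode_cutoff=None):
    """Hypothetical evaluation loop mirroring the documented parameters."""
    # Exactly one of num_timesteps and num_episodes must be positive.
    if (num_timesteps > 0) == (num_episodes > 0):
        raise ValueError("set either num_timesteps or num_episodes, but not both")

    env.seed(seed)
    returns, discounted_returns = [], []
    total_steps = 0

    while True:
        if num_episodes and len(returns) >= num_episodes:
            break
        if num_timesteps and total_steps >= num_timesteps:
            break

        obs = env.reset()
        ep_return, ep_discounted, discount, t = 0.0, 0.0, 1.0, 0
        done = False
        while not done:
            obs, reward, done, _ = env.step(act_fn(obs))
            ep_return += reward
            ep_discounted += discount * reward  # adds gamma^t * r_t
            discount *= discount_factor
            t += 1
            total_steps += 1
            # Force the episode to stop once it reaches the cutoff.
            if episode_cutoff is not None and t >= episode_cutoff:
                break
            # Assumption: stop mid-episode when the timestep budget is spent.
            if num_timesteps and total_steps >= num_timesteps:
                break

        returns.append(ep_return)
        discounted_returns.append(ep_discounted)

    return returns, discounted_returns


returns, discounted = evaluate(lambda obs: 0, ToyEnv(horizon=5),
                               num_episodes=2, discount_factor=0.5)
print(returns, discounted)  # [5.0, 5.0] [1.9375, 1.9375]
```

With `discount_factor=0.5` and a reward of 1.0 at each of five steps, the discounted return is 1 + 0.5 + 0.25 + 0.125 + 0.0625 = 1.9375. Passing `episode_cutoff=3` to the same call would truncate each episode after three steps, giving an undiscounted return of 3.0 per episode.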