tianshou.data.tester
tianshou.data.tester.test_policy_in_env(policy, env, num_timesteps=0, num_episodes=0, discount_factor=0.99, seed=0, episode_cutoff=None)

Tests the policy in the environment, then records and prints its performance. This is useful when the policy is trained with an off-policy algorithm, in which case the rewards in the data buffer do not reflect the performance of the current policy.
Parameters:
- policy – A tianshou.core.policy. The current policy being optimized.
- env – An environment.
- num_timesteps – An int specifying the number of timesteps for which to test the policy. It defaults to 0. Either num_timesteps or num_episodes may be set, but not both.
- num_episodes – An int specifying the number of episodes for which to test the policy. It defaults to 0. Either num_timesteps or num_episodes may be set, but not both.
- discount_factor – Optional. A float in the range \([0, 1]\), defaulting to 0.99. The discount factor used to compute discounted returns.
- seed – A non-negative int. The seed passed to the environment as env.seed(seed).
- episode_cutoff – Optional. An int. The maximum number of timesteps in one episode. This is useful when the environment has no terminal states or when a single episode could be prohibitively long. If set, all episodes are forced to stop after this number of timesteps.
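
The parameters above can be illustrated with a small, self-contained sketch of an evaluation loop. This is not tianshou's implementation: `CountdownEnv`, `evaluate`, and the convention that the policy is a plain callable mapping an observation to an action are all hypothetical names and assumptions made for illustration. The environment is assumed to follow the classic gym-style interface (`seed`, `reset`, and `step` returning `(observation, reward, done, info)`). How an unfinished episode is treated when the timestep budget runs out is also an assumption; here it is simply dropped.

```python
class CountdownEnv:
    """Hypothetical toy environment: reward 1.0 per step, ends after `horizon` steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def seed(self, seed):
        # The toy environment is deterministic, so the seed is only stored.
        self._seed = seed

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done, {}


def evaluate(policy, env, num_timesteps=0, num_episodes=0,
             discount_factor=0.99, seed=0, episode_cutoff=None):
    """Illustrative evaluation loop (an assumption, not the library's code).

    Returns two lists: undiscounted and discounted returns per finished episode.
    """
    # Exactly one of the two budgets must be set.
    if (num_timesteps > 0) == (num_episodes > 0):
        raise ValueError("set exactly one of num_timesteps and num_episodes")

    env.seed(seed)
    returns, discounted_returns = [], []
    steps_used = 0

    while True:
        obs = env.reset()
        ep_return, ep_discounted, discount, t = 0.0, 0.0, 1.0, 0
        done = False
        while not done:
            if num_timesteps and steps_used >= num_timesteps:
                # Budget exhausted: the unfinished episode is dropped (assumption).
                return returns, discounted_returns
            obs, reward, done, _ = env.step(policy(obs))
            ep_return += reward
            ep_discounted += discount * reward  # adds gamma^t * r_t
            discount *= discount_factor
            t += 1
            steps_used += 1
            # Force the episode to stop once it reaches the cutoff length.
            if episode_cutoff is not None and t >= episode_cutoff:
                done = True
        returns.append(ep_return)
        discounted_returns.append(ep_discounted)
        if num_episodes and len(returns) >= num_episodes:
            return returns, discounted_returns


if __name__ == "__main__":
    always_zero = lambda obs: 0  # hypothetical trivial policy
    rets, disc = evaluate(always_zero, CountdownEnv(horizon=3),
                          num_episodes=2, discount_factor=0.5)
    print("returns:", rets)
    print("discounted returns:", disc)
```

In this sketch, setting `episode_cutoff` shortens every episode to at most that many steps, which is how an environment without terminal states can still produce per-episode statistics.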