tianshou.core.value_function¶
Base class¶
State value¶
class tianshou.core.value_function.state_value.StateValue(network_callable, observation_placeholder, has_old_net=False)[source]¶
Bases: tianshou.core.value_function.base.ValueFunctionBase
Class for state value functions V(s). The input of the value network is a batch of states, and its output is directly the V-value of each input state.
Parameters: - network_callable – A Python callable returning (action head, value head). When called, it builds the tf graph and returns a Tensor of the value on the value head.
- observation_placeholder – A tf.placeholder. The observation placeholder for s in V(s) in the network graph.
- has_old_net – A bool defaulting to False. If True, this class creates another graph with another set of tf.Variables to serve as the “old net”. The “old net” could be the target network as in DQN and DDPG, or just an old net to help optimization as in PPO.
eval_value(observation, my_feed_dict={})[source]¶
Evaluate the value of a minibatch of observations using the current network.
Parameters: - observation – An array-like of shape (batch_size,) + observation_shape.
- my_feed_dict – Optional. A dict defaulting to empty. Specifies values for other placeholders in the graph, such as those for dropout and batch_norm, besides the observation.
Returns: A numpy array of shape (batch_size,). The state value of each observation.
eval_value_old(observation, my_feed_dict={})[source]¶
Evaluate the value of a minibatch of observations using the old net.
Parameters: - observation – An array-like of shape (batch_size,) + observation_shape.
- my_feed_dict – Optional. A dict defaulting to empty. Specifies values for other placeholders in the graph, such as those for dropout and batch_norm, besides the observation.
Returns: A numpy array of shape (batch_size,). The state value of each observation.
trainable_variables¶
The trainable variables of the value network, as a Python set. It contains only the tf.Variables that affect the value.
value_tensor¶
The Tensor of the corresponding value.
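The eval_value contract above can be sketched shape-wise in plain NumPy. This is only an illustration of the documented input/output shapes, not the tianshou implementation: the real StateValue evaluates a tf graph in a session, and the stub "network" below is arbitrary.

```python
import numpy as np

# Illustrative stand-in for StateValue.eval_value: it demonstrates only
# the documented shape contract, (batch_size,) + observation_shape in,
# (batch_size,) out. A real StateValue runs a TensorFlow graph instead.
def eval_value_stub(observation):
    observation = np.asarray(observation)
    # One scalar V(s) per observation in the minibatch; here just a
    # placeholder computation (mean over flattened observation features).
    return observation.reshape(observation.shape[0], -1).mean(axis=1)

obs = np.random.rand(32, 4)       # batch of 32 observations of shape (4,)
values = eval_value_stub(obs)
print(values.shape)               # (32,)
```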
Action value¶
class tianshou.core.value_function.action_value.ActionValue(network_callable, observation_placeholder, action_placeholder, has_old_net=False)[source]¶
Bases: tianshou.core.value_function.base.ValueFunctionBase
Class for action values Q(s, a). The input of the value network is a batch of states and actions, and its output is directly the Q-value of each input (state, action) pair.
Parameters: - network_callable – A Python callable returning (action head, value head). When called, it builds the tf graph and returns a Tensor of the value on the value head.
- observation_placeholder – A tf.placeholder. The observation placeholder for s in Q(s, a) in the network graph.
- action_placeholder – A tf.placeholder. The action placeholder for a in Q(s, a) in the network graph.
- has_old_net – A bool defaulting to False. If True, this class creates another graph with another set of tf.Variables to serve as the “old net”. The “old net” could be the target network as in DQN and DDPG, or just an old net to help optimization as in PPO.
eval_value(observation, action, my_feed_dict={})[source]¶
Evaluate the value of a minibatch of (observation, action) pairs using the current network.
Parameters: - observation – An array-like of shape (batch_size,) + observation_shape.
- action – An array-like of shape (batch_size,) + action_shape.
- my_feed_dict – Optional. A dict defaulting to empty. Specifies values for other placeholders in the graph, such as those for dropout and batch_norm, besides the observation and action.
Returns: A numpy array of shape (batch_size,). The action value of each (observation, action) pair.
eval_value_old(observation, action, my_feed_dict={})[source]¶
Evaluate the value of a minibatch of (observation, action) pairs using the old net.
Parameters: - observation – An array-like of shape (batch_size,) + observation_shape.
- action – An array-like of shape (batch_size,) + action_shape.
- my_feed_dict – Optional. A dict defaulting to empty. Specifies values for other placeholders in the graph, such as those for dropout and batch_norm, besides the observation and action.
Returns: A numpy array of shape (batch_size,). The action value of each (observation, action) pair.
trainable_variables¶
The trainable variables of the value network, as a Python set. It contains only the tf.Variables that affect the value.
value_tensor¶
The Tensor of the corresponding value.
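The has_old_net mechanism shared by these classes can be pictured with a minimal NumPy sketch. The dict names and sync step here are purely illustrative, not the tianshou API: the point is that the “old net” is a second copy of the parameters that stays fixed between explicit synchronizations, while training keeps updating the current net.

```python
import numpy as np

# Minimal sketch of the "old net" idea behind has_old_net=True: a second
# set of variables that lags behind the current network, as with target
# networks in DQN/DDPG. Names here are illustrative, not the tianshou API.
current = {"w": np.ones((2, 2)), "b": np.zeros(2)}
old = {k: v.copy() for k, v in current.items()}   # sync: old <- current

current["w"] += 0.1   # training keeps updating the current net...
# ...while the old net still holds the values from the last sync,
# so eval_value_old gives stable targets between syncs.
print(np.allclose(old["w"], 1.0))      # True
print(np.allclose(current["w"], 1.1))  # True
```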
class tianshou.core.value_function.action_value.DQN(network_callable, observation_placeholder, has_old_net=False)[source]¶
Bases: tianshou.core.value_function.base.ValueFunctionBase
Class for the special action value function DQN. Instead of feeding s and a to the network to get a value, DQN feeds only s to the network and obtains, at the last layer, Q(s, *) for all actions under that state. Like ActionValue, this class still builds the Q(s, a) value Tensor. It can only be used with discrete (and finite) action spaces.
Parameters: - network_callable – A Python callable returning (action head, value head). When called, it builds the tf graph and returns a Tensor of Q(s, *) on the value head.
- observation_placeholder – A tf.placeholder. The observation placeholder for s in Q(s, *) in the network graph.
- has_old_net – A bool defaulting to False. If True, this class creates another graph with another set of tf.Variables to serve as the “old net”. The “old net” could be the target network as in DQN and DDPG, or just an old net to help optimization as in PPO.
eval_value(observation, action, my_feed_dict={})[source]¶
Evaluate the value Q(s, a) of a minibatch using the current network.
Parameters: - observation – An array-like of shape (batch_size,) + observation_shape.
- action – An array-like of shape (batch_size,) + action_shape.
- my_feed_dict – Optional. A dict defaulting to empty. Specifies values for other placeholders in the graph, such as those for dropout and batch_norm, besides the observation and action.
Returns: A numpy array of shape (batch_size,). The action value of each (observation, action) pair.
eval_value_all_actions(observation, my_feed_dict={})[source]¶
Evaluate the values Q(s, *) of a minibatch using the current network.
Parameters: - observation – An array-like of shape (batch_size,) + observation_shape.
- my_feed_dict – Optional. A dict defaulting to empty. Specifies values for other placeholders in the graph, such as those for dropout and batch_norm, besides the observation.
Returns: A numpy array of shape (batch_size, num_actions). The values of all actions under each observation.
eval_value_all_actions_old(observation, my_feed_dict={})[source]¶
Evaluate the values Q(s, *) of a minibatch using the old net.
Parameters: - observation – An array-like of shape (batch_size,) + observation_shape.
- my_feed_dict – Optional. A dict defaulting to empty. Specifies values for other placeholders in the graph, such as those for dropout and batch_norm, besides the observation.
Returns: A numpy array of shape (batch_size, num_actions). The values of all actions under each observation.
eval_value_old(observation, action, my_feed_dict={})[source]¶
Evaluate the value Q(s, a) of a minibatch using the old net.
Parameters: - observation – An array-like of shape (batch_size,) + observation_shape.
- action – An array-like of shape (batch_size,) + action_shape.
- my_feed_dict – Optional. A dict defaulting to empty. Specifies values for other placeholders in the graph, such as those for dropout and batch_norm, besides the observation and action.
Returns: A numpy array of shape (batch_size,). The action value of each (observation, action) pair.
trainable_variables¶
The trainable variables of the value network, as a Python set. It contains only the tf.Variables that affect the value.
value_tensor¶
The Tensor of the corresponding value.
value_tensor_all_actions¶
The Tensor for Q(s, *).
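The relation between value_tensor_all_actions and the Q(s, a) value Tensor can be sketched in NumPy. This is a shape illustration under the assumption that Q(s, a) is obtained by picking, per row, the Q(s, *) entry of the sampled action; the real class does the equivalent indexing inside the tf graph.

```python
import numpy as np

# NumPy sketch of how DQN's Q(s, a) can be derived from the Q(s, *)
# head: the network outputs one value per discrete action, and Q(s, a)
# selects the entry for each sampled action.
batch_size, num_actions = 4, 3
q_all = np.arange(batch_size * num_actions, dtype=float).reshape(
    batch_size, num_actions)          # rows: [0,1,2], [3,4,5], ...
action = np.array([0, 2, 1, 2])       # one discrete action per sample

# Per-row selection, analogous to gathering on (row, action) index pairs.
q_sa = q_all[np.arange(batch_size), action]
print(q_sa)  # [ 0.  5.  7. 11.]
```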