tianshou.core.value_function¶
Base class¶
State value¶
class tianshou.core.value_function.state_value.StateValue(network_callable, observation_placeholder, has_old_net=False)[source]¶
Bases: tianshou.core.value_function.base.ValueFunctionBase

Class for state value functions V(s). The input of the value network is a state and its output is directly the V-value of that input state.

Parameters:
- network_callable – A Python callable returning (action head, value head). When called, it builds the tf graph and returns a Tensor of the value on the value head.
- observation_placeholder – A tf.placeholder. The observation placeholder for s in V(s) in the network graph.
- has_old_net – A bool defaulting to False. If True, this class creates another graph with another set of tf.Variables to serve as the “old net”. The “old net” could be the target network as in DQN and DDPG, or just an old net to help optimization as in PPO.
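The role of the “old net” can be illustrated without TensorFlow: conceptually it is a second copy of the network's parameters that is updated only when explicitly synchronized from the current network. A minimal pure-Python sketch (the `ParamsWithOldNet` class and its method names are illustrative, not part of tianshou's API):

```python
class ParamsWithOldNet:
    """Toy stand-in for a value network with an optional "old net" copy."""

    def __init__(self, params, has_old_net=False):
        self.params = dict(params)            # current-network parameters
        self.has_old_net = has_old_net
        # the "old net" starts as an identical copy, like a target net in DQN
        self.old_params = dict(params) if has_old_net else None

    def train_step(self, updates):
        """Update only the current network; the old net stays frozen."""
        for name, delta in updates.items():
            self.params[name] += delta

    def sync_weights(self):
        """Copy current parameters into the old net (target-network update)."""
        if self.has_old_net:
            self.old_params = dict(self.params)


net = ParamsWithOldNet({"w": 1.0}, has_old_net=True)
net.train_step({"w": 0.5})     # current net moves, old net does not
net.sync_weights()             # now the old net catches up
```

Keeping the old net frozen between syncs is what stabilizes the bootstrapped targets in DQN/DDPG, and what provides the fixed reference policy in PPO's surrogate objective.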
eval_value(observation, my_feed_dict={})[source]¶
Evaluate the value of a minibatch using the current network.

Parameters:
- observation – An array-like of shape (batch_size,) + observation_shape.
- my_feed_dict – Optional. A dict defaulting to empty; specifies extra placeholders (e.g., for dropout and batch_norm) other than observation.

Returns: A numpy array of shape (batch_size,), the state value of each observation.
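The shape contract here can be mirrored with a toy value function in pure Python (no TensorFlow; `toy_eval_value` is illustrative only, not tianshou's implementation): given a minibatch of batch_size observations, it returns one scalar per observation.

```python
def toy_eval_value(observations):
    """Toy V(s): one scalar per observation in the minibatch.

    `observations` has shape (batch_size,) + observation_shape; here each
    observation is a list of features, and the made-up V(s) is their sum.
    """
    return [sum(obs) for obs in observations]


batch = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]   # batch_size = 3, obs_shape = (2,)
values = toy_eval_value(batch)                 # one value per observation
```

The real `eval_value` does the analogous thing through a `Session.run` of the value Tensor, feeding the minibatch into `observation_placeholder` together with any entries of `my_feed_dict`.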
eval_value_old(observation, my_feed_dict={})[source]¶
Evaluate the value of a minibatch using the old net.

Parameters:
- observation – An array-like of shape (batch_size,) + observation_shape.
- my_feed_dict – Optional. A dict defaulting to empty; specifies extra placeholders (e.g., for dropout and batch_norm) other than observation.

Returns: A numpy array of shape (batch_size,), the state value of each observation.
trainable_variables¶
The trainable variables of the value network, as a Python set. It contains only the tf.Variables that affect the value.
value_tensor¶
Tensor of the corresponding value.
Action value¶
class tianshou.core.value_function.action_value.ActionValue(network_callable, observation_placeholder, action_placeholder, has_old_net=False)[source]¶
Bases: tianshou.core.value_function.base.ValueFunctionBase

Class for action value functions Q(s, a). The input of the value network is states and actions, and its output is directly the Q-value of the input (state, action) pairs.

Parameters:
- network_callable – A Python callable returning (action head, value head). When called, it builds the tf graph and returns a Tensor of the value on the value head.
- observation_placeholder – A tf.placeholder. The observation placeholder for s in Q(s, a) in the network graph.
- action_placeholder – A tf.placeholder. The action placeholder for a in Q(s, a) in the network graph.
- has_old_net – A bool defaulting to False. If True, this class creates another graph with another set of tf.Variables to serve as the “old net”. The “old net” could be the target network as in DQN and DDPG, or just an old net to help optimization as in PPO.
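Unlike StateValue, evaluating an ActionValue needs a minibatch of actions aligned with the minibatch of observations. A toy pure-Python sketch of that shape contract (`toy_eval_q` and its scoring rule are illustrative, not tianshou's API):

```python
def toy_eval_q(observations, actions):
    """Toy Q(s, a): one scalar per (observation, action) pair.

    observations: shape (batch_size,) + observation_shape
    actions:      shape (batch_size,) + action_shape (here a scalar action)
    Returns a list of shape (batch_size,).
    """
    assert len(observations) == len(actions), "minibatches must align"
    # a made-up scoring rule: weight the summed features by (action + 1)
    return [sum(obs) * (act + 1) for obs, act in zip(observations, actions)]


obs_batch = [[1.0, 2.0], [3.0, 4.0]]   # batch_size = 2
act_batch = [0, 1]                     # one action per observation
q_values = toy_eval_q(obs_batch, act_batch)
```

The real class feeds the two minibatches into `observation_placeholder` and `action_placeholder` of the same graph, so the i-th returned value is Q of the i-th (state, action) pair.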
eval_value(observation, action, my_feed_dict={})[source]¶
Evaluate the value of a minibatch using the current network.

Parameters:
- observation – An array-like of shape (batch_size,) + observation_shape.
- action – An array-like of shape (batch_size,) + action_shape.
- my_feed_dict – Optional. A dict defaulting to empty; specifies extra placeholders (e.g., for dropout and batch_norm) other than observation and action.

Returns: A numpy array of shape (batch_size,), the action value of each (observation, action) pair.
eval_value_old(observation, action, my_feed_dict={})[source]¶
Evaluate the value of a minibatch using the old net.

Parameters:
- observation – An array-like of shape (batch_size,) + observation_shape.
- action – An array-like of shape (batch_size,) + action_shape.
- my_feed_dict – Optional. A dict defaulting to empty; specifies extra placeholders (e.g., for dropout and batch_norm) other than observation and action.

Returns: A numpy array of shape (batch_size,), the action value of each (observation, action) pair.
trainable_variables¶
The trainable variables of the value network, as a Python set. It contains only the tf.Variables that affect the value.
value_tensor¶
Tensor of the corresponding value.
class tianshou.core.value_function.action_value.DQN(network_callable, observation_placeholder, has_old_net=False)[source]¶
Bases: tianshou.core.value_function.base.ValueFunctionBase

Class for the special action value function used by DQN. Instead of feeding both s and a to the network to get a single value, DQN feeds only s and obtains, at the last layer, Q(s, *) for all actions under that state. Like ActionValue, this class also builds the Q(s, a) value Tensor. It can only be used with discrete (and finite) action spaces.

Parameters:
- network_callable – A Python callable returning (action head, value head). When called, it builds the tf graph and returns a Tensor of Q(s, *) on the value head.
- observation_placeholder – A tf.placeholder. The observation placeholder for s in Q(s, *) in the network graph.
- has_old_net – A bool defaulting to False. If True, this class creates another graph with another set of tf.Variables to serve as the “old net”. The “old net” could be the target network as in DQN and DDPG, or just an old net to help optimization as in PPO.
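The relation between the Q(s, *) head and the Q(s, a) Tensor that this class still exposes is just an index-select along the action axis. A pure-Python sketch of that gather (`gather_q_sa` is illustrative, not tianshou's internal implementation):

```python
def gather_q_sa(q_all, actions):
    """Select Q(s, a) out of Q(s, *) for a minibatch.

    q_all:   shape (batch_size, num_actions), the per-state values Q(s, *).
    actions: shape (batch_size,), the discrete action taken in each state.
    Returns a list of shape (batch_size,): Q(s, a) for each pair.
    """
    return [q_row[a] for q_row, a in zip(q_all, actions)]


q_all = [[0.1, 0.9, 0.3],      # Q(s0, *) over 3 discrete actions
         [0.4, 0.2, 0.7]]      # Q(s1, *)
q_sa = gather_q_sa(q_all, [1, 2])
```

This is also why the class is restricted to discrete, finite action spaces: the last layer must enumerate one output per action.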
eval_value(observation, action, my_feed_dict={})[source]¶
Evaluate the value Q(s, a) of a minibatch using the current network.

Parameters:
- observation – An array-like of shape (batch_size,) + observation_shape.
- action – An array-like of shape (batch_size,) + action_shape.
- my_feed_dict – Optional. A dict defaulting to empty; specifies extra placeholders (e.g., for dropout and batch_norm) other than observation and action.

Returns: A numpy array of shape (batch_size,), the action value of each (observation, action) pair.
eval_value_all_actions(observation, my_feed_dict={})[source]¶
Evaluate the values Q(s, *) of a minibatch using the current network.

Parameters:
- observation – An array-like of shape (batch_size,) + observation_shape.
- my_feed_dict – Optional. A dict defaulting to empty; specifies extra placeholders (e.g., for dropout and batch_norm) other than observation.

Returns: A numpy array of shape (batch_size, num_actions), the values of all actions for each observation.
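A common use of the (batch_size, num_actions) output is greedy action selection, taking the argmax over the action axis for each state. A pure-Python sketch (`greedy_actions` is a hypothetical helper, not a tianshou function):

```python
def greedy_actions(q_all):
    """Pick argmax_a Q(s, a) for each state in the minibatch.

    q_all: shape (batch_size, num_actions).
    Returns a list of shape (batch_size,) of action indices.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in q_all]


q_all = [[0.1, 0.9, 0.3],
         [0.4, 0.2, 0.7]]
acts = greedy_actions(q_all)   # the highest-valued action per state
```

In an epsilon-greedy policy this argmax is taken with probability 1 − ε, with a uniformly random action otherwise.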
eval_value_all_actions_old(observation, my_feed_dict={})[source]¶
Evaluate the values Q(s, *) of a minibatch using the old net.

Parameters:
- observation – An array-like of shape (batch_size,) + observation_shape.
- my_feed_dict – Optional. A dict defaulting to empty; specifies extra placeholders (e.g., for dropout and batch_norm) other than observation.

Returns: A numpy array of shape (batch_size, num_actions), the values of all actions for each observation.
eval_value_old(observation, action, my_feed_dict={})[source]¶
Evaluate the value Q(s, a) of a minibatch using the old net.

Parameters:
- observation – An array-like of shape (batch_size,) + observation_shape.
- action – An array-like of shape (batch_size,) + action_shape.
- my_feed_dict – Optional. A dict defaulting to empty; specifies extra placeholders (e.g., for dropout and batch_norm) other than observation and action.

Returns: A numpy array of shape (batch_size,), the action value of each (observation, action) pair.
trainable_variables¶
The trainable variables of the value network, as a Python set. It contains only the tf.Variables that affect the value.
value_tensor¶
Tensor of the corresponding value.
value_tensor_all_actions¶
The Tensor for Q(s, *).