tianshou.core.value_function

Base class

All value function classes in this module derive from tianshou.core.value_function.base.ValueFunctionBase, which defines their shared interface.

State value

class tianshou.core.value_function.state_value.StateValue(network_callable, observation_placeholder, has_old_net=False)[source]

Bases: tianshou.core.value_function.base.ValueFunctionBase

Class for state value functions V(s). The value network takes states as input and directly outputs the V-value of each input state.

Parameters:
  • network_callable – A Python callable that, when called, builds the tf graph and returns the pair (action head, value head); the Tensor on the value head gives the value.
  • observation_placeholder – A tf.placeholder. The observation placeholder for s in V(s) in the network graph.
  • has_old_net – A bool defaulting to False. If True, this class creates another graph with another set of tf.Variables to serve as the “old net”. The “old net” can be the target network as in DQN and DDPG, or simply an old net that aids optimization as in PPO.
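
For concreteness, a minimal construction sketch using TensorFlow 1.x. The network architecture, the dimensions, and returning None for the unused action head are illustrative assumptions, not prescribed by this API:

    import tensorflow as tf
    from tianshou.core.value_function.state_value import StateValue

    observation_dim = 4  # illustrative

    observation_ph = tf.placeholder(tf.float32, shape=(None, observation_dim))

    def my_network():
        # Build the graph and return (action head, value head); returning
        # None for the unused action head is an assumption of this sketch.
        net = tf.layers.dense(observation_ph, 64, activation=tf.nn.relu)
        value = tf.layers.dense(net, 1)
        return None, value

    critic = StateValue(my_network, observation_placeholder=observation_ph,
                        has_old_net=True)
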
eval_value(observation, my_feed_dict={})[source]

Evaluate the value of a minibatch of observations using the current network.

Parameters:
  • observation – An array-like, of shape (batch_size,) + observation_shape.
  • my_feed_dict – Optional. A dict defaulting to empty. Specifies values for any placeholders other than observation, such as those controlling dropout and batch_norm.
Returns:

A numpy array of shape (batch_size,). The corresponding state value for each observation.

eval_value_old(observation, my_feed_dict={})[source]

Evaluate the value of a minibatch of observations using the old net.

Parameters:
  • observation – An array-like, of shape (batch_size,) + observation_shape.
  • my_feed_dict – Optional. A dict defaulting to empty. Specifies values for any placeholders other than observation, such as those controlling dropout and batch_norm.
Returns:

A numpy array of shape (batch_size,). The corresponding state value for each observation.
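
Continuing the sketch above, a hedged usage example of both evaluation methods; it assumes the evaluation ops run in the default TF session:

    import numpy as np

    batch_obs = np.random.randn(8, observation_dim).astype(np.float32)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        v_current = critic.eval_value(batch_obs)     # shape (8,), current net
        v_old = critic.eval_value_old(batch_obs)     # shape (8,), old net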

sync_weights()[source]

Sync the variables of the “old net” to match those of the current network.

trainable_variables

The trainable variables of the value network, as a Python set. It contains only the tf.Variables that affect the value.

value_tensor

The Tensor of the corresponding value.
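
Because value_tensor and trainable_variables are exposed, a custom critic loss can be assembled from them directly. A hypothetical sketch, assuming value_tensor has shape (batch_size,) to match eval_value; the return placeholder and optimizer choice are illustrative:

    # Hypothetical critic loss: regress predicted V(s) toward empirical returns.
    return_ph = tf.placeholder(tf.float32, shape=(None,))
    critic_loss = tf.reduce_mean(tf.square(return_ph - critic.value_tensor))
    train_op = tf.train.AdamOptimizer(1e-3).minimize(
        critic_loss, var_list=list(critic.trainable_variables))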

Action value

class tianshou.core.value_function.action_value.ActionValue(network_callable, observation_placeholder, action_placeholder, has_old_net=False)[source]

Bases: tianshou.core.value_function.base.ValueFunctionBase

Class for action value functions Q(s, a). The value network takes states and actions as input and directly outputs the Q-value of each input (state, action) pair.

Parameters:
  • network_callable – A Python callable that, when called, builds the tf graph and returns the pair (action head, value head); the Tensor on the value head gives the value.
  • observation_placeholder – A tf.placeholder. The observation placeholder for s in Q(s, a) in the network graph.
  • action_placeholder – A tf.placeholder. The action placeholder for a in Q(s, a) in the network graph.
  • has_old_net – A bool defaulting to False. If True, this class creates another graph with another set of tf.Variables to serve as the “old net”. The “old net” can be the target network as in DQN and DDPG, or simply an old net that aids optimization as in PPO.
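
A minimal construction sketch for ActionValue, analogous to the StateValue sketch above; concatenating state and action and the layer sizes are illustrative assumptions:

    import tensorflow as tf
    from tianshou.core.value_function.action_value import ActionValue

    observation_dim, action_dim = 4, 2  # illustrative

    observation_ph = tf.placeholder(tf.float32, shape=(None, observation_dim))
    action_ph = tf.placeholder(tf.float32, shape=(None, action_dim))

    def my_q_net():
        # Q(s, a): feed state and action jointly, output a scalar value head.
        joint = tf.concat([observation_ph, action_ph], axis=1)
        net = tf.layers.dense(joint, 64, activation=tf.nn.relu)
        q_value = tf.layers.dense(net, 1)
        return None, q_value

    q_fn = ActionValue(my_q_net, observation_placeholder=observation_ph,
                       action_placeholder=action_ph, has_old_net=True)
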
eval_value(observation, action, my_feed_dict={})[source]

Evaluate the value of a minibatch of (observation, action) pairs using the current network.

Parameters:
  • observation – An array-like, of shape (batch_size,) + observation_shape.
  • action – An array-like, of shape (batch_size,) + action_shape.
  • my_feed_dict – Optional. A dict defaulting to empty. Specifies values for any placeholders other than observation and action, such as those controlling dropout and batch_norm.
Returns:

A numpy array of shape (batch_size,). The corresponding action value for each (observation, action) pair.

eval_value_old(observation, action, my_feed_dict={})[source]

Evaluate the value of a minibatch of (observation, action) pairs using the old net.

Parameters:
  • observation – An array-like, of shape (batch_size,) + observation_shape.
  • action – An array-like, of shape (batch_size,) + action_shape.
  • my_feed_dict – Optional. A dict defaulting to empty. Specifies values for any placeholders other than observation and action, such as those controlling dropout and batch_norm.
Returns:

A numpy array of shape (batch_size,). The corresponding action value for each (observation, action) pair.

sync_weights()[source]

Sync the variables of the “old net” to match those of the current network.

trainable_variables

The trainable variables of the value network, as a Python set. It contains only the tf.Variables that affect the value.

value_tensor

The Tensor of the corresponding value.

class tianshou.core.value_function.action_value.DQN(network_callable, observation_placeholder, has_old_net=False)[source]

Bases: tianshou.core.value_function.base.ValueFunctionBase

Class for the special action value function used in DQN. Instead of feeding both s and a to the network to obtain a value, DQN feeds only s, and the last layer outputs Q(s, *) for all actions under that state. Like ActionValue, this class still builds the Q(s, a) value Tensor. It can only be used with discrete (and finite) action spaces.

Parameters:
  • network_callable – A Python callable that, when called, builds the tf graph and returns the pair (action head, value head); the Tensor on the value head gives Q(s, *).
  • observation_placeholder – A tf.placeholder. The observation placeholder for s in Q(s, *) in the network graph.
  • has_old_net – A bool defaulting to False. If True, this class creates another graph with another set of tf.Variables to serve as the “old net”. The “old net” can be the target network as in DQN and DDPG, or simply an old net that aids optimization as in PPO.
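
A minimal construction sketch for DQN; the value head now has one output per discrete action. The architecture and dimensions are illustrative assumptions:

    import tensorflow as tf
    from tianshou.core.value_function.action_value import DQN

    observation_dim, num_actions = 4, 3  # illustrative

    observation_ph = tf.placeholder(tf.float32, shape=(None, observation_dim))

    def my_dqn_net():
        # The value head outputs Q(s, *): one entry per discrete action.
        net = tf.layers.dense(observation_ph, 64, activation=tf.nn.relu)
        q_all = tf.layers.dense(net, num_actions)
        return None, q_all

    dqn = DQN(my_dqn_net, observation_placeholder=observation_ph,
              has_old_net=True)
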
eval_value(observation, action, my_feed_dict={})[source]

Evaluate the value Q(s, a) of a minibatch using the current network.

Parameters:
  • observation – An array-like, of shape (batch_size,) + observation_shape.
  • action – An array-like, of shape (batch_size,) + action_shape.
  • my_feed_dict – Optional. A dict defaulting to empty. Specifies values for any placeholders other than observation and action, such as those controlling dropout and batch_norm.
Returns:

A numpy array of shape (batch_size,). The corresponding action value for each (observation, action) pair.

eval_value_all_actions(observation, my_feed_dict={})[source]

Evaluate the values Q(s, *) of a minibatch using the current network.

Parameters:
  • observation – An array-like, of shape (batch_size,) + observation_shape.
  • my_feed_dict – Optional. A dict defaulting to empty. Specifies values for any placeholders other than observation, such as those controlling dropout and batch_norm.
Returns:

A numpy array of shape (batch_size, num_actions). The corresponding action values for each observation.
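
Since the result has shape (batch_size, num_actions), greedy action selection reduces to an argmax over the last axis. A usage sketch, assuming the dqn instance from the sketch above and an active default session with initialized variables:

    import numpy as np

    batch_obs = np.random.randn(16, observation_dim).astype(np.float32)
    q_all = dqn.eval_value_all_actions(batch_obs)  # shape (16, num_actions)
    greedy_actions = np.argmax(q_all, axis=1)      # shape (16,)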

eval_value_all_actions_old(observation, my_feed_dict={})[source]

Evaluate the values Q(s, *) of a minibatch using the old net.

Parameters:
  • observation – An array-like, of shape (batch_size,) + observation_shape.
  • my_feed_dict – Optional. A dict defaulting to empty. Specifies values for any placeholders other than observation, such as those controlling dropout and batch_norm.
Returns:

A numpy array of shape (batch_size, num_actions). The corresponding action values for each observation.

eval_value_old(observation, action, my_feed_dict={})[source]

Evaluate the value Q(s, a) of a minibatch using the old net.

Parameters:
  • observation – An array-like, of shape (batch_size,) + observation_shape.
  • action – An array-like, of shape (batch_size,) + action_shape.
  • my_feed_dict – Optional. A dict defaulting to empty. Specifies values for any placeholders other than observation and action, such as those controlling dropout and batch_norm.
Returns:

A numpy array of shape (batch_size,). The corresponding action value for each (observation, action) pair.

sync_weights()[source]

Sync the variables of the “old net” to match those of the current network.
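
In DQN-style training the old net acts as the target network, so sync_weights() is typically invoked on a fixed schedule. A hypothetical training-loop fragment; train_one_step, num_training_steps, and the update frequency are illustrative, not part of this API:

    target_update_freq = 1000    # illustrative
    num_training_steps = 100000  # illustrative

    for step in range(num_training_steps):
        train_one_step()  # hypothetical helper: compute TD targets with
                          # dqn.eval_value_all_actions_old, then apply gradients
        if step % target_update_freq == 0:
            dqn.sync_weights()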

trainable_variables

The trainable variables of the value network, as a Python set. It contains only the tf.Variables that affect the value.

value_tensor

The Tensor of the corresponding value.

value_tensor_all_actions

The Tensor for Q(s, *).