tianshou.core.value_function

Base class

All value function classes in this module derive from tianshou.core.value_function.base.ValueFunctionBase, which defines their shared interface.

State value

class tianshou.core.value_function.state_value.StateValue(network_callable, observation_placeholder, has_old_net=False)[source]

Bases: tianshou.core.value_function.base.ValueFunctionBase

Class for state value functions V(s). The value network takes states as input and directly outputs the V-value of each input state.

Parameters:
  • network_callable – A Python callable that, when called, builds the tf graph and returns the pair (action head, value head); the Tensor on the value head gives the value.
  • observation_placeholder – A tf.placeholder. The observation placeholder for s in V(s) in the network graph.
  • has_old_net – A bool defaulting to False. If True, this class creates another graph with another set of tf.Variables to serve as the “old net”. The “old net” can be the target network as in DQN and DDPG, or simply an old net that aids optimization as in PPO.
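
For concreteness, a minimal construction sketch using TensorFlow 1.x. The network architecture, the dimensions, and returning None for the unused action head are illustrative assumptions, not prescribed by this API:

    import tensorflow as tf
    from tianshou.core.value_function.state_value import StateValue

    observation_dim = 4  # illustrative

    observation_ph = tf.placeholder(tf.float32, shape=(None, observation_dim))

    def my_network():
        # Build the graph and return (action head, value head); returning
        # None for the unused action head is an assumption of this sketch.
        net = tf.layers.dense(observation_ph, 64, activation=tf.nn.relu)
        value = tf.layers.dense(net, 1)
        return None, value

    critic = StateValue(my_network, observation_placeholder=observation_ph,
                        has_old_net=True)
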
eval_value(observation, my_feed_dict={})[source]

Evaluate the value of a minibatch of observations using the current network.

Parameters:
  • observation – An array-like, of shape (batch_size,) + observation_shape.
  • my_feed_dict – Optional. A dict defaulting to empty. Specifies values for any placeholders other than observation, such as those controlling dropout and batch_norm.
Returns:

A numpy array of shape (batch_size,). The corresponding state value for each observation.

eval_value_old(observation, my_feed_dict={})[source]

Evaluate the value of a minibatch of observations using the old net.

Parameters:
  • observation – An array-like, of shape (batch_size,) + observation_shape.
  • my_feed_dict – Optional. A dict defaulting to empty. Specifies values for any placeholders other than observation, such as those controlling dropout and batch_norm.
Returns:

A numpy array of shape (batch_size,). The corresponding state value for each observation.
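
Continuing the sketch above, a hedged usage example of both evaluation methods; it assumes the evaluation ops run in the default TF session:

    import numpy as np

    batch_obs = np.random.randn(8, observation_dim).astype(np.float32)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        v_current = critic.eval_value(batch_obs)     # shape (8,), current net
        v_old = critic.eval_value_old(batch_obs)     # shape (8,), old net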

sync_weights()[source]

Sync the variables of the “old net” to match those of the current network.

trainable_variables

The trainable variables of the value network, as a Python set. It contains only the tf.Variables that affect the value.

value_tensor

The Tensor of the corresponding value.
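
Because value_tensor and trainable_variables are exposed, a custom critic loss can be assembled from them directly. A hypothetical sketch, assuming value_tensor has shape (batch_size,) to match eval_value; the return placeholder and optimizer choice are illustrative:

    # Hypothetical critic loss: regress predicted V(s) toward empirical returns.
    return_ph = tf.placeholder(tf.float32, shape=(None,))
    critic_loss = tf.reduce_mean(tf.square(return_ph - critic.value_tensor))
    train_op = tf.train.AdamOptimizer(1e-3).minimize(
        critic_loss, var_list=list(critic.trainable_variables))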

Action value

class tianshou.core.value_function.action_value.ActionValue(network_callable, observation_placeholder, action_placeholder, has_old_net=False)[source]

Bases: tianshou.core.value_function.base.ValueFunctionBase

Class for action value functions Q(s, a). The value network takes states and actions as input and directly outputs the Q-value of each input (state, action) pair.

Parameters:
  • network_callable – A Python callable that, when called, builds the tf graph and returns the pair (action head, value head); the Tensor on the value head gives the value.
  • observation_placeholder – A tf.placeholder. The observation placeholder for s in Q(s, a) in the network graph.
  • action_placeholder – A tf.placeholder. The action placeholder for a in Q(s, a) in the network graph.
  • has_old_net – A bool defaulting to False. If True, this class creates another graph with another set of tf.Variables to serve as the “old net”. The “old net” can be the target network as in DQN and DDPG, or simply an old net that aids optimization as in PPO.
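
A minimal construction sketch for ActionValue, analogous to the StateValue sketch above; concatenating state and action and the layer sizes are illustrative assumptions:

    import tensorflow as tf
    from tianshou.core.value_function.action_value import ActionValue

    observation_dim, action_dim = 4, 2  # illustrative

    observation_ph = tf.placeholder(tf.float32, shape=(None, observation_dim))
    action_ph = tf.placeholder(tf.float32, shape=(None, action_dim))

    def my_q_net():
        # Q(s, a): feed state and action jointly, output a scalar value head.
        joint = tf.concat([observation_ph, action_ph], axis=1)
        net = tf.layers.dense(joint, 64, activation=tf.nn.relu)
        q_value = tf.layers.dense(net, 1)
        return None, q_value

    q_fn = ActionValue(my_q_net, observation_placeholder=observation_ph,
                       action_placeholder=action_ph, has_old_net=True)
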
eval_value(observation, action, my_feed_dict={})[source]

Evaluate the value of a minibatch of (observation, action) pairs using the current network.

Parameters:
  • observation – An array-like, of shape (batch_size,) + observation_shape.
  • action – An array-like, of shape (batch_size,) + action_shape.
  • my_feed_dict – Optional. A dict defaulting to empty. Specifies values for any placeholders other than observation and action, such as those controlling dropout and batch_norm.
Returns:

A numpy array of shape (batch_size,). The corresponding action value for each (observation, action) pair.

eval_value_old(observation, action, my_feed_dict={})[source]

Evaluate the value of a minibatch of (observation, action) pairs using the old net.

Parameters:
  • observation – An array-like, of shape (batch_size,) + observation_shape.
  • action – An array-like, of shape (batch_size,) + action_shape.
  • my_feed_dict – Optional. A dict defaulting to empty. Specifies values for any placeholders other than observation and action, such as those controlling dropout and batch_norm.
Returns:

A numpy array of shape (batch_size,). The corresponding action value for each (observation, action) pair.

sync_weights()[source]

Sync the variables of the “old net” to match those of the current network.

trainable_variables

The trainable variables of the value network, as a Python set. It contains only the tf.Variables that affect the value.

value_tensor

The Tensor of the corresponding value.

class tianshou.core.value_function.action_value.DQN(network_callable, observation_placeholder, has_old_net=False)[source]

Bases: tianshou.core.value_function.base.ValueFunctionBase

Class for the special action value function used in DQN. Instead of feeding both s and a to the network to obtain a value, DQN feeds only s, and the last layer outputs Q(s, *) for all actions under that state. Like ActionValue, this class still builds the Q(s, a) value Tensor. It can only be used with discrete (and finite) action spaces.

Parameters:
  • network_callable – A Python callable that, when called, builds the tf graph and returns the pair (action head, value head); the Tensor on the value head gives Q(s, *).
  • observation_placeholder – A tf.placeholder. The observation placeholder for s in Q(s, *) in the network graph.
  • has_old_net – A bool defaulting to False. If True, this class creates another graph with another set of tf.Variables to serve as the “old net”. The “old net” can be the target network as in DQN and DDPG, or simply an old net that aids optimization as in PPO.
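
A minimal construction sketch for DQN; the value head now has one output per discrete action. The architecture and dimensions are illustrative assumptions:

    import tensorflow as tf
    from tianshou.core.value_function.action_value import DQN

    observation_dim, num_actions = 4, 3  # illustrative

    observation_ph = tf.placeholder(tf.float32, shape=(None, observation_dim))

    def my_dqn_net():
        # The value head outputs Q(s, *): one entry per discrete action.
        net = tf.layers.dense(observation_ph, 64, activation=tf.nn.relu)
        q_all = tf.layers.dense(net, num_actions)
        return None, q_all

    dqn = DQN(my_dqn_net, observation_placeholder=observation_ph,
              has_old_net=True)
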
eval_value(observation, action, my_feed_dict={})[source]

Evaluate the value Q(s, a) of a minibatch using the current network.

Parameters:
  • observation – An array-like, of shape (batch_size,) + observation_shape.
  • action – An array-like, of shape (batch_size,) + action_shape.
  • my_feed_dict – Optional. A dict defaulting to empty. Specifies values for any placeholders other than observation and action, such as those controlling dropout and batch_norm.
Returns:

A numpy array of shape (batch_size,). The corresponding action value for each (observation, action) pair.

eval_value_all_actions(observation, my_feed_dict={})[source]

Evaluate the values Q(s, *) of a minibatch using the current network.

Parameters:
  • observation – An array-like, of shape (batch_size,) + observation_shape.
  • my_feed_dict – Optional. A dict defaulting to empty. Specifies values for any placeholders other than observation, such as those controlling dropout and batch_norm.
Returns:

A numpy array of shape (batch_size, num_actions). The corresponding action values for each observation.
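
Since the result has shape (batch_size, num_actions), greedy action selection reduces to an argmax over the last axis. A usage sketch, assuming the dqn instance from the sketch above and an active default session with initialized variables:

    import numpy as np

    batch_obs = np.random.randn(16, observation_dim).astype(np.float32)
    q_all = dqn.eval_value_all_actions(batch_obs)  # shape (16, num_actions)
    greedy_actions = np.argmax(q_all, axis=1)      # shape (16,)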

eval_value_all_actions_old(observation, my_feed_dict={})[source]

Evaluate the values Q(s, *) of a minibatch using the old net.

Parameters:
  • observation – An array-like, of shape (batch_size,) + observation_shape.
  • my_feed_dict – Optional. A dict defaulting to empty. Specifies values for any placeholders other than observation, such as those controlling dropout and batch_norm.
Returns:

A numpy array of shape (batch_size, num_actions). The corresponding action values for each observation.

eval_value_old(observation, action, my_feed_dict={})[source]

Evaluate the value Q(s, a) of a minibatch using the old net.

Parameters:
  • observation – An array-like, of shape (batch_size,) + observation_shape.
  • action – An array-like, of shape (batch_size,) + action_shape.
  • my_feed_dict – Optional. A dict defaulting to empty. Specifies values for any placeholders other than observation and action, such as those controlling dropout and batch_norm.
Returns:

A numpy array of shape (batch_size,). The corresponding action value for each (observation, action) pair.

sync_weights()[source]

Sync the variables of the “old net” to match those of the current network.
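
In DQN-style training the old net acts as the target network, so sync_weights() is typically invoked on a fixed schedule. A hypothetical training-loop fragment; train_one_step, num_training_steps, and the update frequency are illustrative, not part of this API:

    target_update_freq = 1000    # illustrative
    num_training_steps = 100000  # illustrative

    for step in range(num_training_steps):
        train_one_step()  # hypothetical helper: compute TD targets with
                          # dqn.eval_value_all_actions_old, then apply gradients
        if step % target_update_freq == 0:
            dqn.sync_weights()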

trainable_variables

The trainable variables of the value network, as a Python set. It contains only the tf.Variables that affect the value.

value_tensor

The Tensor of the corresponding value.

value_tensor_all_actions

The Tensor for Q(s, *).