warp_drive.training.algorithms package

Submodules

warp_drive.training.algorithms.a2c module

class warp_drive.training.algorithms.a2c.A2C(discount_factor_gamma=1.0, normalize_advantage=False, normalize_return=False, vf_loss_coeff=0.01, entropy_coeff=0.01)

Bases: object

The Advantage Actor-Critic (A2C) class. Reference: https://arxiv.org/abs/1602.01783

compute_loss_and_metrics(timestep=None, actions_batch=None, rewards_batch=None, done_flags_batch=None, action_probabilities_batch=None, value_functions_batch=None, perform_logging=False)
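Judging from the signature, compute_loss_and_metrics consumes batches of actions, rewards, done flags, action probabilities, and value-function estimates sampled from rollouts, and returns the training loss (plus metrics when perform_logging is set). Below is a minimal, self-contained sketch of the A2C loss family this class computes, not WarpDrive's actual implementation: discounted returns are accumulated backwards through the batch and reset at episode boundaries, advantages are returns minus the critic's value estimates, and the total loss combines the policy-gradient term with a value-function term weighted by vf_loss_coeff and an entropy bonus weighted by entropy_coeff. The function name and tensor shapes are assumptions for illustration:

    import torch
    import torch.nn.functional as F

    def a2c_loss_sketch(
        rewards,                   # [T, N] rewards per timestep, per env/agent
        done_flags,                # [T, N] 1.0 where an episode terminated
        log_probs,                 # [T, N] log pi(a_t | s_t) of the taken actions
        values,                    # [T, N] critic estimates V(s_t)
        gamma=1.0,                 # discount_factor_gamma
        vf_loss_coeff=0.01,
        entropy_coeff=0.01,
        entropy=None,              # [T, N] policy entropy (optional)
        normalize_advantage=False,
    ):
        T = rewards.shape[0]
        returns = torch.zeros_like(rewards)
        running = torch.zeros_like(rewards[0])
        # Discounted return, computed backwards and zeroed at episode ends.
        for t in reversed(range(T)):
            running = rewards[t] + gamma * running * (1.0 - done_flags[t])
            returns[t] = running
        # Advantage = return - baseline; detach so the critic is trained
        # only through its own value-function loss term.
        advantages = returns - values.detach()
        if normalize_advantage:
            advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)
        policy_loss = -(log_probs * advantages).mean()
        vf_loss = F.mse_loss(values, returns)
        entropy_bonus = entropy.mean() if entropy is not None else torch.tensor(0.0)
        return policy_loss + vf_loss_coeff * vf_loss - entropy_coeff * entropy_bonus

Detaching the value estimates in the advantage term is the standard design choice here: the actor and critic share the loss scalar but receive separate gradient signals.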

warp_drive.training.algorithms.ppo module

class warp_drive.training.algorithms.ppo.PPO(discount_factor_gamma=1.0, clip_param=0.1, normalize_advantage=False, normalize_return=False, vf_loss_coeff=0.01, entropy_coeff=0.01)

Bases: object

The Proximal Policy Optimization (PPO) class. Reference: https://arxiv.org/abs/1707.06347

compute_loss_and_metrics(timestep=None, actions_batch=None, rewards_batch=None, done_flags_batch=None, action_probabilities_batch=None, value_functions_batch=None, perform_logging=False)
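PPO shares A2C's return and advantage machinery but replaces the plain policy-gradient term with a clipped surrogate objective, which is what clip_param controls. For comparison with the A2C sketch above, here is a minimal sketch of that clipped term (per https://arxiv.org/abs/1707.06347); the function name and tensor shapes are again illustrative assumptions, not the library's API:

    import torch

    def ppo_clipped_loss_sketch(
        log_probs_new,   # [T, N] log-probs of actions under the current policy
        log_probs_old,   # [T, N] log-probs under the policy that collected the data
        advantages,      # [T, N] advantage estimates
        clip_param=0.1,
    ):
        # Importance ratio r_t = pi_new(a|s) / pi_old(a|s).
        ratio = torch.exp(log_probs_new - log_probs_old)
        unclipped = ratio * advantages
        # Clamp the ratio to [1 - clip_param, 1 + clip_param] so a single
        # update cannot move the policy too far from the behavior policy.
        clipped = torch.clamp(ratio, 1.0 - clip_param, 1.0 + clip_param) * advantages
        # Pessimistic bound: take the elementwise minimum, then negate for descent.
        return -torch.min(unclipped, clipped).mean()

In a full PPO loss this term would be combined with the same vf_loss_coeff-weighted value-function loss and entropy_coeff-weighted entropy bonus as in the A2C sketch.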

Module contents