warp_drive.training.algorithms package¶
Submodules¶
warp_drive.training.algorithms.a2c module¶
- class warp_drive.training.algorithms.a2c.A2C(discount_factor_gamma=1.0, normalize_advantage=False, normalize_return=False, vf_loss_coeff=0.01, entropy_coeff=0.01)¶
Bases: object
The Advantage Actor-Critic (A2C) class. Reference: https://arxiv.org/abs/1602.01783. A usage sketch follows the method listing below.
- compute_loss_and_metrics(timestep=None, actions_batch=None, rewards_batch=None, done_flags_batch=None, action_probabilities_batch=None, value_functions_batch=None, perform_logging=False)¶
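A minimal sketch of calling compute_loss_and_metrics with placeholder rollout batches. The tensor shapes, the list-of-tensors layout of action_probabilities_batch, and the (loss, metrics) return value are illustrative assumptions and are not confirmed by this page; in practice the trainer supplies these batches from the GPU rollout buffers.

```python
import torch

from warp_drive.training.algorithms.a2c import A2C

# Illustrative batch dimensions (assumptions, not taken from this page).
num_timesteps, num_envs, num_agents, num_actions = 100, 10, 5, 4

a2c = A2C(discount_factor_gamma=0.99, vf_loss_coeff=0.5, entropy_coeff=0.01)

# Placeholder batches standing in for what the trainer would normally gather.
actions_batch = torch.randint(num_actions, (num_timesteps, num_envs, num_agents, 1))
rewards_batch = torch.rand(num_timesteps, num_envs, num_agents)
done_flags_batch = torch.zeros(num_timesteps, num_envs)
# Assumed layout: one probability tensor per action dimension.
logits = torch.rand(
    num_timesteps, num_envs, num_agents, num_actions, requires_grad=True
)
action_probabilities_batch = [torch.softmax(logits, dim=-1)]
value_functions_batch = torch.rand(
    num_timesteps, num_envs, num_agents, requires_grad=True
)

# The (loss, metrics) return structure is an assumption for illustration.
loss, metrics = a2c.compute_loss_and_metrics(
    timestep=num_timesteps,
    actions_batch=actions_batch,
    rewards_batch=rewards_batch,
    done_flags_batch=done_flags_batch,
    action_probabilities_batch=action_probabilities_batch,
    value_functions_batch=value_functions_batch,
    perform_logging=True,
)
```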
warp_drive.training.algorithms.ppo module¶
- class warp_drive.training.algorithms.ppo.PPO(discount_factor_gamma=1.0, clip_param=0.1, normalize_advantage=False, normalize_return=False, vf_loss_coeff=0.01, entropy_coeff=0.01)¶
Bases: object
The Proximal Policy Optimization (PPO) class. Reference: https://arxiv.org/abs/1707.06347. An instantiation sketch follows the method listing below.
- compute_loss_and_metrics(timestep=None, actions_batch=None, rewards_batch=None, done_flags_batch=None, action_probabilities_batch=None, value_functions_batch=None, perform_logging=False)¶
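The call pattern mirrors the A2C sketch above; the additional clip_param bounds the probability ratio in PPO's clipped surrogate objective. The (loss, metrics) return structure remains an illustrative assumption.

```python
from warp_drive.training.algorithms.ppo import PPO

# clip_param controls the clipping range of the PPO surrogate objective.
ppo = PPO(
    discount_factor_gamma=0.99,
    clip_param=0.2,
    normalize_advantage=True,
    vf_loss_coeff=0.5,
    entropy_coeff=0.01,
)

# compute_loss_and_metrics accepts the same batch arguments as A2C above,
# e.g. (return structure assumed):
# loss, metrics = ppo.compute_loss_and_metrics(
#     timestep=num_timesteps,
#     actions_batch=actions_batch,
#     rewards_batch=rewards_batch,
#     done_flags_batch=done_flags_batch,
#     action_probabilities_batch=action_probabilities_batch,
#     value_functions_batch=value_functions_batch,
#     perform_logging=True,
# )
```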