warp_drive.training.utils package

Submodules

warp_drive.training.utils.child_process_base module

class warp_drive.training.utils.child_process_base.DeviceContextProcessWrapper(*args, **kwargs)

Bases: warp_drive.training.utils.child_process_base.ProcessWrapper

assert_context_consistency()
run()

Method to be run in sub-process; can be overridden in sub-class

class warp_drive.training.utils.child_process_base.ProcessWrapper(*args, **kwargs)

Bases: multiprocessing.context.Process

A process wrapper to catch exceptions when they occur.

property exception
run()

Method to be run in sub-process; can be overridden in sub-class
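
To illustrate the pattern, here is a minimal sketch of a process wrapper that catches exceptions in a sub-process and surfaces them to the parent through an exception property. The pipe-based plumbing is an assumption for illustration, not necessarily ProcessWrapper's exact internals.

    import multiprocessing
    import traceback

    class ExceptionCatchingProcess(multiprocessing.Process):
        """Hypothetical stand-in for ProcessWrapper: catches exceptions
        raised in the child and exposes them via the `exception` property."""

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            # One-way pipe used to ship any exception back to the parent.
            self._parent_conn, self._child_conn = multiprocessing.Pipe()
            self._exception = None

        def run(self):
            try:
                super().run()  # executes the `target` passed at construction
                self._child_conn.send(None)
            except Exception as err:
                # Forward the exception and its traceback to the parent.
                self._child_conn.send((err, traceback.format_exc()))

        @property
        def exception(self):
            # Non-blocking: only read from the pipe if something has arrived.
            if self._parent_conn.poll():
                self._exception = self._parent_conn.recv()
            return self._exception

    def fail():
        raise ValueError("boom")

    if __name__ == "__main__":
        proc = ExceptionCatchingProcess(target=fail)
        proc.start()
        proc.join()
        if proc.exception is not None:
            err, tb = proc.exception
            print("Child process raised:", err)

The parent can start() and join() the child as usual, then inspect the exception property afterwards to log or re-raise any failure.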

warp_drive.training.utils.data_loader module

warp_drive.training.utils.data_loader.all_equal(iterable)

Check that all the elements of an iterable (e.g., a list) are identical.
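
A minimal sketch of what such a check does (a hypothetical implementation; the library's actual code may differ):

    def all_equal(iterable):
        # True when every element equals the first; an empty iterable
        # is treated as trivially all-equal.
        items = list(iterable)
        return all(item == items[0] for item in items[1:])

    assert all_equal([8, 8, 8])
    assert not all_equal([8, 9, 8])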

warp_drive.training.utils.data_loader.create_and_push_data_placeholders(env_wrapper=None, policy_tag_to_agent_id_map=None, create_separate_placeholders_for_each_policy=False, obs_dim_corresponding_to_num_agents='first', training_batch_size_per_env=1, push_data_batch_placeholders=True)

Create observations, sampled_actions, rewards and done flags placeholders and push them to the device; this is required for generating environment roll-outs as well as for training.

env_wrapper: the wrapped environment object.

policy_tag_to_agent_id_map: a dictionary mapping policy tag to agent ids.

create_separate_placeholders_for_each_policy: flag indicating whether there exist separate observations, actions and rewards placeholders for each policy, as designed in the step function. The placeholders will be used in the step() function and during training. When there is only a single policy, this flag will be False. It can also be True when there are multiple policies, yet all the agents have the same obs and action space shapes, so the same placeholder can be shared. Defaults to False.

obs_dim_corresponding_to_num_agents: indicates which dimension of the observation corresponds to the number of agents, as designed in the step function. It may be "first" or "last"; in other words, observations may be shaped (num_agents, *feature_dim) or (*feature_dim, num_agents). This is required for WarpDrive to process the observations correctly, and it is only relevant when a single obs key corresponds to multiple agents. Defaults to "first".

training_batch_size_per_env: the training batch size for each env.

push_data_batch_placeholders: an optional flag to push placeholders for the batches of actions, rewards and the done flags. Defaults to True.
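
Below is a hedged usage sketch of this call, assuming an already-constructed WarpDrive EnvWrapper (here named env_wrapper) and a single shared policy; the tag name "shared" and the parameter values are illustrative assumptions, not prescribed by the library.

    from warp_drive.training.utils.data_loader import (
        create_and_push_data_placeholders,
    )

    # Assumption: one policy controlling all agents of the wrapped env.
    policy_tag_to_agent_id_map = {
        "shared": list(range(env_wrapper.env.num_agents)),
    }

    create_and_push_data_placeholders(
        env_wrapper=env_wrapper,
        policy_tag_to_agent_id_map=policy_tag_to_agent_id_map,
        create_separate_placeholders_for_each_policy=False,  # single policy
        obs_dim_corresponding_to_num_agents="first",  # obs: (num_agents, *feature_dim)
        training_batch_size_per_env=256,
        push_data_batch_placeholders=True,
    )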

warp_drive.training.utils.data_loader.get_obs(obs, agent_ids, obs_dim_corresponding_to_num_agents='first', key=None)

warp_drive.training.utils.param_scheduler module

class warp_drive.training.utils.param_scheduler.LRScheduler(schedule, optimizer=None, init_timestep=0, timesteps_per_iteration=1)

Bases: warp_drive.training.utils.param_scheduler.ParamScheduler, torch.optim.lr_scheduler.LambdaLR

A learning rate scheduler with PyTorch-style APIs, compatible with PyTorch Lightning.
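
A hedged sketch of how a LambdaLR-compatible scheduler like this plugs into a training loop. The schedule specification shown is an assumption for illustration only; consult the WarpDrive trainer configs for the actual format.

    import torch

    from warp_drive.training.utils.param_scheduler import LRScheduler

    model = torch.nn.Linear(8, 2)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

    # Assumed schedule spec: piecewise-linear (timestep, value) anchors.
    schedule = [[0, 0.001], [1_000_000, 0.0001]]
    lr_scheduler = LRScheduler(
        schedule, optimizer=optimizer, init_timestep=0, timesteps_per_iteration=100
    )

    for _ in range(10):
        optimizer.step()
        lr_scheduler.step()  # standard PyTorch scheduler API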

class warp_drive.training.utils.param_scheduler.ParamScheduler(schedule)

Bases: object

A generic scheduler for adapting parameters such as the learning rate and the entropy coefficient. Available scheduler types are ["constant", "piecewise_linear"].

get_param_value(timestep)

Obtain the parameter value at a desired timestep.
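
A hedged sketch of querying a scheduled value directly; the piecewise-linear schedule spec below is an assumed format for illustration only.

    from warp_drive.training.utils.param_scheduler import ParamScheduler

    # Assumed spec: (timestep, value) anchors for linear interpolation.
    entropy_schedule = [[0, 0.5], [1_000_000, 0.01]]
    scheduler = ParamScheduler(entropy_schedule)

    # The entropy coefficient to use at training timestep 250000.
    entropy_coeff = scheduler.get_param_value(timestep=250_000)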

warp_drive.training.utils.process_group_torch module

warp_drive.training.utils.process_group_torch.clear_torch_process_group()
warp_drive.training.utils.process_group_torch.setup_torch_process_group(device_id, num_devices, master_addr='127.0.0.1', master_port='8888', backend='gloo')

The setup code comes directly from the PyTorch DDP tutorial:

https://pytorch.org/tutorials/intermediate/ddp_tutorial.html
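
For reference, the process-group setup pattern from that tutorial looks like the following; this is a sketch of the pattern the helper wraps, not the module's exact code.

    import os

    import torch.distributed as dist

    def setup(rank, world_size, master_addr="127.0.0.1",
              master_port="8888", backend="gloo"):
        os.environ["MASTER_ADDR"] = master_addr
        os.environ["MASTER_PORT"] = master_port
        # Every rank must call this to join the default process group.
        dist.init_process_group(backend, rank=rank, world_size=world_size)

    def cleanup():
        # Tear down the process group (what clear_torch_process_group()
        # presumably wraps).
        dist.destroy_process_group()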

warp_drive.training.utils.vertical_scaler module

Automatic Vertical Scaling

Perform a binary search to figure out the maximum values of the training parameters 'num_envs' and 'train_batch_size' to use on a GPU. The highest 'num_envs' is chosen that maximizes GPU utilization (i.e., uses up as many GPU blocks as possible). The highest 'train_batch_size' is then chosen in order to maximize GPU memory usage. These two parameters essentially determine the largest data batch size that can be used for training on a GPU.

Note: as 'num_envs' is increased further and further, the GPU eventually runs out of blocks, and the function run will throw a 'cuMemFree failed: an illegal memory access was encountered' error. As the batch size is increased further and further (for a chosen 'num_envs'), the GPU runs out of memory, and the function run will throw a CUDA out-of-memory error.

Perform a binary search to determine the best parameter value. In this specific context, the best parameter value is the highest value of the parameter (e.g., batch size) with which a func(tion) (e.g., training) can run successfully; beyond a certain value, the function fails to run for reasons such as running out of memory.

param low: a low value to start searching from (defaults to 1).

param margin: the margin allowed when choosing the configuration parameter (and the optimal parameter).

param func: the function that is required to be run with the configuration parameter.
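
A minimal sketch of the search strategy described above, assuming the probed function raises an exception (e.g., CUDA out-of-memory) whenever the parameter value is too large; the helper name and signature here are hypothetical.

    def binary_search_max_param(func, low=1, high=2 ** 20, margin=1):
        """Return (approximately) the largest value in [low, high] for
        which `func(value)` runs successfully."""

        def succeeds(value):
            try:
                func(value)
                return True
            except Exception:  # e.g., out-of-memory, illegal memory access
                return False

        best = None
        while high - low > margin:
            mid = (low + high) // 2
            if succeeds(mid):
                best = mid
                low = mid  # mid works; search the upper half
            else:
                high = mid  # mid fails; search the lower half
        return best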

warp_drive.training.utils.vertical_scaler.perform_auto_vertical_scaling(setup_trainer_and_train, config, num_iters=2)

Auto-scale the number of envs and the batch size to maximize GPU utilization.

param num_iters: the number of iterations to use when performing automatic vertical scaling.
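
A hedged usage sketch; setup_trainer_and_train and config stand in for the user's trainer-launch function and run configuration, which are assumed to be defined elsewhere.

    from warp_drive.training.utils.vertical_scaler import (
        perform_auto_vertical_scaling,
    )

    # `setup_trainer_and_train` and `config` are assumed user-defined.
    perform_auto_vertical_scaling(setup_trainer_and_train, config, num_iters=2)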

Module contents