Training Models on Task Datasets (Commands and Configurations)
LAVIS provides scripts to pre-train and finetune supported models on standard language-vision tasks, stored at lavis/run_scripts/.
To replicate the experiments, simply run the corresponding bash script. For example, to train the BLIP model on the image-text retrieval task with the MSCOCO dataset, run
bash run_scripts/blip/train/train_retrieval_coco.sh
Inside the script, we can see
python -m torch.distributed.run --nproc_per_node=8 train.py --cfg-path lavis/projects/blip/train/retrieval_coco_ft.yaml
which launches PyTorch distributed training on 8 GPUs (adjust --nproc_per_node to match your own hardware setup). The --cfg-path argument specifies a runtime configuration file that defines the task, model, dataset and training recipe.
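Such a configuration file is a YAML file organized into a model, a datasets and a run section. The skeleton below is only a sketch to illustrate that layout; the names blip_retrieval, coco_retrieval and retrieval are illustrative, and the shipped retrieval_coco_ft.yaml should be consulted for the exact contents.

```
model:       # model architecture and weights to load (see Model Configurations below)
  arch: blip_retrieval            # illustrative value

datasets:    # dataset(s) and their pre-processors (see Dataset Configurations below)
  coco_retrieval: {}              # illustrative dataset name; details omitted here

run:         # task, optimization recipe and runtime settings (see Runtime Configurations below)
  task: retrieval                 # illustrative value
```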
The available options and their descriptions are listed below.
| Model Configurations | Functionalities | Default |
|---|---|---|
| arch | name of the model from the model zoo | task-dependent |
| model_type | the type of the model (e.g., base) | task-dependent |
| load_pretrained | load pretrained weights | True for finetuning; False for pretraining |
| load_finetuned | load task-specific finetuned weights | False for finetuning; True for evaluation |
| pretrained | URL or local path which stores the pretrained model, defined in the default model configuration file | task-dependent |
| finetuned | URL or local path which stores the finetuned model, defined in the default model configuration file | task-dependent |
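As an example of how these options appear in a config, the model section of a finetuning config might look like the sketch below. The arch and model_type values are illustrative; pretrained and finetuned normally come from the default model configuration file and only need to be set here to override them.

```
model:
  arch: blip_retrieval        # illustrative model zoo name
  model_type: coco            # illustrative model variant
  load_pretrained: True       # finetune starting from pre-trained weights
  load_finetuned: False       # set to True to load task-specific weights (e.g. for evaluation)
  # pretrained: <URL or local path to a pre-trained checkpoint>   # optional override of the default
```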
| Dataset Configurations | Functionalities | Default |
|---|---|---|
| vis_processor | pre-processing of visual input | task-dependent |
| text_processor | pre-processing of text input | task-dependent |
| build_info | dataset information, including the storage location, defined in the default dataset configuration file | task-dependent |
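For example, a datasets section configuring the processors might look like the following sketch; the dataset and processor names are illustrative and should match entries registered in LAVIS.

```
datasets:
  coco_retrieval:                 # illustrative dataset name
    vis_processor:
      train:
        name: blip_image_train    # training-time image augmentation and normalization
      eval:
        name: blip_image_eval
    text_processor:
      train:
        name: blip_caption
      eval:
        name: blip_caption
    # build_info is inherited from the default dataset configuration file;
    # override it here only to point at a non-default storage location.
```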
| Runtime Configurations | Functionalities | Default |
|---|---|---|
| task | name of the task | task-dependent |
| lr_sched | learning rate scheduler | linear_warmup_cosine_lr |
| init_lr | initial learning rate (after warmup) | task-dependent |
| min_lr | final learning rate after decay | task-dependent |
| warmup_lr | starting learning rate for warmup | init_lr (no warmup) |
| lr_decay_rate | learning rate decay per epoch for the step LR schedule | 0.9 |
| warmup_steps | number of steps for learning rate warmup | 0 |
| max_epoch | total number of training epochs | task-dependent |
| weight_decay | weight decay coefficient for the optimizer | 0.05 |
| batch_size_train | batch size during training | task-dependent |
| batch_size_eval | batch size during evaluation | task-dependent |
| seed | pseudo-random number generator seed | 42 |
| output_dir | directory to store logs, results and checkpoints | task-dependent |
| resume_ckpt_path | path of the checkpoint to resume training from | None |
| evaluate | only perform evaluation, without training | False |
| train_splits | dataset splits used for training | ["train"] |
| valid_splits | dataset splits used for validation | ["val"] |
| test_splits | dataset splits used for testing | ["test"] |
| device | device to run on (cpu or cuda) | cuda |
| world_size | number of processes participating in the job | 1 |
| dist_url | URL specifying how to initialize the process group | "env://" |
| distributed | use distributed training | True |
| amp | use automatic mixed precision training | False |
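Putting the runtime options together, a run section might look like the sketch below. All values are illustrative; the shipped project configs define task-appropriate settings.

```
run:
  task: retrieval                          # illustrative task name
  lr_sched: linear_warmup_cosine_lr
  init_lr: 1e-5
  min_lr: 0
  warmup_lr: 1e-6
  warmup_steps: 1000
  weight_decay: 0.05
  max_epoch: 6
  batch_size_train: 32
  batch_size_eval: 64
  seed: 42
  output_dir: output/BLIP/Retrieval_COCO   # illustrative path
  evaluate: False
  train_splits: ["train"]
  valid_splits: ["val"]
  test_splits: ["test"]
  device: cuda
  world_size: 1
  dist_url: "env://"
  distributed: True
  amp: False
```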
| Text Generation Configurations | Functionalities | Default |
|---|---|---|
| max_len | maximum number of text tokens to generate | 20 (for image captioning) |
| min_len | minimum number of text tokens to generate | 5 (for image captioning) |
| num_beams | number of beams used for beam search | 3 |
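For generation tasks such as image captioning, these options are typically set alongside the other runtime options; a sketch using the defaults listed above (the task name is illustrative):

```
run:
  task: captioning        # illustrative task name
  max_len: 20
  min_len: 5
  num_beams: 3
```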
| Multimodal Retrieval Configurations | Functionalities | Default |
|---|---|---|
| negative_all_rank | collect negatives from all processes for the image-text matching loss | True (for COCO) |
| k_test | number of retrieval candidates ranked by contrastive similarity | 256 (for COCO) |
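A sketch of where these retrieval options might appear: in the reference configs, negative_all_rank is consumed by the retrieval model while k_test is used by the retrieval task at evaluation time, but check lavis/projects/blip/train/retrieval_coco_ft.yaml for the exact placement.

```
model:
  negative_all_rank: True   # gather negatives across all processes for the ITM loss
run:
  k_test: 256               # number of candidates re-ranked after the contrastive similarity search
```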