Benchmark

We provide scripts for evaluating and training models on task datasets. The following benchmark results are included for reference.

ALBEF

Pretraining  COCO (download)           script
             Visual Genome (download)
             SBU (download)
             CC3M (download)
             CC12M (download)

Retrieval  Dataset               R1    R5    R10   Training  Evaluation
---------  --------------------  ----  ----  ----  --------  ----------
TR         COCO (download)       77.6  94.1  97.2  script    script
IR         COCO (download)       61.0  84.5  90.7  script    script
TR         Flickr30k (download)  77.6  94.1  97.2  script    script
IR         Flickr30k (download)  61.0  84.5  90.7  script    script
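
The R1, R5, and R10 columns in the retrieval tables report recall at K: the percentage of queries for which at least one correct match appears among the top K ranked candidates. The snippet below is a minimal sketch of that metric, not the code used by the benchmark scripts. The function name `recall_at_k` and the toy rankings are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Set


def recall_at_k(
    ranked: Sequence[Sequence[int]],
    relevant: Sequence[Set[int]],
    k: int,
) -> float:
    """Return the percentage of queries with a relevant item in the top k.

    ranked[i]   -- candidate ids for query i, best match first
    relevant[i] -- ids of the correct candidates for query i
    (Hypothetical helper for illustration; not part of the benchmark scripts.)
    """
    if len(ranked) != len(relevant):
        raise ValueError("ranked and relevant must have the same length")
    if not ranked:
        return 0.0
    # A query counts as a hit if any of its top-k candidates is relevant.
    hits = sum(
        1
        for candidates, truth in zip(ranked, relevant)
        if any(c in truth for c in candidates[:k])
    )
    return 100.0 * hits / len(ranked)


# Toy data: four queries, three candidates each (made-up values).
ranked = [[2, 0, 1], [1, 2, 0], [0, 1, 2], [2, 1, 0]]
relevant = [{0}, {1}, {2}, {1}]

for k in (1, 2, 3):
    print(f"R{k} = {recall_at_k(ranked, relevant, k):.1f}")
```

Under this definition recall can only stay the same or increase as K grows, which matches the R1 ≤ R5 ≤ R10 pattern in every row of the retrieval tables.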

VQA                test-dev  test-std/test  Training  Evaluation
-----------------  --------  -------------  --------  ----------
VQAv2 (download)   76.35     76.54          script    script
OKVQA (download)   NA        54.7           script    NA
AOKVQA (download)  54.5      NA             script    NA

Multimodal Classification  val    test   Training  Evaluation
-------------------------  -----  -----  --------  ----------
SNLI-VE (download)         80.60  81.04  script    script
NLVR2 (download)           82.47  82.91  script    script

BLIP

Pretraining (14M)  COCO (download)           script
                   Visual Genome (download)
                   SBU (download)
                   CC3M (download)
                   CC12M (download)

Tasks

Retrieval  Dataset               R1    R5    R10    Training  Evaluation
---------  --------------------  ----  ----  -----  --------  ----------
TR         COCO (download)       82.0  95.8  98.1   script    script
IR         COCO (download)       64.5  86.0  91.7   script    script
TR         Flickr30k (download)  96.9  99.9  100.0  script    script
IR         Flickr30k (download)  87.5  97.6  98.9   script    script

VQA                test-dev  test-std/test  Training  Evaluation
-----------------  --------  -------------  --------  ----------
VQAv2 (download)   78.23     78.29          script    script
OKVQA (download)   NA        55.4           script    script
AOKVQA (download)  56.2      50.1           script    script

Image Captioning   BLEU@4  CIDEr  SPICE  Training  Evaluation
-----------------  ------  -----  -----  --------  ----------
COCO (download)    39.9    133.5  23.7   script    script
NoCaps (download)  31.9    109.1  14.7   NA        script

Multimodal Classification  val    test   Training  Evaluation
-------------------------  -----  -----  --------  ----------
NLVR2 (download)           82.48  83.25  script    script

CLIP

Tasks

Retrieval (Zero-shot)  Dataset               R1    R5    R10   Evaluation
---------------------  --------------------  ----  ----  ----  ----------
TR                     COCO (download)       57.2  80.5  87.8  script
IR                     COCO (download)       36.5  60.8  71.0  script
TR                     Flickr30k (download)  86.5  98.0  99.1  script
IR                     Flickr30k (download)  67.0  88.9  93.3  script

Multimodal Classification  val   Evaluation
-------------------------  ----  ----------
ImageNet                   76.5  script

ALPRO

Tasks

Retrieval  Dataset            R1    R5    R10   Training  Evaluation
---------  -----------------  ----  ----  ----  --------  ----------
TR         MSRVTT (download)  33.2  60.5  71.7  script    script
VR         MSRVTT (download)  33.8  61.4  72.7  script    script
TR         DiDeMo (download)  38.8  66.4  76.8  script    script
VR         DiDeMo (download)  36.6  67.5  77.9  script    script

Video QA  test  Training  Evaluation
--------  ----  --------  ----------
MSRVTT    42.1  script    script
MSVD      46.0  script    script