Adding Processors
################################################

This is a tutorial on adding new processors using the ``lavis.processors`` module.

The LAVIS library includes a standard processor module that preprocesses data, e.g. image transformations and sequence concatenation. The ``lavis.processors`` module is designed so that new processors can be added and tailored to the requirements of the corresponding models of interest. In this tutorial, we replicate the steps to add visual and textual processors for video-grounded dialogue tasks. In addition, we want these processors to include features that make the data samples compatible with GPT-style models.

Base Processor ``lavis.processors.base_processors``
*****************************************************

Note that any new processor definition should inherit the base processor class ``BaseProcessor``:

.. code-block:: python

    from omegaconf import OmegaConf

    class BaseProcessor:
        def __init__(self):
            # Default to the identity transform; subclasses override this.
            self.transform = lambda x: x

        def __call__(self, item):
            return self.transform(item)

        @classmethod
        def from_config(cls, cfg=None):
            return cls()

        def build(self, **kwargs):
            cfg = OmegaConf.create(kwargs)
            return self.from_config(cfg)

This standardizes the operations of processors across all processor classes while still allowing each processor to be customized to specific data and model types. We encourage users not to modify the implementation of the base processor class, as any change will affect all existing processor subclasses.

GPT-style Processors ``lavis.processors.gpt_processors``
**************************************************************

In this step, we define new processor classes, e.g. under ``lavis.processors.gpt_processors``, for GPT models designed specifically for video-grounded dialogues. First, we process video features by defining a ``GPTVideoFeatureProcessor`` class. In this tutorial, we assume video features are extracted beforehand, and this processor simply loads the features from ``npy`` files. Two further methods are defined specifically for this class: ``padding`` (used by dataset instances to pad a batch of video samples) and ``get_attention_mask`` (which creates an attention mask for Transformer attention in GPT models).

.. code-block:: python

    SPECIAL_TOKENS_DICT = {
        "bos_token": "<bos>",
        "eos_token": "<eos>",
        "additional_special_tokens": ["<speaker1>", "<speaker2>", "<video>", "<cap>"],
        "pad_token": "<pad>",
    }
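As a point of reference, below is a minimal sketch of what ``GPTVideoFeatureProcessor`` can look like given the behaviour described above. The feature-directory layout, the default ``visual_ft`` value, the import path for ``BaseProcessor``, and the use of ``1.0`` as the feature padding value are illustrative assumptions rather than the exact LAVIS implementation.

.. code-block:: python

    import os

    import numpy as np
    import torch

    from lavis.processors.base_processor import BaseProcessor  # import path assumed


    class GPTVideoFeatureProcessor(BaseProcessor):
        """Loads precomputed video features from .npy files (sketch)."""

        def __init__(self, visual_ft=("i3d_rgb",)):
            # Feature types to load; the assumed on-disk layout is
            # <ft_root>/<feature_name>/<video_name>.npy
            self.visual_ft = list(visual_ft)

        def padding(self, seq):
            # Pad a list of [num_frames, feature_dim] tensors to the same
            # length so they can be stacked into a single batch tensor.
            return torch.nn.utils.rnn.pad_sequence(
                seq, batch_first=True, padding_value=1.0
            )

        def get_attention_mask(self, seq):
            # A position is "real" if any of its feature values differs from
            # the padding value; GPT attention then ignores padded frames.
            return (torch.sum(seq != 1.0, dim=2) != 0).long()

        def __call__(self, ft_root, vname):
            # Load every requested feature type for one video and concatenate
            # along the feature dimension, truncating to the shortest sequence.
            all_ft = [
                np.load(os.path.join(ft_root, name, vname + ".npy"))
                for name in self.visual_ft
            ]
            min_len = min(ft.shape[0] for ft in all_ft)
            sample = np.concatenate([ft[:min_len] for ft in all_ft], axis=1)
            return torch.Tensor(sample)

        @classmethod
        def from_config(cls, cfg=None):
            if cfg is None:
                return cls()
            return cls(visual_ft=cfg.get("visual_ft", ("i3d_rgb",)))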
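A dataset instance can then apply the processor once per sample and use the two helper methods when collating a batch. For example (the feature root and video names below are hypothetical):

.. code-block:: python

    processor = GPTVideoFeatureProcessor()
    feats = [processor("features/", vid) for vid in ["vid001", "vid002"]]
    batch = processor.padding(feats)            # [2, max_len, feature_dim]
    mask = processor.get_attention_mask(batch)  # [2, max_len]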