logai.algorithms.nn_model.logbert package
Submodules
logai.algorithms.nn_model.logbert.configs module
- class logai.algorithms.nn_model.logbert.configs.LogBERTConfig(pretrain_from_scratch: bool = True, model_name: str = 'bert-base-cased', model_dirname: str | None = None, mlm_probability: float = 0.15, mask_ngram: int = 1, max_token_len: int = 384, evaluation_strategy: str = 'steps', num_train_epochs: int = 20, learning_rate: float = 1e-05, logging_steps: int = 10, per_device_train_batch_size: int = 50, per_device_eval_batch_size: int = 256, eval_accumulation_steps: int = 1000, num_eval_shards: int = 10, weight_decay: float = 0.0001, save_steps: int = 50, eval_steps: int = 50, resume_from_checkpoint: bool = True, output_dir: str | None = None, tokenizer_dirpath: str | None = None)
Bases: Config
Config for logBERT model.
- Parameters:
pretrain_from_scratch – bool = True : whether to do pretraining from scratch or initialize with the HuggingFace pretrained LM.
model_name – str = “bert-base-cased” : name of the model using HuggingFace standardized naming.
model_dirname – str = None : name of the directory where the model will be saved. A directory of this name will be created inside output_dir if it does not exist.
mlm_probability – float = 0.15 : probability of the tokens to be masked during MLM training.
mask_ngram – int = 1 : length of ngrams that are masked during inference.
max_token_len – int = 384 : maximum token length of the input.
learning_rate – float = 1e-5 : learning rate.
weight_decay – float = 0.0001 : weight decay rate to apply to the model weights during training.
per_device_train_batch_size – int = 50 : training batch size per GPU device.
per_device_eval_batch_size – int = 256 : evaluation batch size per GPU device.
eval_accumulation_steps – int = 1000 : number of evaluation steps over which prediction results are accumulated before being moved off the GPU.
num_eval_shards – int = 10 : number of shards to split the evaluation data into (to avoid OOM issues).
evaluation_strategy – str = “steps” : either “steps” or “epoch”, determining whether evaluation is run every eval_steps steps or at the end of each epoch.
num_train_epochs – int = 20 : number of training epochs.
logging_steps – int = 10 : number of steps after which the output is logged.
save_steps – int = 50 : number of steps after which the model is saved.
eval_steps – int = 50 : number of steps after which evaluation is run.
resume_from_checkpoint – bool = True : whether to resume from a given model checkpoint. If set to True, the latest checkpoint saved in the model directory is used to load the model.
output_dir – str = None : output directory where the model would be saved.
tokenizer_dirpath – str = None : path to directory containing the tokenizer.
- eval_accumulation_steps: int
- eval_steps: int
- evaluation_strategy: str
- learning_rate: float
- logging_steps: int
- mask_ngram: int
- max_token_len: int
- mlm_probability: float
- model_dirname: str
- model_name: str
- num_eval_shards: int
- num_train_epochs: int
- output_dir: str
- per_device_eval_batch_size: int
- per_device_train_batch_size: int
- pretrain_from_scratch: bool
- resume_from_checkpoint: bool
- save_steps: int
- tokenizer_dirpath: str
- weight_decay: float
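A minimal usage sketch constructing this config; the paths below are placeholders, and all other fields fall back to the defaults shown in the signature above.

```python
from logai.algorithms.nn_model.logbert.configs import LogBERTConfig

# Placeholder paths; substitute your own output and tokenizer directories.
config = LogBERTConfig(
    pretrain_from_scratch=False,       # initialize from the HuggingFace pretrained LM
    model_name="bert-base-cased",      # HuggingFace model identifier
    max_token_len=384,
    num_train_epochs=20,
    per_device_train_batch_size=50,
    output_dir="/tmp/logbert_output",           # model_dirname is created inside this directory
    tokenizer_dirpath="/tmp/logbert_tokenizer",
)
```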
logai.algorithms.nn_model.logbert.eval_metric_utils module
- logai.algorithms.nn_model.logbert.eval_metric_utils.compute_metrics(eval_metrics_per_instance_series, test_labels, test_counts=None)
Computing evaluation metric scores for anomaly detection.
- Parameters:
eval_metrics_per_instance_series – (dict): eval metrics for each instance index.
test_labels – (dict): gold labels for each instance index.
test_counts – (dict): counts of each instance index.
- Raises:
IndexError – if the indices of eval_metrics_per_instance_series do not match the indices of test_labels.
- Returns:
list of tuples containing labels and scores computed for each index:
- y: list of anomaly labels for each instance.
- loss_mean: list of mean loss (over all masked non-padded tokens) for each instance.
- loss_max: list of max loss (over all masked non-padded tokens) for each instance.
- loss_top6_mean: list of mean loss (averaged over the top-k masked non-padded tokens) for each instance, k = 6 (following the LanoBERT paper, https://arxiv.org/pdf/2111.09564.pdf).
- scores_top6_max_prob: for each instance, the max probability score, averaged over the top-k masked (non-padded) token predictions, k = 6.
- scores_top6_min_logprob: for each instance, the min log-probability score, averaged over the top-k masked (non-padded) token predictions, k = 6.
- scores_top6_max_entropy: for each instance, the max entropy score, averaged over the top-k masked (non-padded) token predictions, k = 6.
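The top-k loss scores above aggregate per-token losses per instance; an illustrative sketch of that aggregation (independent of the library's internal data structures, with made-up loss values) might look like:

```python
import numpy as np

# Illustrative only: per-token losses for one instance's masked, non-padded tokens.
token_losses = np.array([0.2, 1.5, 0.7, 3.1, 0.1, 2.4, 0.9, 1.1])

k = 6
loss_mean = token_losses.mean()                     # mean over all masked tokens
loss_max = token_losses.max()                       # max over all masked tokens
loss_topk_mean = np.sort(token_losses)[-k:].mean()  # mean over the top-k largest losses
```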
logai.algorithms.nn_model.logbert.predict module
- class logai.algorithms.nn_model.logbert.predict.LogBERTPredict(config: LogBERTConfig)
Bases: object
Class for running inference on logBERT model for unsupervised log anomaly detection.
- Parameters:
config – config object describing the parameters of the logBERT model.
- load_model()
Load the logBERT model from the model directory path specified in LogBERTConfig.
- predict(test_dataset: Dataset)
Method for running inference on logBERT to predict anomalous log lines in the test dataset.
- Parameters:
test_dataset – test dataset as a HuggingFace Dataset object.
- Returns:
dict containing instance-wise loss and scores.
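A hedged usage sketch, assuming config is a LogBERTConfig pointing at a trained model directory and test_dataset is a tokenized HuggingFace Dataset prepared elsewhere:

```python
from logai.algorithms.nn_model.logbert.predict import LogBERTPredict

# config and test_dataset are assumed to be prepared elsewhere (see LogBERTConfig above).
predictor = LogBERTPredict(config=config)
predictor.load_model()                          # load the model from the configured directory
eval_metrics = predictor.predict(test_dataset)  # dict of instance-wise losses and scores
```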
logai.algorithms.nn_model.logbert.predict_utils module
- class logai.algorithms.nn_model.logbert.predict_utils.PredictionLabelSmoother(epsilon: float = 0.1, ignore_index: int = -100)
Bases: LabelSmoother
Adds label-smoothing on a pre-computed output from a Transformers model.
- Parameters:
epsilon – (float, optional, defaults to 0.1): The label smoothing factor.
ignore_index – (int, optional, defaults to -100): The index in the labels to ignore when computing the loss.
- epsilon: float = 0.0
- eval_metrics_per_instance = [[], [], [], [], [], [], [], []]
- ignore_index: int = -100
- class logai.algorithms.nn_model.logbert.predict_utils.Predictor(model: PreTrainedModel | Module | None = None, args: TrainingArguments | None = None, data_collator: DataCollator | None = None, train_dataset: Dataset | None = None, eval_dataset: Dataset | None = None, tokenizer: PreTrainedTokenizerBase | None = None, model_init: Callable[[], PreTrainedModel] | None = None, compute_metrics: Callable[[EvalPrediction], Dict] | None = None, callbacks: List[TrainerCallback] | None = None, optimizers: Tuple[Optimizer, LambdaLR] = (None, None), preprocess_logits_for_metrics: Callable[[Tensor, Tensor], Tensor] | None = None)
Bases: Trainer
Custom Trainer object for running the inference of logBERT model for unsupervised anomaly detection.
- compute_loss(model, inputs, return_outputs=False)
How the loss is computed by Trainer. By default, all models return the loss in the first element. Subclass and override for custom behavior.
- get_test_dataloader(test_dataset: Dataset) → DataLoader
Returns the test torch.utils.data.DataLoader. Subclass and override this method if you want to inject some custom behavior.
- Parameters:
test_dataset – (torch.utils.data.Dataset, optional): The test dataset to use. If it is a datasets.Dataset, columns not accepted by the model.forward() method are automatically removed. It must implement __len__.
logai.algorithms.nn_model.logbert.tokenizer_utils module
- logai.algorithms.nn_model.logbert.tokenizer_utils.get_mask_id(tokenizer)
Get id of mask token, given a tokenizer object.
- Parameters:
tokenizer – (AutoTokenizer): tokenizer object.
- Returns:
id of mask token.
- logai.algorithms.nn_model.logbert.tokenizer_utils.get_special_token_ids(tokenizer)
Get ids of special tokens, given a tokenizer object.
- Parameters:
tokenizer – (AutoTokenizer): tokenizer object.
- Returns:
list of token ids of special tokens.
- logai.algorithms.nn_model.logbert.tokenizer_utils.get_special_tokens()
Get the list of special tokens.
- Returns:
list of special tokens.
- logai.algorithms.nn_model.logbert.tokenizer_utils.get_tokenizer(tokenizer_dirpath)
Get a HuggingFace tokenizer object from a given directory path.
- Parameters:
tokenizer_dirpath – (str): absolute path to directory containing pretrained tokenizer.
- Returns:
AutoTokenizer: tokenizer object.
- logai.algorithms.nn_model.logbert.tokenizer_utils.get_tokenizer_vocab(tokenizer_dirpath)
Get vocabulary from a given tokenizer directory path.
- Parameters:
tokenizer_dirpath – (str): absolute path to directory containing pretrained tokenizer.
- Returns:
list of vocabulary words.
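A minimal sketch combining these helpers; the tokenizer directory path is a placeholder:

```python
from logai.algorithms.nn_model.logbert import tokenizer_utils

tokenizer_dirpath = "/tmp/logbert_tokenizer"  # placeholder path

tokenizer = tokenizer_utils.get_tokenizer(tokenizer_dirpath)
mask_id = tokenizer_utils.get_mask_id(tokenizer)                # id of the mask token
special_ids = tokenizer_utils.get_special_token_ids(tokenizer)  # ids of all special tokens
vocab = tokenizer_utils.get_tokenizer_vocab(tokenizer_dirpath)  # vocabulary words
```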
logai.algorithms.nn_model.logbert.train module
- class logai.algorithms.nn_model.logbert.train.LogBERTTrain(config: LogBERTConfig)
Bases:
object
Class for training the logBERT model to learn log representations.
- evaluate()
Evaluate method for evaluating the logBERT model on dev data using the perplexity metric.
- fit(train_dataset: Dataset, dev_dataset: Dataset)
Fit method for training the logBERT model.
- Parameters:
train_dataset – training dataset as a HuggingFace Dataset object.
dev_dataset – development dataset as a HuggingFace Dataset object.
- get_model_checkpoint()
Get the latest saved checkpoint from the model directory path specified in LogBERTConfig.
- Returns:
path to the model checkpoint (or the model name, in the case of a pretrained model from HuggingFace).
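A hedged end-to-end training sketch, assuming config is a LogBERTConfig and train_dataset / dev_dataset are tokenized HuggingFace Dataset objects prepared elsewhere:

```python
from logai.algorithms.nn_model.logbert.train import LogBERTTrain

trainer = LogBERTTrain(config=config)        # config is a LogBERTConfig (see above)
trainer.fit(train_dataset, dev_dataset)      # train the masked LM on the training data
trainer.evaluate()                           # evaluate on dev data using perplexity
checkpoint = trainer.get_model_checkpoint()  # latest checkpoint path (or pretrained model name)
```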