logai.applications.openset.anomaly_detection package



logai.applications.openset.anomaly_detection.openset_anomaly_detection_workflow module

class logai.applications.openset.anomaly_detection.openset_anomaly_detection_workflow.OpenSetADWorkflow(config: OpenSetADWorkflowConfig)

Bases: object

Log anomaly detection workflow for open log datasets.


config – (OpenSetADWorkflowConfig): config object specifying parameters for log anomaly detection over open datasets

dedup_data(logrecord: LogRecordObject)

Method to run deduplication of log records, where loglines having the same body and span id are collapsed into a single logline. The original occurrence counts of these loglines are added as a pandas Series object in the ‘attributes’ property of the logrecord object.


logrecord – (LogRecordObject): logrecord object to be deduplicated


LogRecordObject: resulting logrecord object
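The deduplication described above can be illustrated with a minimal standard-library sketch. It is not the LogAI implementation: the helper name `dedup_loglines` and the tuple-based record layout are assumptions made for illustration, and a plain list of counts stands in for the pandas Series mentioned in the description.

```python
def dedup_loglines(records):
    """Collapse records that share (body, span_id) and count occurrences.

    `records` is a list of (body, span_id) tuples; this layout is a
    hypothetical stand-in for a LogRecordObject.
    """
    counts = {}
    for body, span_id in records:
        key = (body, span_id)
        # dicts preserve insertion order, so first-seen order is kept
        counts[key] = counts.get(key, 0) + 1
    unique = list(counts.keys())
    occurrence_counts = list(counts.values())
    return unique, occurrence_counts


records = [
    ("Receiving block blk_1", "blk_1"),
    ("Receiving block blk_1", "blk_1"),
    ("Receiving block blk_2", "blk_2"),
    ("Receiving block blk_1", "blk_1"),
]
unique, occurrence_counts = dedup_loglines(records)
print(unique)
print(occurrence_counts)
```

Keeping the counts alongside the unique records means evaluation metrics can still be weighted by how often each logline originally occurred.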


Method to execute the end-to-end workflow for anomaly detection on open log datasets

generate_train_dev_test_data(logrecord: LogRecordObject)

Splits an open log dataset into train, dev and test sets according to the parameters specified in the config object


logrecord – (LogRecordObject): logrecord object to be split into train, dev and test


  • train_data: logrecord object containing training dataset.

  • dev_data: logrecord object containing dev dataset.

  • test_data: logrecord object containing test dataset
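As a rough illustration of a class-wise train/dev/test split, the sketch below uses only the standard library. It is an assumption about how parameters such as `test_data_frac_pos`, `test_data_frac_neg` and `train_test_shuffle` might be applied, not the library's actual logic. The function name, the `dev_frac` parameter and the `seed` parameter are hypothetical and do not appear in the documented config.

```python
import random


def split_train_dev_test(records, test_frac_pos=0.8, test_frac_neg=0.8,
                         dev_frac=0.1, shuffle=False, seed=0):
    """Split (logline, label) pairs per class; label 1 = anomalous, 0 = normal.

    dev_frac and seed are hypothetical parameters added for this sketch.
    """
    def split_class(items, test_frac):
        items = list(items)
        if shuffle:
            random.Random(seed).shuffle(items)
        # Without shuffling, the chronologically latest items go to test.
        n_test = int(len(items) * test_frac)
        head, test = items[:len(items) - n_test], items[len(items) - n_test:]
        n_dev = int(len(head) * dev_frac)
        train, dev = head[:len(head) - n_dev], head[len(head) - n_dev:]
        return train, dev, test

    pos = [r for r in records if r[1] == 1]
    neg = [r for r in records if r[1] == 0]
    pos_train, pos_dev, pos_test = split_class(pos, test_frac_pos)
    neg_train, neg_dev, neg_test = split_class(neg, test_frac_neg)
    return neg_train + pos_train, neg_dev + pos_dev, neg_test + pos_test


# 12 loglines, every third one labelled anomalous (4 positive, 8 negative)
records = [(f"line {i}", 1 if i % 3 == 0 else 0) for i in range(12)]
train, dev, test = split_train_dev_test(
    records, test_frac_pos=0.5, test_frac_neg=0.5, dev_frac=0.5)
print(len(train), len(dev), len(test))
```

Splitting each class separately keeps the anomaly ratio of the test set under explicit control, which matters when anomalies are rare.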


initialize anomaly detector object


loads logrecord object from raw log dataset


LogRecordObject : logrecord object created from the raw log dataset


initialize dataloader object


initialize dedup object


initialize log parser object


initialize partitioner object


initialize preprocessor object


ValueError – dataset is not supported


initialize vectorizer object


parse logrecord object by applying standard log parsers as specified in the Config


logrecord – (LogRecordObject): logrecord object to be parsed


LogRecordObject: parsed logrecord object

partition_log_data(logrecord: LogRecordObject)

partitioning logrecord object by applying session or sliding window based partitions


logrecord – (LogRecordObject): logrecord object to be partitioned


logrecord: partitioned logrecord object
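The two partitioning styles named above, session-based and sliding-window, can be sketched as follows. This is a simplified illustration over plain Python lists rather than LogRecordObject instances; the function names and the `window_size` and `stride` parameters are hypothetical.

```python
from collections import defaultdict


def session_partition(events, session_ids):
    """Group events by a session identifier (for example, an HDFS block id)."""
    groups = defaultdict(list)
    for event, sid in zip(events, session_ids):
        groups[sid].append(event)
    return dict(groups)


def sliding_window_partition(events, window_size, stride=1):
    """Return fixed-size windows over the event sequence."""
    last_start = len(events) - window_size
    return [events[i:i + window_size] for i in range(0, last_start + 1, stride)]


events = ["E1", "E2", "E3", "E4", "E5"]
sessions = session_partition(events, ["a", "b", "a", "b", "a"])
windows = sliding_window_partition(events, window_size=3)
print(sessions)
print(windows)
```

Session partitioning suits datasets with a natural grouping key, while sliding windows apply to any ordered log stream.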


preprocesses logrecord object by doing custom dataset specific data cleaning and formatting


logrecord – (LogRecordObject): log record object to be preprocessed


LogRecordObject: preprocessed log record object using custom dataset-specific preprocessing

run_anomaly_detection(train_data, dev_data, test_data)

Method to train and run inference of anomaly detector

  • train_data – vectorized version of the train dataset

  • dev_data – vectorized version of the dev dataset

  • test_data – vectorized version of the test dataset


Running data processing pipeline for log anomaly detection workflow


  • train_data: logrecord object containing training dataset.

  • dev_data: logrecord object containing dev dataset.

  • test_data: logrecord object containing test dataset

run_vectorizer(train_logrecord, dev_logrecord, test_logrecord)

Wrapper method for applying vectorization on train, dev and test logrecord objects

  • train_logrecord – (LogRecordObject): logrecord object of the training dataset

  • dev_logrecord – (LogRecordObject): logrecord object of the dev dataset

  • test_logrecord – (LogRecordObject): logrecord object of the test dataset


  • train_data : vectorized train data.

  • dev_data: vectorized dev data.

  • test_data: vectorized test data.


setting anomaly detector model configs based on the vectorizer configs

vectorizer_transform(logrecord: LogRecordObject, output_filename=None)

Applies vectorization to a logrecord object based on the kind of vectorizer specified in the config

  • logrecord – (LogRecordObject): logrecord containing data to be vectorized

  • output_filename – (str, optional): path to output file where the vectorized log data would be dumped. Defaults to None.


vectorized_output : vectorized data

class logai.applications.openset.anomaly_detection.openset_anomaly_detection_workflow.OpenSetADWorkflowConfig(data_loader_config: object | None = None, open_set_data_loader_config: object | None = None, preprocessor_config: object | None = None, log_parser_config: object | None = None, log_vectorizer_config: object | None = None, partitioner_config: object | None = None, open_set_partitioner_config: object | None = None, categorical_encoder_config: object | None = None, feature_extractor_config: object | None = None, anomaly_detection_config: object | None = None, nn_anomaly_detection_config: object | None = None, clustering_config: object | None = None, workflow_config: object | None = None, dataset_name: str | None = None, label_filepath: str | None = None, output_dir: str | None = None, parse_logline: bool = False, training_type: str | None = None, deduplicate_test: bool = False, test_data_frac_pos: float = 0.8, test_data_frac_neg: float = 0.8, train_test_shuffle: bool = False)

Bases: WorkFlowConfig

Config for the log anomaly detection workflow on open log datasets. Inherits from WorkFlowConfig, the config object for specifying workflow parameters.

  • dataset_name – str = None: name of the public open dataset

  • label_filepath – str = None: path to the separate file (if any) containing the anomaly detection labels

  • output_dir – str = None : path to output directory where all intermediate and final outputs would be dumped

  • parse_logline – bool = False : whether to apply log parsing to the loglines

  • training_type – str = None: should be either supervised or unsupervised

  • deduplicate_test – bool = False : whether to de-duplicate the instances in the test data, while maintaining a count of the number of each duplicated instance

  • test_data_frac_pos – float = 0.8 : fraction of the logs having positive class used for test

  • test_data_frac_neg – float = 0.8 : fraction of the logs having negative class used for test

  • train_test_shuffle – bool = False : whether to use chronological ordering of the logs or to shuffle them when creating the train test splits

dataset_name: str
deduplicate_test: bool
label_filepath: str
output_dir: str
parse_logline: bool
test_data_frac_neg: float
test_data_frac_pos: float
train_test_shuffle: bool
training_type: str

logai.applications.openset.anomaly_detection.openset_anomaly_detection_workflow.get_openset_ad_config(config_filename: str, anomaly_detection_type: str, vectorizer_type: str, parse_logline: bool, training_type: str)

Method to dynamically set some of the config parameters based on the given arguments. The list of supported anomaly detection types and vectorizer types can be found in the config yaml file. Avoid this function if you are setting all config parameters manually.

  • config_filename – (str): Name of the config file (currently supports hdfs and bgl)

  • anomaly_detection_type – (str): string describing the type of anomaly detection

  • vectorizer_type – (str): string describing the type of vectorizer.

  • parse_logline – (bool): Whether to use log parsing or not

  • training_type – (str): Whether to use “supervised” or “unsupervised” training


OpenSetADWorkflowConfig: config object of type OpenSetADWorkflowConfig
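A possible way to wire the documented pieces together is sketched below. The wrapper function `build_openset_workflow` is hypothetical. It only calls `get_openset_ad_config` and the `OpenSetADWorkflow` constructor with the signatures shown in this reference. The import is done lazily so the sketch can be loaded without LogAI installed, and valid values for `anomaly_detection_type` and `vectorizer_type` must be taken from the config yaml file rather than from this example.

```python
def build_openset_workflow(config_filename, anomaly_detection_type,
                           vectorizer_type, parse_logline=True,
                           training_type="unsupervised"):
    """Hypothetical helper: build a workflow object from the documented API."""
    # Lazy import: requires the LogAI package at call time only.
    from logai.applications.openset.anomaly_detection.openset_anomaly_detection_workflow import (
        OpenSetADWorkflow,
        get_openset_ad_config,
    )

    config = get_openset_ad_config(
        config_filename=config_filename,          # e.g. "hdfs" or "bgl"
        anomaly_detection_type=anomaly_detection_type,
        vectorizer_type=vectorizer_type,
        parse_logline=parse_logline,
        training_type=training_type,              # "supervised" or "unsupervised"
    )
    return OpenSetADWorkflow(config)
```

The returned object exposes the workflow methods documented above, such as `dedup_data` and `run_vectorizer`.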


Method to validate the config dict with the schema


workflow_config_dict – (dict): dict containing config for anomaly detection workflow on open log datasets
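To give a sense of what validating a config dict could involve, here is a minimal standard-library sketch. It does not reproduce the schema used by the library. The field names and types are taken from the OpenSetADWorkflowConfig signature above, while the function name, the range check on the fractions and the error-list return style are assumptions.

```python
# Field names and types as documented for OpenSetADWorkflowConfig.
EXPECTED_FIELDS = {
    "dataset_name": str,
    "label_filepath": str,
    "output_dir": str,
    "parse_logline": bool,
    "training_type": str,
    "deduplicate_test": bool,
    "test_data_frac_pos": float,
    "test_data_frac_neg": float,
    "train_test_shuffle": bool,
}


def validate_workflow_config(config_dict):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for name, expected_type in EXPECTED_FIELDS.items():
        value = config_dict.get(name)
        if value is None:
            continue  # every documented field has a default
        if not isinstance(value, expected_type):
            errors.append(f"{name}: expected {expected_type.__name__}")
    training_type = config_dict.get("training_type")
    if isinstance(training_type, str) and training_type not in ("supervised", "unsupervised"):
        errors.append("training_type: must be 'supervised' or 'unsupervised'")
    for name in ("test_data_frac_pos", "test_data_frac_neg"):
        value = config_dict.get(name)
        # Assumption: fractions are expected to lie in [0, 1].
        if isinstance(value, float) and not 0.0 <= value <= 1.0:
            errors.append(f"{name}: must be between 0 and 1")
    return errors


ok = validate_workflow_config(
    {"dataset_name": "hdfs", "training_type": "unsupervised", "test_data_frac_pos": 0.8})
bad = validate_workflow_config({"training_type": "semi", "parse_logline": "yes"})
print(ok)
print(bad)
```

Collecting all problems in a list, rather than raising on the first one, lets a user fix a config file in a single pass.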

Module contents