logai.applications.openset.anomaly_detection package

Subpackages

Submodules

logai.applications.openset.anomaly_detection.openset_anomaly_detection_workflow module

class logai.applications.openset.anomaly_detection.openset_anomaly_detection_workflow.OpenSetADWorkflow(config: OpenSetADWorkflowConfig)

Bases: object

log anomaly detection workflow for open log datasets

Parameters:

config – (OpenSetADWorkflowConfig): config object specifying parameters for log anomaly detection over open datasets

dedup_data(logrecord: LogRecordObject)

Method to run deduplication of log records, where loglines having the same body and span id are collapsed into a single logline. The original occurrence counts of these loglines are added as a pandas Series object in the ‘attributes’ property of the logrecord object.

Parameters:

logrecord – (LogRecordObject): logrecord object to be deduplicated

Returns:

LogRecordObject: resulting logrecord object
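
The collapsing step can be illustrated without LogAI. The sketch below is a standalone approximation, not the library's implementation: it assumes each logline is a (body, span_id) tuple, and the helper name dedup_loglines is hypothetical. It keeps the first occurrence of each pair and records how often that pair occurred.

```python
from collections import Counter


def dedup_loglines(loglines):
    """Collapse loglines with identical (body, span_id) and count occurrences.

    Hypothetical helper for illustration only. LogAI itself stores the
    counts as a pandas Series in the logrecord's 'attributes' property.
    """
    counts = Counter(loglines)
    # dict.fromkeys preserves first-seen order, so unique loglines stay ordered.
    unique = list(dict.fromkeys(loglines))
    return unique, [counts[line] for line in unique]


loglines = [
    ("Receiving block blk_1", "blk_1"),
    ("Receiving block blk_1", "blk_1"),
    ("Deleting block blk_2", "blk_2"),
    ("Receiving block blk_1", "blk_1"),
]
unique, occurrence_counts = dedup_loglines(loglines)
```

Here unique holds one entry per distinct (body, span_id) pair and occurrence_counts holds the matching counts in the same order.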

execute()

Method to execute the end-to-end workflow for anomaly detection on open log datasets

generate_train_dev_test_data(logrecord: LogRecordObject)

Splits an open log dataset into train, dev and test splits according to the parameters specified in the config object

Parameters:

logrecord – (LogRecordObject): logrecord object to be split into train, dev and test

Returns:

  • train_data: logrecord object containing training dataset.

  • dev_data: logrecord object containing dev dataset.

  • test_data: logrecord object containing test dataset
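
As a rough illustration of a class-wise split, the sketch below holds out a fraction of the anomalous (positive) and normal (negative) records for testing, mirroring the test_data_frac_pos, test_data_frac_neg and train_test_shuffle config parameters. How LogAI carves out the dev split is not described in this reference, so the dev_frac parameter, the function name and the choice to take the test records from the end of each class are assumptions made for the example.

```python
import random


def split_train_dev_test(records, labels, test_frac_pos=0.8, test_frac_neg=0.8,
                         dev_frac=0.1, shuffle=False, seed=0):
    """Split (record, label) pairs class-wise; label 1 = anomalous, 0 = normal.

    dev_frac is a hypothetical parameter: the fraction of the non-test
    records of each class that is set aside as the dev split.
    """
    rng = random.Random(seed)
    train, dev, test = [], [], []
    for cls, test_frac in ((1, test_frac_pos), (0, test_frac_neg)):
        items = [(r, l) for r, l in zip(records, labels) if l == cls]
        if shuffle:
            rng.shuffle(items)  # otherwise keep the original (chronological) order
        n_test = int(len(items) * test_frac)
        n_dev = int((len(items) - n_test) * dev_frac)
        n_train = len(items) - n_test - n_dev
        train += items[:n_train]
        dev += items[n_train:n_train + n_dev]
        test += items[n_train + n_dev:]
    return train, dev, test


records = [f"log_{i}" for i in range(20)]
labels = [1] * 10 + [0] * 10
train, dev, test = split_train_dev_test(records, labels, 0.8, 0.5, dev_frac=0.5)
```

With shuffle left at False, the earliest records of each class go to train and the latest to test, which is one way to read "chronological ordering" in the config description.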

load_anomaly_detector()

initialize anomaly detector object

load_data()

loads logrecord object from raw log dataset

Returns:

LogRecordObject : logrecord object created from the raw log dataset

load_dataloader()

initialize dataloader object

load_deduper()

initialize dedup object

load_parser()

initialize log parser object

load_partitioner()

initialize partitioner object

load_preprocessor()

initialize preprocessor object

Raises:

ValueError – dataset is not supported

load_vectorizer()

initialize vectorizer object

parse_log_data(logrecord)

parse logrecord object by applying standard log parsers as specified in the Config

Parameters:

logrecord – (LogRecordObject): logrecord object to be parsed

Returns:

LogRecordObject: parsed logrecord object

partition_log_data(logrecord: LogRecordObject)

Partitions the logrecord object by applying session-based or sliding-window-based partitioning

Parameters:

logrecord – (LogRecordObject): logrecord object to be partitioned

Returns:

logrecord: partitioned logrecord object
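
The two partitioning styles can be sketched in plain Python. This is an illustration under assumptions, not LogAI's partitioner: loglines are taken to be (span_id, body) tuples, and the function names as well as the window_size and step parameters are invented for the example.

```python
from collections import defaultdict


def session_partition(loglines):
    """Group logline bodies by span id, giving one sequence per session."""
    sessions = defaultdict(list)
    for span_id, body in loglines:
        sessions[span_id].append(body)
    return dict(sessions)


def sliding_window_partition(bodies, window_size, step=1):
    """Return windows of window_size consecutive loglines, advancing by step."""
    return [bodies[i:i + window_size]
            for i in range(0, len(bodies) - window_size + 1, step)]


loglines = [("blk_1", "open"), ("blk_2", "open"), ("blk_1", "write"),
            ("blk_1", "close"), ("blk_2", "close")]
sessions = session_partition(loglines)
windows = sliding_window_partition([body for _, body in loglines], window_size=3)
```

Session partitioning yields one variable-length sequence per span id, while the sliding window yields fixed-length sequences that overlap whenever step is smaller than window_size.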

preprocess_log_data(logrecord)

Preprocesses the logrecord object by applying custom dataset-specific data cleaning and formatting

Parameters:

logrecord – (LogRecordObject): log record object to be preprocessed

Returns:

LogRecordObject: log record object preprocessed using custom dataset-specific preprocessing

run_anomaly_detection(train_data, dev_data, test_data)

Method to train the anomaly detector and run inference with it

Parameters:
  • train_data – vectorized version of the train dataset

  • dev_data – vectorized version of the dev dataset

  • test_data – vectorized version of the test dataset

run_data_processing_workflow()

Running data processing pipeline for log anomaly detection workflow

Returns:

  • train_data: logrecord object containing training dataset.

  • dev_data: logrecord object containing dev dataset.

  • test_data: logrecord object containing test dataset
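
One plausible way the data processing steps chain together is sketched below. The step order is an assumption inferred from the method names and from the parse_logline and deduplicate_test config flags; it is not taken from the LogAI source. The run_data_processing function and the stub steps are hypothetical stand-ins for the workflow methods.

```python
def run_data_processing(steps, parse_logline=False, deduplicate_test=False):
    """Chain the data processing steps; steps maps method names to callables.

    Assumed order: load, preprocess, optional parse, partition, split,
    optional de-duplication of the test split.
    """
    logrecord = steps["load_data"]()
    logrecord = steps["preprocess_log_data"](logrecord)
    if parse_logline:
        logrecord = steps["parse_log_data"](logrecord)
    logrecord = steps["partition_log_data"](logrecord)
    train, dev, test = steps["generate_train_dev_test_data"](logrecord)
    if deduplicate_test:
        test = steps["dedup_data"](test)
    return train, dev, test


calls = []


def make_step(name):
    """Build a stub step that records its name and passes its input through."""
    def step(*args):
        calls.append(name)
        if name == "generate_train_dev_test_data":
            return "train", "dev", "test"
        return args[0] if args else "raw logrecord"
    return step


step_names = ["load_data", "preprocess_log_data", "parse_log_data",
              "partition_log_data", "generate_train_dev_test_data", "dedup_data"]
steps = {name: make_step(name) for name in step_names}
train_data, dev_data, test_data = run_data_processing(
    steps, parse_logline=False, deduplicate_test=True)
```

In this run parsing is skipped because parse_logline is False, and only the test split passes through the de-duplication stub.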

run_vectorizer(train_logrecord, dev_logrecord, test_logrecord)

Wrapper method for applying vectorization on train, dev and test logrecord objects

Parameters:
  • train_logrecord – (LogRecordObject): logrecord object of the training dataset

  • dev_logrecord – (LogRecordObject): logrecord object of the dev dataset

  • test_logrecord – (LogRecordObject): logrecord object of the test dataset

Returns:

  • train_data : vectorized train data.

  • dev_data: vectorized dev data.

  • test_data: vectorized test data.
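
A toy version of vectorizing the three splits is sketched below. It assumes the vectorizer is fitted on the training split only and then applied to all three splits, which is common practice but is not confirmed by this reference. The vocabulary-based encoder and both function names are hypothetical and stand in for whichever vectorizer the config selects.

```python
def fit_vocabulary(sequences):
    """Map each token seen in the training sequences to an integer id.

    Id 0 is reserved for tokens that were not seen during fitting.
    """
    vocab = {}
    for seq in sequences:
        for token in seq:
            vocab.setdefault(token, len(vocab) + 1)
    return vocab


def transform(sequences, vocab):
    """Replace every token by its id, using 0 for unknown tokens."""
    return [[vocab.get(token, 0) for token in seq] for seq in sequences]


train_logs = [["open", "write", "close"]]
dev_logs = [["open", "close"]]
test_logs = [["open", "delete"]]

vocab = fit_vocabulary(train_logs)
train_data, dev_data, test_data = (
    transform(split, vocab) for split in (train_logs, dev_logs, test_logs))
```

The token "delete" never appears in the training split, so it is encoded as the unknown id in test_data.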

set_anomaly_detector_configs()

setting anomaly detector model configs based on the vectorizer configs

vectorizer_transform(logrecord: LogRecordObject, output_filename=None)

Applies vectorization to a logrecord object based on the kind of vectorizer specified in the config

Parameters:
  • logrecord – (LogRecordObject): logrecord containing data to be vectorized

  • output_filename – (str, optional): path to output file where the vectorized log data would be dumped. Defaults to None.

Returns:

vectorized_output : vectorized data

class logai.applications.openset.anomaly_detection.openset_anomaly_detection_workflow.OpenSetADWorkflowConfig(data_loader_config: object | None = None, open_set_data_loader_config: object | None = None, preprocessor_config: object | None = None, log_parser_config: object | None = None, log_vectorizer_config: object | None = None, partitioner_config: object | None = None, open_set_partitioner_config: object | None = None, categorical_encoder_config: object | None = None, feature_extractor_config: object | None = None, anomaly_detection_config: object | None = None, nn_anomaly_detection_config: object | None = None, clustering_config: object | None = None, workflow_config: object | None = None, dataset_name: str | None = None, label_filepath: str | None = None, output_dir: str | None = None, parse_logline: bool = False, training_type: str | None = None, deduplicate_test: bool = False, test_data_frac_pos: float = 0.8, test_data_frac_neg: float = 0.8, train_test_shuffle: bool = False)

Bases: WorkFlowConfig

Config for the log anomaly detection workflow on open log datasets. Inherits from WorkFlowConfig, the config object for specifying workflow parameters.

Parameters:
  • dataset_name – str = None: name of the public open dataset

  • label_filepath – str = None: path to the separate file (if any) containing the anomaly detection labels

  • output_dir – str = None : path to output directory where all intermediate and final outputs would be dumped

  • parse_logline – bool = False : whether to parse the loglines or not

  • training_type – str = None: should be either supervised or unsupervised

  • deduplicate_test – bool = False : whether to de-duplicate the instances in the test data, while maintaining a count of the number of each duplicated instance

  • test_data_frac_pos – float = 0.8 : fraction of the logs having positive class used for test

  • test_data_frac_neg – float = 0.8 : fraction of the logs having negative class used for test

  • train_test_shuffle – bool = False : whether to use chronological ordering of the logs or to shuffle them when creating the train test splits

dataset_name: str
deduplicate_test: bool
label_filepath: str
output_dir: str
parse_logline: bool
test_data_frac_neg: float
test_data_frac_pos: float
train_test_shuffle: bool
training_type: str
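
The documented fields and defaults can be summarized as a small dataclass. The class below is a standalone mirror written for illustration, not the LogAI class: the name OpenSetADSettings and the check method are hypothetical, and the range check on the test fractions is an assumption based on their description as fractions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpenSetADSettings:
    """Standalone mirror of the documented fields; not the LogAI class."""
    dataset_name: Optional[str] = None
    label_filepath: Optional[str] = None
    output_dir: Optional[str] = None
    parse_logline: bool = False
    training_type: Optional[str] = None
    deduplicate_test: bool = False
    test_data_frac_pos: float = 0.8
    test_data_frac_neg: float = 0.8
    train_test_shuffle: bool = False

    def check(self):
        """Raise ValueError for values the parameter descriptions rule out."""
        if self.training_type not in ("supervised", "unsupervised"):
            raise ValueError("training_type should be supervised or unsupervised")
        for name in ("test_data_frac_pos", "test_data_frac_neg"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:  # assumed valid range for a fraction
                raise ValueError(f"{name} should lie between 0 and 1")
        return self


settings = OpenSetADSettings(dataset_name="hdfs", training_type="unsupervised").check()
```

Fields that are not passed keep the defaults listed in the class signature, for example 0.8 for both test fractions.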

logai.applications.openset.anomaly_detection.openset_anomaly_detection_workflow.get_openset_ad_config(config_filename: str, anomaly_detection_type: str, vectorizer_type: str, parse_logline: bool, training_type: str)

Method to dynamically set some of the config parameters based on the given arguments. The supported anomaly detection types and vectorizer types are listed in the config yaml file. Skip this function if you set all config parameters manually.

Parameters:
  • config_filename – (str): Name of the config file (currently supports hdfs and bgl)

  • anomaly_detection_type – (str): string describing the type of anomaly detection

  • vectorizer_type – (str): string describing the type of vectorizer.

  • parse_logline – (bool): Whether to use log parsing or not

  • training_type – (str): Whether to use “supervised” or “unsupervised” training

Returns:

OpenSetADWorkflowConfig: config object of type OpenSetADWorkflowConfig
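
A minimal sketch of how this function might be combined with OpenSetADWorkflow is given below, using only the signatures documented on this page. The wrapper name run_openset_workflow is hypothetical, and the import is placed inside the function so that the snippet can be loaded without LogAI installed.

```python
def run_openset_workflow(config_filename, anomaly_detection_type, vectorizer_type,
                         parse_logline, training_type):
    """Build the config from the documented arguments and run the workflow."""
    # Lazy import: LogAI is only required when the function is actually called.
    from logai.applications.openset.anomaly_detection.openset_anomaly_detection_workflow import (
        OpenSetADWorkflow,
        get_openset_ad_config,
    )

    config = get_openset_ad_config(
        config_filename=config_filename,
        anomaly_detection_type=anomaly_detection_type,
        vectorizer_type=vectorizer_type,
        parse_logline=parse_logline,
        training_type=training_type,
    )
    workflow = OpenSetADWorkflow(config)
    workflow.execute()
    return workflow
```

The values passed for anomaly_detection_type and vectorizer_type have to match entries in the config yaml file; they are not listed in this reference, so none are hard-coded here.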

logai.applications.openset.anomaly_detection.openset_anomaly_detection_workflow.validate_config_dict(workflow_config_dict)

Method to validate the config dict with the schema

Parameters:

workflow_config_dict – (dict): dict containing config for anomaly detection workflow on open log datasets
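
The idea of checking a config dict against a schema can be sketched with the standard library alone. The schema format used here (a mapping from allowed key to expected type) and the function name validate_against_schema are assumptions for the example; the schema and validation mechanism LogAI actually uses are not described in this reference.

```python
def validate_against_schema(config_dict, schema):
    """Return a list of problems; an empty list means the dict is valid.

    schema maps each allowed key to the type its value must have.
    None values are accepted, matching the optional config fields.
    """
    problems = []
    for key, value in config_dict.items():
        if key not in schema:
            problems.append(f"unexpected key: {key}")
        elif value is not None and not isinstance(value, schema[key]):
            problems.append(
                f"{key}: expected {schema[key].__name__}, got {type(value).__name__}")
    return problems


schema = {"dataset_name": str, "parse_logline": bool, "test_data_frac_pos": float}
ok = validate_against_schema({"dataset_name": "hdfs", "parse_logline": True}, schema)
bad = validate_against_schema({"parse_logline": "yes", "window": 5}, schema)
```

Collecting all problems before reporting them makes it easier to fix a config file in one pass than raising on the first mismatch.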

Module contents