logai.applications.openset.anomaly_detection package
Subpackages
Submodules
logai.applications.openset.anomaly_detection.openset_anomaly_detection_workflow module
- class logai.applications.openset.anomaly_detection.openset_anomaly_detection_workflow.OpenSetADWorkflow(config: OpenSetADWorkflowConfig)
Bases:
object
Log anomaly detection workflow for open log datasets.
- Parameters:
config – (OpenSetADWorkflowConfig): config object specifying parameters for log anomaly detection over open datasets
- dedup_data(logrecord: LogRecordObject)
Method to run deduplication of log records, where loglines having the same body and span id are collapsed into a single logline. The original occurrence counts of these loglines are added as a pandas Series object in the ‘attributes’ property of the logrecord object.
- Parameters:
logrecord – (LogRecordObject): logrecord object to be deduplicated
- Returns:
LogRecordObject: resulting logrecord object
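The deduplication step can be illustrated with a plain-Python sketch. The helper below is hypothetical and not LogAI's implementation; it only shows the idea of collapsing loglines that share the same (body, span id) key while retaining their occurrence counts.

```python
from collections import Counter

def dedup_loglines(loglines):
    """Collapse loglines sharing the same (body, span_id) key,
    keeping a count of how often each line occurred."""
    counts = Counter((line["body"], line["span_id"]) for line in loglines)
    return [
        {"body": body, "span_id": span_id, "count": n}
        for (body, span_id), n in counts.items()
    ]

logs = [
    {"body": "Connection reset", "span_id": "s1"},
    {"body": "Connection reset", "span_id": "s1"},
    {"body": "Disk full", "span_id": "s2"},
]
print(dedup_loglines(logs))
```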
- execute()
Method to execute the end-to-end workflow for anomaly detection on open log datasets.
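The end-to-end workflow chains the individual steps exposed by the methods below. A minimal sketch of such a pipeline (illustrative only; the real method orchestrates LogAI components such as the loader, preprocessor, parser, partitioner, and vectorizer):

```python
def run_pipeline(raw_logs, steps):
    """Apply each named processing step in order, mirroring the
    load -> preprocess -> parse -> partition -> dedup -> split -> vectorize
    structure of an end-to-end log anomaly detection workflow."""
    data = raw_logs
    for name, step in steps:
        data = step(data)  # each step consumes the previous step's output
    return data

steps = [
    ("preprocess", lambda logs: [s.strip().lower() for s in logs]),
    ("parse", lambda logs: [s.split() for s in logs]),
]
print(run_pipeline(["ERROR Disk full ", "INFO Started"], steps))
```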
- generate_train_dev_test_data(logrecord: LogRecordObject)
Splits the open log dataset into train, dev and test splits according to the parameters specified in the config object.
- Parameters:
logrecord – (LogRecordObject): logrecord object to be split into train, dev and test
- Returns:
train_data: logrecord object containing training dataset.
dev_data: logrecord object containing dev dataset.
test_data: logrecord object containing test dataset
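The effect of the split parameters can be sketched in plain Python. The helper below is hypothetical, not LogAI's implementation; it only illustrates how fractions of the positive (anomalous) and negative (normal) records go to the test split, with optional shuffling versus keeping chronological order (mirroring `test_data_frac_pos`, `test_data_frac_neg`, and `train_test_shuffle`).

```python
import random

def split_by_label(records, test_frac_pos=0.8, test_frac_neg=0.8,
                   shuffle=False, seed=0):
    """Split records into (train, test), sending the given fraction
    of positive and negative records to test. Without shuffling,
    chronological order within each class is preserved."""
    pos = [r for r in records if r["label"] == 1]
    neg = [r for r in records if r["label"] == 0]
    if shuffle:
        rng = random.Random(seed)
        rng.shuffle(pos)
        rng.shuffle(neg)
    n_pos = int(len(pos) * test_frac_pos)
    n_neg = int(len(neg) * test_frac_neg)
    # later records (the tail of each class) form the test split
    train = pos[:len(pos) - n_pos] + neg[:len(neg) - n_neg]
    test = pos[len(pos) - n_pos:] + neg[len(neg) - n_neg:]
    return train, test

records = [{"label": i % 2, "t": i} for i in range(10)]
train, test = split_by_label(records)
print(len(train), len(test))
```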
- load_anomaly_detector()
Initializes the anomaly detector object.
- load_data()
Loads a logrecord object from the raw log dataset.
- Returns:
LogRecordObject : logrecord object created from the raw log dataset
- load_dataloader()
Initializes the dataloader object.
- load_deduper()
Initializes the dedup object.
- load_parser()
Initializes the log parser object.
- load_partitioner()
Initializes the partitioner object.
- load_preprocessor()
Initializes the preprocessor object.
- Raises:
ValueError – if the dataset is not supported.
- load_vectorizer()
Initializes the vectorizer object.
- parse_log_data(logrecord)
Parses the logrecord object by applying standard log parsers as specified in the config.
- Parameters:
logrecord – (LogRecordObject): logrecord object to be parsed
- Returns:
LogRecordObject: parsed logrecord object
- partition_log_data(logrecord: LogRecordObject)
Partitions the logrecord object by applying session-based or sliding-window-based partitioning.
- Parameters:
logrecord – (LogRecordObject): logrecord object to be partitioned
- Returns:
LogRecordObject: partitioned logrecord object
- preprocess_log_data(logrecord)
Preprocesses the logrecord object by applying custom dataset-specific data cleaning and formatting.
- Parameters:
logrecord – (LogRecordObject): log record object to be preprocessed
- Returns:
LogRecordObject: preprocessed log record object using custom dataset-specific preprocessing
- run_anomaly_detection(train_data, dev_data, test_data)
Method to train and run inference of the anomaly detector.
- Parameters:
train_data – vectorized version of the train dataset
dev_data – vectorized version of the dev dataset
test_data – vectorized version of the test dataset
- run_data_processing_workflow()
Runs the data processing pipeline for the log anomaly detection workflow.
- Returns:
train_data: logrecord object containing training dataset.
dev_data: logrecord object containing dev dataset.
test_data: logrecord object containing test dataset
- run_vectorizer(train_logrecord, dev_logrecord, test_logrecord)
Wrapper method for applying vectorization on the train, dev and test logrecord objects.
- Parameters:
train_logrecord – (LogRecordObject): logrecord object of the training dataset
dev_logrecord – (LogRecordObject): logrecord object of the dev dataset
test_logrecord – (LogRecordObject): logrecord object of the test dataset
- Returns:
train_data : vectorized train data.
dev_data: vectorized dev data.
test_data: vectorized test data.
- set_anomaly_detector_configs()
Sets the anomaly detector model configs based on the vectorizer configs.
- vectorizer_transform(logrecord: LogRecordObject, output_filename=None)
Applies vectorization to a logrecord object based on the kind of vectorizer specified in the config.
- Parameters:
logrecord – (LogRecordObject): logrecord containing data to be vectorized
output_filename – (str, optional): path to output file where the vectorized log data would be dumped. Defaults to None.
- Returns:
vectorized_output : vectorized data
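To make the vectorization step concrete, here is a deliberately simple bag-of-words sketch in plain Python. This is not one of LogAI's vectorizers (the actual vectorizer type is chosen via the config); it only illustrates turning parsed loglines into numeric vectors.

```python
def build_vocab(loglines):
    """Build a token -> index vocabulary from parsed loglines."""
    vocab = {}
    for line in loglines:
        for token in line.split():
            vocab.setdefault(token, len(vocab))
    return vocab

def vectorize(logline, vocab):
    """Turn one logline into a bag-of-words count vector over vocab."""
    vec = [0] * len(vocab)
    for token in logline.split():
        if token in vocab:
            vec[vocab[token]] += 1
    return vec

vocab = build_vocab(["disk full", "disk error"])
print(vectorize("disk disk full", vocab))  # → [2, 1, 0]
```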
- class logai.applications.openset.anomaly_detection.openset_anomaly_detection_workflow.OpenSetADWorkflowConfig(data_loader_config: object | None = None, open_set_data_loader_config: object | None = None, preprocessor_config: object | None = None, log_parser_config: object | None = None, log_vectorizer_config: object | None = None, partitioner_config: object | None = None, open_set_partitioner_config: object | None = None, categorical_encoder_config: object | None = None, feature_extractor_config: object | None = None, anomaly_detection_config: object | None = None, nn_anomaly_detection_config: object | None = None, clustering_config: object | None = None, workflow_config: object | None = None, dataset_name: str | None = None, label_filepath: str | None = None, output_dir: str | None = None, parse_logline: bool = False, training_type: str | None = None, deduplicate_test: bool = False, test_data_frac_pos: float = 0.8, test_data_frac_neg: float = 0.8, train_test_shuffle: bool = False)
Bases:
WorkFlowConfig
Config for the log anomaly detection workflow on open log datasets. Inherits from WorkFlowConfig, the config object for specifying workflow parameters.
- Parameters:
dataset_name – str = None: name of the public open dataset
label_filepath – str = None: path to the separate file (if any) containing the anomaly detection labels
output_dir – str = None : path to output directory where all intermediate and final outputs would be dumped
parse_logline – bool = False : whether to parse or not
training_type – str = None: should be either supervised or unsupervised
deduplicate_test – bool = False : whether to de-duplicate the instances in the test data, while maintaining a count of the number of each duplicated instance
test_data_frac_pos – float = 0.8 : fraction of the logs having positive class used for test
test_data_frac_neg – float = 0.8 : fraction of the logs having negative class used for test
train_test_shuffle – bool = False : whether to use chronological ordering of the logs or to shuffle them when creating the train test splits
- dataset_name: str
- deduplicate_test: bool
- label_filepath: str
- output_dir: str
- parse_logline: bool
- test_data_frac_neg: float
- test_data_frac_pos: float
- train_test_shuffle: bool
- training_type: str
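The scalar fields above can be written out as a plain config dict before validation. The field names below follow the constructor signature; the values are purely illustrative.

```python
# Illustrative config values; field names follow OpenSetADWorkflowConfig.
workflow_config_dict = {
    "dataset_name": "hdfs",           # illustrative dataset choice
    "label_filepath": None,           # separate label file, if any
    "output_dir": "./output",         # where intermediate outputs are dumped
    "parse_logline": False,           # whether to run log parsing
    "training_type": "unsupervised",  # "supervised" or "unsupervised"
    "deduplicate_test": True,         # collapse duplicate test instances
    "test_data_frac_pos": 0.8,        # fraction of positive logs for test
    "test_data_frac_neg": 0.8,        # fraction of negative logs for test
    "train_test_shuffle": False,      # keep chronological order
}
print(sorted(workflow_config_dict))
```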
- logai.applications.openset.anomaly_detection.openset_anomaly_detection_workflow.get_openset_ad_config(config_filename: str, anomaly_detection_type: str, vectorizer_type: str, parse_logline: bool, training_type: str)
Method to dynamically set some of the config parameters based on the given arguments. The list of all supported anomaly detection types and vectorizer types can be found in the config yaml file. Avoid this function if you are setting all config parameters manually.
- Parameters:
config_filename – (str): Name of the config file (currently supports hdfs and bgl)
anomaly_detection_type – (str): string describing the type of anomaly detection
vectorizer_type – (str): string describing the type of vectorizer.
parse_logline – (bool): Whether to use log parsing or not
training_type – (str): Whether to use “supervised” or “unsupervised” training
- Returns:
OpenSetADWorkflowConfig: config object of type OpenSetADWorkflowConfig
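The idea of dynamically setting a few parameters on top of a loaded config can be sketched in plain Python. `apply_overrides` below is a hypothetical helper, not the library function; it only mirrors the pattern of overriding selected fields of a base config loaded from a yaml file.

```python
def apply_overrides(base_config, **overrides):
    """Return a copy of base_config with the given keys overridden;
    unknown keys are rejected to catch typos early."""
    config = dict(base_config)  # copy so the base config stays untouched
    for key, value in overrides.items():
        if key not in config:
            raise KeyError(f"unknown config parameter: {key}")
        config[key] = value
    return config

base = {"parse_logline": False, "training_type": "unsupervised"}
print(apply_overrides(base, parse_logline=True, training_type="supervised"))
```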
- logai.applications.openset.anomaly_detection.openset_anomaly_detection_workflow.validate_config_dict(workflow_config_dict)
Method to validate the config dict against the schema.
- Parameters:
workflow_config_dict – (dict): dict containing config for anomaly detection workflow on open log datasets
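Schema validation of a config dict can be illustrated with a simple required-keys-and-types check. `check_config_schema` and `SCHEMA` below are illustrative stand-ins, not LogAI's actual validator or schema.

```python
# Illustrative schema: required keys mapped to their expected types.
SCHEMA = {
    "dataset_name": str,
    "parse_logline": bool,
    "test_data_frac_pos": float,
}

def check_config_schema(config, schema=SCHEMA):
    """Return a list of validation errors (empty if the config is valid)."""
    errors = []
    for key, expected in schema.items():
        if key not in config:
            errors.append(f"missing key: {key}")
        elif not isinstance(config[key], expected):
            errors.append(f"{key}: expected {expected.__name__}")
    return errors

good = {"dataset_name": "hdfs", "parse_logline": True, "test_data_frac_pos": 0.8}
print(check_config_schema(good))  # → []
```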