logai.utils package

Submodules

logai.utils.constants module

class logai.utils.constants.Field(value)

Bases: str, Enum

An enumeration of the standard field names in a log record.

ATTRIBUTES = 'attributes'
BODY = 'body'
LABELS = 'labels'
RESOURCE = 'resource'
SPAN_ID = 'span_id'
TIMESTAMP = 'timestamp'
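
A minimal usage sketch (the record dictionary and its values below are illustrative assumptions): because Field subclasses str, its members can be used directly as dictionary or dataframe column keys.

from logai.utils.constants import Field

# Illustrative record; the keys come from Field, the values are made up.
record = {
    Field.TIMESTAMP: "2023-01-01T00:00:00",
    Field.BODY: "Connection timed out after 30s",
    Field.LABELS: 0,
}
print(Field.BODY.value)    # 'body'
print(record[Field.BODY])  # the raw log line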

logai.utils.dataset_utils module

logai.utils.dataset_utils.split_train_dev_test_for_anomaly_detection(logrecord, training_type, test_data_frac_neg_class, test_data_frac_pos_class=None, shuffle=False)

Util method to split a logrecord object into train, dev and test splits, where the splitting fractions are based on the SPAN_ID field of the logrecord.

Parameters:
  • logrecord – (LogRecordObject): input logrecord object to be split into train, dev and test splits

  • training_type – (str): ‘supervised’ or ‘unsupervised’

  • test_data_frac_neg_class – (float): fraction of the negative class data to be used for test data.

  • test_data_frac_pos_class – (float, optional): when supervised mode is selected, fraction of the positive class data to be used for test data (the fraction for dev data is fixed). For unsupervised mode this value is fixed to 1.0. Defaults to None.

  • shuffle – (bool, optional): whether to shuffle the log data when splitting into train and test. If False, the chronological ordering is used, where the first split constitutes the train data, the second the dev data, and the third the test data. Defaults to False.

Returns:

  • logrecord_train: logrecord object containing train data.

  • logrecord_dev: logrecord object containing dev data.

  • logrecord_test: logrecord object containing test data.
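
A hedged sketch of a typical call, wrapped in a helper so it is syntactically self-contained; the logrecord argument stands in for a previously built LogRecordObject, and the fraction values are illustrative assumptions, not prescribed defaults.

from logai.utils.dataset_utils import split_train_dev_test_for_anomaly_detection

def make_splits(logrecord):
    # 'logrecord' is assumed to be a LogRecordObject with labels and
    # span_id populated; the fractions below are arbitrary examples.
    train_rec, dev_rec, test_rec = split_train_dev_test_for_anomaly_detection(
        logrecord,
        training_type="supervised",
        test_data_frac_neg_class=0.2,
        test_data_frac_pos_class=0.5,
        shuffle=False,
    )
    return train_rec, dev_rec, test_rec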

logai.utils.evaluate module

logai.utils.evaluate.get_accuracy_precision_recall(y: array, y_labels: array)

Evaluates anomaly detection predictions against the ground-truth labels.

Parameters:
  • y – Model inference results.

  • y_labels – Ground-truth labels.

Returns:

Accuracy, precision, and recall.
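
A short sketch with hypothetical predictions and labels; the arrays below are made up, and it assumes the three metrics are returned in the order listed above.

import numpy as np
from logai.utils.evaluate import get_accuracy_precision_recall

y_pred = np.array([0, 1, 1, 0, 1])   # model inference results (1 = anomaly)
y_true = np.array([0, 1, 0, 0, 1])   # ground-truth labels
accuracy, precision, recall = get_accuracy_precision_recall(y_pred, y_true)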

logai.utils.file_utils module

logai.utils.file_utils.file_exists(path: str)

Util function to check if a file exists.

Parameters:

path – (str): path to file

Returns:

bool: whether the file exists.
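
A minimal sketch (the path is a hypothetical placeholder):

from logai.utils.file_utils import file_exists

# Hypothetical path; substitute a real file on your system.
if file_exists("./data/HDFS_2000.log"):
    print("log file found")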

logai.utils.file_utils.read_file(filepath: str)

Reads YAML, JSON, CSV or pickle files.

Parameters:

filepath – (str): path to file

Returns:

Data object containing the file contents.
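
A minimal sketch, assuming the file format is inferred from the file extension (the path is a hypothetical placeholder):

from logai.utils.file_utils import read_file

# Hypothetical YAML config file; read_file also handles json, csv and pickle.
config = read_file("./configs/anomaly_detection.yaml")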

logai.utils.functions module

Module that includes common data manipulation functions to be applied to pandas dataframes.

logai.utils.functions.get_parameter_list(row)

logai.utils.functions.pad(x, max_len: array, padding_value: int = 0)

Method to trim or pad a 1-d numpy array to a given max length with the given padding value.

Parameters:
  • x – (np.array): given 1-d numpy array to be padded/trimmed

  • max_len – (int): maximum length of padded/trimmed output

  • padding_value – (int, optional): padding value. Defaults to 0.

Returns:

np.array: padded/trimmed numpy array
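
A small sketch of padding and trimming; the expected outputs in the comments assume trailing padding, which is an assumption about the implementation.

import numpy as np
from logai.utils.functions import pad

x = np.array([3, 1, 4, 1, 5])
padded = pad(x, max_len=8)   # assumed result: [3, 1, 4, 1, 5, 0, 0, 0]
trimmed = pad(x, max_len=3)  # assumed result: [3, 1, 4]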

logai.utils.functions.pd_to_timeseries(log_features: Series)

Convert pandas.DataFrame to merlion.TimeSeries for log counter vectors.

Parameters:

log_features – Log feature dataframe; it must contain only two columns [‘timestamp’: datetime, constants.LOGLINE_COUNTS: int].

Returns:

merlion.TimeSeries type.
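
A sketch that builds the two-column dataframe described above and converts it; the timestamps and counts are made-up sample data.

import pandas as pd
from logai.utils import constants
from logai.utils.functions import pd_to_timeseries

# Hypothetical log counter vector: one row per time bucket.
log_features = pd.DataFrame({
    "timestamp": pd.to_datetime(["2023-01-01 00:00", "2023-01-01 00:01"]),
    constants.LOGLINE_COUNTS: [12, 7],
})
ts = pd_to_timeseries(log_features)  # merlion TimeSeries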

logai.utils.tokenize module

Module that includes common tokenization functions to be applied to pandas dataframes.

logai.utils.tokenize.replace_delimeters(logline, delimeter_regex)

Removes custom delimiters (matched by delimeter_regex) from the logline.

logai.utils.tokenize.tokenize(logline, config)

Common tokenization of a logline, using spaces to separate tokens.
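
A hedged sketch of the delimiter-replacement helper; the log line and the delimiter pattern are illustrative, and the exact replacement behavior is an assumption.

from logai.utils.tokenize import replace_delimeters

# Hypothetical raw line; commas are treated as custom delimiters here.
line = "2023-01-01,INFO,Connection timed out"
cleaned = replace_delimeters(line, delimeter_regex=",")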

Module contents