logai.information_extraction package

Submodules

logai.information_extraction.categorical_encoder module

class logai.information_extraction.categorical_encoder.CategoricalEncoder(config: CategoricalEncoderConfig)

Bases: object

Implementation of the categorical encoder.

fit_transform(features: Series) → Tuple[DataFrame, list]

Transforms string features into categories.

Parameters:

features – A pd.Series of features to encode.

Returns:

A list of encoded features.
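The default encoder name in CategoricalEncoderConfig is 'label_encoder'. As a rough illustration of what label encoding does, the standard-library sketch below maps each distinct string to an integer code. The helper `label_encode` is hypothetical and is not LogAI's implementation; it takes a plain list instead of a pd.Series so that it stays self-contained.

```python
from typing import Dict, List, Tuple


def label_encode(values: List[str]) -> Tuple[List[int], List[str]]:
    """Hypothetical helper illustrating label encoding; not LogAI's CategoricalEncoder."""
    # Sort the distinct values so the same input always yields the same codes.
    classes = sorted(set(values))
    code_of: Dict[str, int] = {label: code for code, label in enumerate(classes)}
    return [code_of[v] for v in values], classes


codes, classes = label_encode(["INFO", "WARN", "INFO", "ERROR"])
print(codes)    # [1, 2, 1, 0]
print(classes)  # ['ERROR', 'INFO', 'WARN']
```

Sorting the distinct values is a design choice for reproducibility; an encoder could equally assign codes in order of first appearance.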

class logai.information_extraction.categorical_encoder.CategoricalEncoderConfig(name: str = 'label_encoder', params: object | None = None)

Bases: Config

Categorical encoding configurations.

classmethod from_dict(config_dict)

Loads a config from a config dict.

Parameters:

config_dict – The config parameters in a dict.

name: str
params: object

logai.information_extraction.feature_extractor module

class logai.information_extraction.feature_extractor.FeatureExtractor(config: FeatureExtractorConfig)

Bases: object

FeatureExtractor combines the structured log attributes with log vectors. It can group log records into log events based on user-defined strategies, such as grouping by a categorical column or by timestamps. It generates the following feature sets:

1. Log features: a feature set generated from log vectors.
2. Log event sequence: the concatenation of all loglines that belong to the same log event.
3. Log event counter vector: a counter vector for each log event.
4. Log vector.

Note:

1. counter vector
2. semantic vector
3. id sequence vector

Partitioning:
1. group by attributes
2. group by sequence length, using either sliding windows or fixed windows
3. group by timestamp interval

convert_to_counter_vector(log_pattern: Series | None = None, attributes: DataFrame | None = None, timestamps: Series | None = None) → DataFrame

Converts logs to log counter vectors after grouping the log data according to the FeatureExtractorConfig.

Parameters:
  • log_pattern – The unstructured part of the log data.

  • attributes – The log attributes.

  • timestamps – The timestamps.

Returns:

A DataFrame containing the counts of log events after grouping.
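Conceptually, a counter vector records how many times each log pattern occurs within a group. The sketch below groups patterns by a single attribute (here a host name) and counts them against a shared, sorted vocabulary. It uses plain lists instead of pandas objects, and the helper name `counter_vectors` is hypothetical; LogAI's actual grouping follows the FeatureExtractorConfig and may differ in detail.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def counter_vectors(patterns: List[str], groups: List[str]) -> Tuple[List[str], Dict[str, List[int]]]:
    """Hypothetical helper: count log patterns per group over a shared vocabulary."""
    vocab = sorted(set(patterns))
    counts: Dict[str, Counter] = defaultdict(Counter)
    for pattern, group in zip(patterns, groups):
        counts[group][pattern] += 1
    # One fixed-length count vector per group, ordered like `vocab`.
    return vocab, {group: [c[p] for p in vocab] for group, c in counts.items()}


patterns = ["conn open <*>", "conn close <*>", "conn open <*>", "disk full"]
hosts = ["h1", "h1", "h1", "h2"]
vocab, vectors = counter_vectors(patterns, hosts)
print(vocab)    # ['conn close <*>', 'conn open <*>', 'disk full']
print(vectors)  # {'h1': [1, 2, 0], 'h2': [0, 0, 1]}
```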

convert_to_feature_vector(log_vectors: Series, attributes: DataFrame, timestamps: Series) → DataFrame

Converts log data into feature vectors by combining the log vectors (e.g. the output of LogVectorizer) with other numerical or categorical attributes of the logs, after grouping according to the FeatureExtractorConfig.

Parameters:
  • log_vectors – Numeric features of the logs (e.g. the vectorized form of the log data obtained as output of LogVectorizer).

  • attributes – Categorical or numerical attributes used for grouping, or numerical attributes that serve as additional features.

  • timestamps – pd.Series object containing the timestamp data of the loglines.

Returns:

  • event_index_list: modified log data (pd.DataFrame) consisting of the converted feature vector form of the input log data after applying the log grouping. It contains an “event_index” field which maintains the sequence of log event ids, where these ids correspond to the original input dataframe’s indices.

  • block_list: a pd.DataFrame object.
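As an illustration of grouping numeric log vectors while tracking event indices, the sketch below collects vectors per group, records each group's original row positions (mirroring the “event_index” idea), and aggregates with an element-wise mean. The mean is an assumption chosen for the example; this reference does not state which aggregation convert_to_feature_vector applies, and `group_features` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def group_features(vectors: List[List[float]], groups: List[str]) -> Dict[str, dict]:
    """Hypothetical helper: mean-aggregate log vectors per group and keep row indices."""
    rows: Dict[str, List[List[float]]] = defaultdict(list)
    event_index: Dict[str, List[int]] = defaultdict(list)
    for i, (vec, group) in enumerate(zip(vectors, groups)):
        rows[group].append(vec)
        event_index[group].append(i)  # original row position of each log line
    result = {}
    for group, vecs in rows.items():
        # zip(*vecs) walks the vectors one dimension at a time (element-wise mean).
        mean = [sum(col) / len(vecs) for col in zip(*vecs)]
        result[group] = {"event_index": event_index[group], "feature": mean}
    return result


vecs = [[1.0, 0.0], [3.0, 2.0], [0.5, 0.5]]
print(group_features(vecs, ["a", "a", "b"]))
# {'a': {'event_index': [0, 1], 'feature': [2.0, 1.0]}, 'b': {'event_index': [2], 'feature': [0.5, 0.5]}}
```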

convert_to_sequence(log_pattern: Series | None = None, attributes: DataFrame | None = None, timestamps: Series | None = None)

Converts log data into sequences using the sliding window technique, as defined in the FeatureExtractorConfig.

Parameters:
  • log_pattern – A pd.Series object containing the unstructured part of the log data (for example, the unstructured part of the raw log data or the output of the log parser).

  • attributes – The structured part (attributes) of the raw log data.

  • timestamps – The timestamp data corresponding to the log lines.

Returns:

  • event_index_list: a pd.DataFrame of modified log data consisting of the sequence form of the structured and unstructured input data (i.e. the log_pattern and attributes arguments) after running the sliding window. For the unstructured part, the returned DataFrame contains an “event_index” field which maintains the sequence of log event ids, where these ids correspond to the original input dataframe’s indices.

  • event_sequence: a pd.Series containing the concatenated form of the unstructured input data (i.e. the log_pattern argument), obtained by concatenating the unstructured data within each sliding window.
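The sliding-window idea can be sketched with the standard library. Here `window` and `step` play the roles of the sliding_window and steps fields of FeatureExtractorConfig; the helper name is hypothetical, and how LogAI treats edge cases (for example a trailing partial window, or sliding_window=0) is not specified in this reference. This sketch simply drops incomplete windows.

```python
from typing import List, Tuple


def sliding_window_sequences(patterns: List[str], window: int, step: int = 1) -> List[Tuple[List[int], str]]:
    """Hypothetical helper: return (event_index, concatenated_sequence) per window."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    out = []
    # Only full windows are emitted; a trailing partial window is dropped.
    for start in range(0, len(patterns) - window + 1, step):
        idx = list(range(start, start + window))
        out.append((idx, " ".join(patterns[i] for i in idx)))
    return out


for idx, seq in sliding_window_sequences(["A", "B", "C", "D"], window=3, step=1):
    print(idx, seq)
# [0, 1, 2] A B C
# [1, 2, 3] B C D
```

With step equal to window, the same loop yields non-overlapping fixed windows, the other partitioning option named in the class description.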

class logai.information_extraction.feature_extractor.FeatureExtractorConfig(group_by_category: list | None = None, group_by_time: str | None = None, sliding_window: int = 0, steps: int = 1, max_feature_len: int = 100)

Bases: Config

Config class for Feature Extractor.

Parameters:
group_by_category: list = None
group_by_time: str = None
max_feature_len: int = 100
sliding_window: int = 0
steps: int = 1
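group_by_time partitions log lines into time intervals. Because the expected format of the group_by_time string is not described here, the sketch below uses a datetime.timedelta as a stand-in for the interval and floors each naive timestamp to an epoch-aligned bucket. `bucket_by_time` is a hypothetical helper, not part of LogAI.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List

EPOCH = datetime(1970, 1, 1)


def bucket_by_time(timestamps: List[datetime], interval: timedelta) -> Dict[datetime, List[int]]:
    """Hypothetical helper: group log-line indices into fixed time intervals (naive datetimes)."""
    buckets: Dict[datetime, List[int]] = defaultdict(list)
    for i, ts in enumerate(timestamps):
        # timedelta // timedelta gives an int, so this floors ts to its interval start.
        start = EPOCH + ((ts - EPOCH) // interval) * interval
        buckets[start].append(i)
    return dict(buckets)


stamps = [datetime(2023, 1, 1, 0, 0, 5), datetime(2023, 1, 1, 0, 0, 14), datetime(2023, 1, 1, 0, 0, 31)]
for start, indices in bucket_by_time(stamps, timedelta(seconds=15)).items():
    print(start.time(), indices)
# 00:00:00 [0, 1]
# 00:00:30 [2]
```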

logai.information_extraction.log_parser module

class logai.information_extraction.log_parser.LogParser(config: object)

Bases: object

Implementation of log parser for free-form text loglines.

Parameters:

config – The log parser configuration.

fit(loglines: Series)

Trains the log parser with training loglines.

Parameters:

loglines – A pd.Series object containing the list of loglines for training.

fit_parse(loglines: Series) → DataFrame

Trains and parses the given loglines.

Parameters:

loglines – A pd.Series object containing the list of loglines to train and parse.

Returns:

A dataframe of parsed results with columns [“loglines”, “parsed_loglines”, “parameter_list”].
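To show the shape of the fit_parse output, the sketch below is a toy parser that replaces purely numeric tokens with a `<*>` wildcard and collects them as parameters. It is not Drain (the default parsing_algorithm) or any other LogAI algorithm, and it has no training step; `toy_parse` is hypothetical, returns a list of dicts rather than a DataFrame, and only reuses the column names listed above.

```python
import re
from typing import Dict, List

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def toy_parse(loglines: List[str]) -> List[Dict[str, object]]:
    """Hypothetical toy parser: masks numeric tokens with <*>; not Drain."""
    rows = []
    for line in loglines:
        tokens = line.split()
        params = [t for t in tokens if NUMBER.fullmatch(t)]
        template = " ".join("<*>" if NUMBER.fullmatch(t) else t for t in tokens)
        rows.append({"loglines": line, "parsed_loglines": template, "parameter_list": params})
    return rows


for row in toy_parse(["took 15 ms", "retry 3 of 5"]):
    print(row)
# {'loglines': 'took 15 ms', 'parsed_loglines': 'took <*> ms', 'parameter_list': ['15']}
# {'loglines': 'retry 3 of 5', 'parsed_loglines': 'retry <*> of <*>', 'parameter_list': ['3', '5']}
```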

static get_parameter_list(row)

Returns the parameter list of a logline.

Parameters:

row – A dataframe row, passed as the function input, containing [‘logline’, ‘parsed_logline’].

Returns:

The list of dynamic parameters.
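Given a logline and its parsed template, the dynamic parameters can be recovered by aligning tokens position by position. The sketch assumes whitespace tokenization and `<*>` wildcards in the template (the placeholder Drain-style parsers commonly emit); `parameters_from_template` is hypothetical and is not LogAI's get_parameter_list.

```python
from typing import List


def parameters_from_template(logline: str, template: str, wildcard: str = "<*>") -> List[str]:
    """Hypothetical helper: return logline tokens that sit under wildcard positions."""
    tokens, slots = logline.split(), template.split()
    if len(tokens) != len(slots):
        # Token counts differ, so positional alignment is not possible in this simple sketch.
        return []
    return [tok for tok, slot in zip(tokens, slots) if slot == wildcard]


print(parameters_from_template("Received block blk_123 of size 67108864",
                               "Received block <*> of size <*>"))
# ['blk_123', '67108864']
```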

load(model_path)

Loads existing parser models.

Parameters:

model_path – The directory to load parser models from.

parse(loglines: Series) → DataFrame

Uses the trained log parser to parse loglines.

Parameters:

loglines – A pd.Series object containing the loglines for parsing.

Returns:

A dataframe of parsed results with columns [“loglines”, “parsed_loglines”, “parameter_list”].

save(out_path)

Saves the parser model.

Parameters:

out_path – The directory to save parser models to.

class logai.information_extraction.log_parser.LogParserConfig(parsing_algorithm: str = 'drain', parsing_algo_params: object | None = None, custom_config: object | None = None)

Bases: Config

Log Parser configuration.

custom_config: object = None
classmethod from_dict(config_dict)

Loads a config from a config dict.

Parameters:

config_dict – The config parameters in a dict.

parsing_algo_params: object = None
parsing_algorithm: str = 'drain'

logai.information_extraction.log_vectorizer module

class logai.information_extraction.log_vectorizer.LogVectorizer(config: VectorizerConfig)

Bases: object

Implements the log vectorizer, which transforms raw log data into vectors. It currently supports various statistical (e.g. TfIdfVectorizer) and neural (e.g. Word2Vec, FastText, LogBERT) vectorizer models.

fit(loglines: Series)

Trains the vectorizer model on the training data.

Parameters:

loglines – A pandas Series object containing the raw training log data.

transform(loglines: Series) → Series

Transforms raw log text data into vectors.

Parameters:

loglines – A pandas Series object containing the test raw log data.

Returns:

A pandas Series object containing the vectorized log data.
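TfIdfVectorizer is one of the statistical models mentioned above. The sketch below is a small stdlib TF-IDF with the same fit/transform split as LogVectorizer: fit learns a vocabulary and inverse document frequencies from training loglines, and transform maps new loglines onto that vocabulary. The class `ToyTfidf` is hypothetical; it uses lowercased whitespace tokens and the smoothed IDF form that scikit-learn's TfidfVectorizer uses by default, but omits scikit-learn's L2 normalization, so its numbers will not match LogAI's output.

```python
import math
from collections import Counter
from typing import List


class ToyTfidf:
    """Hypothetical minimal TF-IDF vectorizer with a fit/transform split; not LogAI's class."""

    def fit(self, loglines: List[str]) -> "ToyTfidf":
        docs = [line.lower().split() for line in loglines]
        self.vocab = sorted({tok for doc in docs for tok in doc})
        n = len(docs)
        # Document frequency: number of loglines containing each token.
        df = Counter(tok for doc in docs for tok in set(doc))
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1.
        self.idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in self.vocab}
        return self

    def transform(self, loglines: List[str]) -> List[List[float]]:
        vectors = []
        for line in loglines:
            tf = Counter(line.lower().split())
            # Only tokens seen during fit contribute; unseen tokens are ignored.
            vectors.append([tf[t] * self.idf[t] for t in self.vocab])
        return vectors


vec = ToyTfidf().fit(["disk full", "disk ok"]).transform(["disk full full"])[0]
print([round(x, 3) for x in vec])  # [1.0, 2.811, 0.0]
```

Keeping fit and transform separate means the vocabulary and IDF weights come only from training data, so test loglines are projected onto the same feature space.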

class logai.information_extraction.log_vectorizer.VectorizerConfig(algo_name: str = 'word2vec', algo_param: object | None = None, custom_param: object | None = None)

Bases: Config

Config class for Vectorizer.

Parameters:
  • algo_name – The name of the vectorizer algorithm.

  • algo_param – The parameters of the vectorizer algorithm.

  • custom_param – Additional custom parameters to be passed to the vectorizer algorithm.

algo_name: str
algo_param: object
custom_param: object
classmethod from_dict(config_dict)

Loads a config from a config dict.

Parameters:

config_dict – The config parameters in a dict.

Module contents