logai.preprocess package


logai.preprocess.bgl_preprocessor module

class logai.preprocess.bgl_preprocessor.BGLPreprocessor(config: PreprocessorConfig)

Bases: OpenSetPreprocessor

Custom preprocessor for the open log dataset BGL.

logai.preprocess.hdfs_preprocessor module

class logai.preprocess.hdfs_preprocessor.HDFSPreprocessor(config: PreprocessorConfig, label_file: str)

Bases: OpenSetPreprocessor

Custom preprocessor for the open log dataset HDFS.

logai.preprocess.openset_partitioner module

class logai.preprocess.openset_partitioner.OpenSetPartitioner(config: OpenSetPartitionerConfig)

Bases: object

Partitioner class for Open log datasets.


config – A config object specifying parameters of log partitioning for open log datasets.


Method to generate session-window-based log sequences from a logrecord object, given some ids at the logline level.


logrecord – A log record object to be partitioned into session windows.


LogRecordObject where the body of logrecord object contains the generated log sequences.


Method to generate sliding-window-based log sequences from a logrecord object.


logrecord – A log record object to be partitioned into sliding windows.


LogRecordObject where the body of logrecord object contains the generated log sequences.


Wrapper function that applies partitioning to a logrecord object based on the config parameters.


logrecord – A log record object to be partitioned into session or sliding windows.


LogRecordObject where the body of logrecord object contains the generated log sequences.
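The session-window idea described above can be illustrated without LogAI. The following is a minimal sketch, not the library's implementation: it assumes each logline comes with a session id, and the function name `session_windows` and the sample HDFS-style block ids are hypothetical. Loglines that share an id are joined with the `[SEP]` delimiter, which is the default `logsequence_delim` of `OpenSetPartitionerConfig`.

```python
def session_windows(loglines, session_ids, delim="[SEP]"):
    """Group loglines by session id and join each group with `delim`.

    Illustrative helper only; it is not part of LogAI.
    """
    if len(loglines) != len(session_ids):
        raise ValueError("loglines and session_ids must have the same length")
    groups = {}
    for line, sid in zip(loglines, session_ids):
        # dicts preserve insertion order, so sessions appear in first-seen order
        groups.setdefault(sid, []).append(line)
    return {sid: delim.join(lines) for sid, lines in groups.items()}


loglines = ["open file", "write block", "open file", "close file"]
session_ids = ["blk_1", "blk_1", "blk_2", "blk_1"]
sequences = session_windows(loglines, session_ids)
# sequences["blk_1"] is "open file[SEP]write block[SEP]close file"
```

Each session yields one log sequence, so the number of output sequences equals the number of distinct session ids.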

class logai.preprocess.openset_partitioner.OpenSetPartitionerConfig(sliding_window: int = 0, session_window: bool = True, logsequence_delim: str = '[SEP]')

Bases: Config

Config for Partitioner for open log datasets.

  • sliding_window – The size of sliding window.

  • session_window – A boolean flag whether to use session based partitioning or not.

  • logsequence_delim – The delimiter string for concatenating log sequences.

logsequence_delim: str
session_window: bool
sliding_window: int

logai.preprocess.openset_preprocessor module

class logai.preprocess.openset_preprocessor.OpenSetPreprocessor(config: PreprocessorConfig)

Bases: Preprocessor

Preprocessor class for Open log datasets.


config – A config object specifying parameters of log preprocessing for open log datasets.

clean_log(logrecord: LogRecordObject) LogRecordObject

Cleans a logrecord object created from open log datasets.


logrecord – A log record object containing the raw log data from open datasets.


The cleaned logrecord object.

logai.preprocess.partitioner module

class logai.preprocess.partitioner.Partitioner(config: PartitionerConfig)

Bases: object

group_counter(logrecord_df: DataFrame) DataFrame

Groups log records by the given categories and returns counter vectors.


logrecord_df – The log record dataframe.


The log counter vector dataframe after grouping.
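As an illustration of what a grouped counter looks like, here is a dependency-free sketch. It is not the library's code: the real method works on a pandas DataFrame, whereas this version uses plain dictionaries, a fixed one-minute time bucket, and hypothetical names (`count_by_group`, the `host` field).

```python
from collections import Counter
from datetime import datetime


def count_by_group(records, category_key):
    """Count records per (category value, one-minute bucket).

    Illustrative helper only; the one-minute bucket is an assumption.
    """
    counts = Counter()
    for rec in records:
        # Floor the timestamp to the start of its minute.
        minute = rec["timestamp"].replace(second=0, microsecond=0)
        counts[(rec[category_key], minute)] += 1
    return counts


records = [
    {"timestamp": datetime(2023, 1, 1, 0, 0, 5), "host": "node-1"},
    {"timestamp": datetime(2023, 1, 1, 0, 0, 40), "host": "node-1"},
    {"timestamp": datetime(2023, 1, 1, 0, 1, 10), "host": "node-1"},
    {"timestamp": datetime(2023, 1, 1, 0, 0, 30), "host": "node-2"},
]
counts = count_by_group(records, "host")
# counts[("node-1", datetime(2023, 1, 1, 0, 0))] is 2
```

Every (category, time bucket) pair becomes one entry, which corresponds to one row of a counter-vector table.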

group_sliding_window(logrecord_df: DataFrame, logline_col_name='logline') DataFrame

Groups log records into sliding windows of the configured length and returns the resulting pandas DataFrame.


logrecord_df – A pandas DataFrame on which grouping is to be applied.


A pandas DataFrame after sliding-window-based grouping.

sliding_window(loglines: Series) Series

Conducts sliding window log partitioning.


loglines – The series of loglines.


The series of logline sequences after sliding-window partitioning.
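Sliding-window partitioning can be sketched in a few lines. The version below is an assumption-laden illustration rather than the library's implementation: it works on a plain list instead of a pandas Series, uses a stride of one, keeps only full windows, and treats a non-positive window size as "no windowing". The name `sliding_windows` is hypothetical; the `[SEP]` token matches the default `sep_token` of `PartitionerConfig`.

```python
def sliding_windows(loglines, size, sep_token="[SEP]"):
    """Join every run of `size` consecutive loglines with `sep_token`.

    Illustrative helper only; edge handling in LogAI may differ.
    """
    if size <= 0:
        # Assumption: a non-positive size disables windowing.
        return list(loglines)
    return [
        sep_token.join(loglines[i:i + size])
        for i in range(len(loglines) - size + 1)
    ]


loglines = ["a", "b", "c", "d"]
windows = sliding_windows(loglines, size=3)
# windows is ["a[SEP]b[SEP]c", "b[SEP]c[SEP]d"]
```

With n loglines and window size k (k <= n), this produces n - k + 1 windows; an input shorter than the window yields no windows in this sketch.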

class logai.preprocess.partitioner.PartitionerConfig(group_by_category: list | None = None, group_by_time: str | None = None, sliding_window: int = 0, sep_token: str = '[SEP]', exclude_last_window: bool = False, exclude_smaller_windows: bool = False)

Bases: Config

Config class for Partitioner.

  • group_by_category – The list of fields to group log data by.

  • group_by_time – The string-type argument to specify grouping by time. For supported values, see the pandas offset aliases: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases.

  • sliding_window – The sliding window length if partitioning loglines into sliding windows.

  • sep_token – The separator token string to be used as the delimiter when grouping log data.

  • exclude_last_window – A boolean (default False) specifying whether to exclude the last window when doing sliding-window-based grouping of log data.

  • exclude_smaller_windows – A boolean (default False) specifying whether to exclude windows shorter than the given sliding_window argument.

exclude_last_window: bool
exclude_smaller_windows: bool
group_by_category: list
group_by_time: str
sep_token: str
sliding_window: int

logai.preprocess.partitioner.concat_logs(windows, tokens)

logai.preprocess.preprocessor module

class logai.preprocess.preprocessor.Preprocessor(config: PreprocessorConfig)

Bases: object

Preprocessor class that contains common preprocessing methods.

clean_log(loglines: Series) Series

Cleans the input log data.


loglines – The raw loglines data to be cleaned.


The cleaned loglines data.

group_log_index(attributes: DataFrame, by: array) DataFrame

Groups log attributes (DataFrame) by a list of its fields.

  • attributes – The log attribute data to be grouped.

  • by – A list of fields of the log attribute DataFrame object to group by.


The log attribute data after grouping.
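One plausible reading of this method, suggested by its name, is that it records which rows fall into each group. The sketch below illustrates that reading only; it is an assumption, not the library's code. It uses a list of dictionaries in place of a DataFrame, and the name `group_indices` and the sample fields are hypothetical.

```python
def group_indices(rows, by):
    """Map each combination of the `by` fields to the row positions holding it.

    Illustrative helper only; it is not part of LogAI.
    """
    groups = {}
    for idx, row in enumerate(rows):
        key = tuple(row[field] for field in by)
        groups.setdefault(key, []).append(idx)
    return groups


attributes = [
    {"host": "node-1", "level": "INFO"},
    {"host": "node-2", "level": "INFO"},
    {"host": "node-1", "level": "INFO"},
    {"host": "node-1", "level": "ERROR"},
]
index_groups = group_indices(attributes, by=["host", "level"])
# index_groups[("node-1", "INFO")] is [0, 2]
```

Keeping the row positions per group makes it possible to look up the matching loglines later without copying them.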

identify_timestamps(logrecord: LogRecordObject)

class logai.preprocess.preprocessor.PreprocessorConfig(custom_delimiters_regex: dict | None = None, custom_replace_list: list | None = None)

Bases: Config

Config class for Preprocessor.

  • custom_delimiters_regex – A dictionary of delimiter regex patterns in raw log data.

  • custom_replace_list – A list of tuples of custom replace patterns in raw log data. Each tuple should be of the form ('regex-pattern-to-replace', 'replaced-pattern').

custom_delimiters_regex: dict
custom_replace_list: list
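
To make the shape of `custom_replace_list` concrete, the sketch below applies a list of (pattern, replacement) tuples to raw loglines with the standard `re` module. It is an illustration under stated assumptions, not the library's cleaning routine: the helper name `apply_replacements`, the sample patterns, and the choice to apply the tuples in list order are all hypothetical.

```python
import re


def apply_replacements(loglines, replace_list):
    """Apply each (regex, replacement) tuple to every logline, in list order.

    Illustrative helper only; it is not part of LogAI.
    """
    cleaned = []
    for line in loglines:
        for pattern, replacement in replace_list:
            line = re.sub(pattern, replacement, line)
        cleaned.append(line)
    return cleaned


# Hypothetical patterns: mask IPv4 addresses and HDFS-style block ids.
custom_replace_list = [
    (r"\d+\.\d+\.\d+\.\d+", "<IP>"),
    (r"blk_-?\d+", "<BLOCK>"),
]
loglines = ["Receiving block blk_-1608999687919862906 src: 10.250.19.102"]
cleaned = apply_replacements(loglines, custom_replace_list)
# cleaned is ["Receiving block <BLOCK> src: <IP>"]
```

Because the tuples are applied one after another in this sketch, an earlier replacement can change what a later pattern matches, so the order of the list matters.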

logai.preprocess.thunderbird_preprocessor module

class logai.preprocess.thunderbird_preprocessor.ThunderbirdPreprocessor(config: PreprocessorConfig)

Bases: OpenSetPreprocessor

Custom preprocessor for the open log dataset Thunderbird.
