logai.preprocess package
Submodules
logai.preprocess.bgl_preprocessor module
- class logai.preprocess.bgl_preprocessor.BGLPreprocessor(config: PreprocessorConfig)
Bases:
OpenSetPreprocessor
Custom preprocessor for the open log dataset BGL.
logai.preprocess.hdfs_preprocessor module
- class logai.preprocess.hdfs_preprocessor.HDFSPreprocessor(config: PreprocessorConfig, label_file: str)
Bases:
OpenSetPreprocessor
Custom preprocessor for the open log dataset HDFS.
logai.preprocess.openset_partitioner module
- class logai.preprocess.openset_partitioner.OpenSetPartitioner(config: OpenSetPartitionerConfig)
Bases:
object
Partitioner class for Open log datasets.
- Parameters:
config – A config object specifying parameters of log partitioning for open log datasets.
- generate_session_window(logrecord)
Method to generate session-window-based log sequences from a logrecord object, given some ids at the logline level.
- Parameters:
logrecord – A log record object to be partitioned into session windows.
- Returns:
LogRecordObject where the body of logrecord object contains the generated log sequences.
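To illustrate the idea of session-window partitioning, here is a minimal standalone sketch in plain Python. It does not call LogAI: the function name `session_windows` and the `(session_id, logline)` input shape are assumptions made for this example, and the exact joining behavior inside LogAI may differ. Loglines that share an id are concatenated in arrival order with the `[SEP]` delimiter.

```python
def session_windows(records, delim="[SEP]"):
    """Group (session_id, logline) pairs into one joined sequence per session.

    Hypothetical helper for illustration only; not part of LogAI.
    """
    sessions = {}
    for session_id, logline in records:
        # dict preserves insertion order, so sessions appear in first-seen order
        sessions.setdefault(session_id, []).append(logline)
    return {sid: delim.join(lines) for sid, lines in sessions.items()}


records = [
    ("blk_1", "Receiving block"),
    ("blk_2", "Receiving block"),
    ("blk_1", "PacketResponder terminating"),
]
sequences = session_windows(records)
```

Here `sequences` holds one entry per session id, each containing that session's loglines joined by the delimiter.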
- generate_sliding_window(logrecord)
Method to generate sliding window based log sequences from a logrecord object.
- Parameters:
logrecord – A log record object to be partitioned into sliding windows.
- Returns:
LogRecordObject where the body of logrecord object contains the generated log sequences.
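The sliding-window idea can be sketched in a few lines of plain Python. This is an illustration only, not the LogAI implementation: the helper name `sliding_windows`, the stride of one logline, and the choice to drop incomplete windows are all assumptions made for this example.

```python
def sliding_windows(loglines, window_size, delim="[SEP]"):
    """Return every run of `window_size` consecutive loglines, joined by `delim`.

    Hypothetical helper for illustration only; assumes a stride of 1 and
    produces no output when there are fewer loglines than `window_size`.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    last_start = len(loglines) - window_size
    return [
        delim.join(loglines[start:start + window_size])
        for start in range(last_start + 1)
    ]


windows = sliding_windows(["a", "b", "c", "d"], window_size=2)
```

With four loglines and a window of two, this sketch yields three overlapping sequences.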
- partition(logrecord)
Wrapper function for applying partitioning on a logrecord object based on the Config parameters.
- Parameters:
logrecord – A log record object to be partitioned into session or sliding windows.
- Returns:
LogRecordObject where the body of logrecord object contains the generated log sequences.
- class logai.preprocess.openset_partitioner.OpenSetPartitionerConfig(sliding_window: int = 0, session_window: bool = True, logsequence_delim: str = '[SEP]')
Bases:
Config
Config for Partitioner for open log datasets.
- Parameters:
sliding_window – The size of sliding window.
session_window – A boolean flag indicating whether to use session-based partitioning.
logsequence_delim – The delimiter string for concatenating log sequences.
- logsequence_delim: str
- session_window: bool
- sliding_window: int
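The following sketch shows how a config with these three fields might drive the choice between session and sliding-window partitioning. The dataclass `SketchPartitionerConfig` is a stand-in that only mirrors the documented field names and defaults, and `choose_partitioning` is a hypothetical helper. In particular, the precedence used here (session windows first, then sliding windows) is an assumption, not something the reference above states.

```python
from dataclasses import dataclass


@dataclass
class SketchPartitionerConfig:
    """Stand-in mirroring the documented fields; not the LogAI class."""
    sliding_window: int = 0
    session_window: bool = True
    logsequence_delim: str = "[SEP]"


def choose_partitioning(config):
    """Pick a partitioning mode from the config (assumed precedence)."""
    if config.session_window:
        return "session"
    if config.sliding_window > 0:
        return "sliding"
    return "none"


default_mode = choose_partitioning(SketchPartitionerConfig())
sliding_mode = choose_partitioning(
    SketchPartitionerConfig(sliding_window=10, session_window=False)
)
```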
logai.preprocess.openset_preprocessor module
- class logai.preprocess.openset_preprocessor.OpenSetPreprocessor(config: PreprocessorConfig)
Bases:
Preprocessor
Preprocessor class for Open log datasets.
- Parameters:
config – A config object specifying parameters of log preprocessing for open log datasets.
- clean_log(logrecord: LogRecordObject) LogRecordObject
Cleans a logrecord object created from open log datasets.
- Parameters:
logrecord – A log record object containing the raw log data from open datasets.
- Returns:
The cleaned logrecord object.
logai.preprocess.partitioner module
- class logai.preprocess.partitioner.Partitioner(config: PartitionerConfig)
Bases:
object
- group_counter(logrecord_df: DataFrame) DataFrame
Groups log records by given categories and returns counter vectors.
- Parameters:
logrecord_df – The log record dataframe.
- Returns:
The log counter vector dataframe after grouping.
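As a rough picture of what grouping by category and counting means, the sketch below counts log records per combination of field values using only the standard library. It works on a list of dicts rather than a DataFrame, and `group_counts` is a hypothetical name. The shape of the counter vectors LogAI actually returns is not shown here.

```python
from collections import Counter


def group_counts(rows, fields):
    """Count rows per unique combination of the given fields.

    Hypothetical helper for illustration only; not part of LogAI.
    """
    return Counter(tuple(row[field] for field in fields) for row in rows)


rows = [
    {"host": "node1", "level": "INFO"},
    {"host": "node1", "level": "ERROR"},
    {"host": "node1", "level": "INFO"},
    {"host": "node2", "level": "INFO"},
]
counts = group_counts(rows, ["host", "level"])
```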
- group_sliding_window(logrecord_df: DataFrame, logline_col_name='logline') DataFrame
Groups log records by sliding window based on the sliding window length, and returns the resulting pandas DataFrame object.
- Parameters:
logrecord_df – A pandas DataFrame on which grouping is to be applied.
- Returns:
A pandas DataFrame after sliding-window-based grouping.
- sliding_window(loglines: Series) Series
Conducts sliding window log partitioning.
- Parameters:
loglines – The series of loglines.
- Returns:
The series of logline sequences after applying the sliding window.
- class logai.preprocess.partitioner.PartitionerConfig(group_by_category: list | None = None, group_by_time: str | None = None, sliding_window: int = 0, sep_token: str = '[SEP]', exclude_last_window: bool = False, exclude_smaller_windows: bool = False)
Bases:
Config
Config class for Partitioner.
- Parameters:
group_by_category – The list of fields to group log data by.
group_by_time – The string-type argument to specify grouping by time. Supported values are the pandas offset aliases listed at https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases.
sliding_window – The sliding window length if partitioning loglines into sliding windows.
sep_token – The separator token string to be used as delimiter when grouping log data.
exclude_last_window – A boolean (default False) indicating whether to exclude the last window when doing sliding-window-based grouping of log data.
exclude_smaller_windows – A boolean (default False) indicating whether to exclude windows of length smaller than the given sliding_window argument.
- exclude_last_window: bool
- exclude_smaller_windows: bool
- group_by_category: list
- group_by_time: str
- sep_token: str
- sliding_window: int
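Grouping by time amounts to assigning each record to a fixed-width time bucket and aggregating within it. The sketch below imitates a one-minute frequency using only the standard library. It is an illustration under that assumption, not how LogAI or pandas implements it, and `count_per_minute` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime


def count_per_minute(timestamps):
    """Count timestamps per one-minute bucket by flooring to the minute.

    Hypothetical helper for illustration only; not part of LogAI.
    """
    buckets = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)
    return dict(sorted(buckets.items()))


timestamps = [
    datetime(2023, 1, 1, 12, 0, 5),
    datetime(2023, 1, 1, 12, 0, 40),
    datetime(2023, 1, 1, 12, 1, 10),
]
per_minute = count_per_minute(timestamps)
```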
- logai.preprocess.partitioner.concat_logs(windows, tokens)
logai.preprocess.preprocessor module
- class logai.preprocess.preprocessor.Preprocessor(config: PreprocessorConfig)
Bases:
object
Preprocessor class that contains common preprocessing methods.
- clean_log(loglines: Series) Series
Cleans the input log data.
- Parameters:
loglines – The raw loglines data to be cleaned.
- Returns:
The cleaned loglines data, as a pd.Series.
- group_log_index(attributes: DataFrame, by: array) DataFrame
Groups log attributes (DataFrame) by a list of its fields.
- Parameters:
attributes – The log attribute data to be grouped.
by – A list of fields of the log attribute DataFrame object to group by.
- Returns:
The log attribute data after grouping.
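Grouping log attributes by a list of fields can be pictured as collecting the row indices that share the same values for those fields. The plain-Python sketch below shows that idea on a list of dicts. `group_indices` is a hypothetical helper, and the structure of the DataFrame that LogAI returns is not reproduced here.

```python
def group_indices(rows, by):
    """Map each unique combination of `by` field values to its row indices.

    Hypothetical helper for illustration only; not part of LogAI.
    """
    groups = {}
    for index, row in enumerate(rows):
        key = tuple(row[field] for field in by)
        groups.setdefault(key, []).append(index)
    return groups


attributes = [
    {"host": "node1", "component": "kernel"},
    {"host": "node2", "component": "kernel"},
    {"host": "node1", "component": "kernel"},
]
index_groups = group_indices(attributes, by=["host", "component"])
```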
- identify_timestamps(logrecord: LogRecordObject)
- class logai.preprocess.preprocessor.PreprocessorConfig(custom_delimiters_regex: dict | None = None, custom_replace_list: list | None = None)
Bases:
Config
Config class for Preprocessor.
- Parameters:
custom_delimiters_regex – A dictionary of delimiter regex patterns in raw log data.
custom_replace_list – A list of tuples of custom replace patterns in raw log data. Each tuple should be of the form ('regex-pattern-to-replace', 'replaced-pattern').
- custom_delimiters_regex: dict
- custom_replace_list: list
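The tuple form described for `custom_replace_list` maps naturally onto successive regular-expression substitutions. The sketch below applies such a list with the standard `re` module. The helper `apply_replace_list`, the two patterns, and the placeholder tokens `<IP>` and `<BLOCK>` are assumptions chosen for this example, and LogAI may apply the list differently.

```python
import re


def apply_replace_list(logline, replace_list):
    """Apply each (pattern, replacement) pair to the logline, in order.

    Hypothetical helper for illustration only; not part of LogAI.
    """
    for pattern, replacement in replace_list:
        logline = re.sub(pattern, replacement, logline)
    return logline


replace_list = [
    (r"\d+\.\d+\.\d+\.\d+", "<IP>"),  # dotted-quad addresses
    (r"blk_-?\d+", "<BLOCK>"),        # HDFS-style block ids
]
cleaned = apply_replace_list("Received blk_-123 from 10.0.0.1", replace_list)
```

Because the pairs are applied in order, an earlier replacement can affect what later patterns match, so the order of the list matters.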
logai.preprocess.thunderbird_preprocessor module
- class logai.preprocess.thunderbird_preprocessor.ThunderbirdPreprocessor(config: PreprocessorConfig)
Bases:
OpenSetPreprocessor
Custom preprocessor for the open log dataset Thunderbird.