logai.dataloader package

Submodules

logai.dataloader.data_loader module

class logai.dataloader.data_loader.DataLoaderConfig(filepath: str = '', log_type: str = 'csv', dimensions: dict = {}, reader_args: dict = {}, infer_datetime: bool = False, datetime_format: str = '%Y-%M-%dT%H:%M:%SZ', open_dataset: str | None = None)

Bases: Config

The configuration class of the data loader.

datetime_format: str
dimensions: dict
filepath: str
infer_datetime: bool
log_type: str
open_dataset: str
reader_args: dict
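
The default `datetime_format` shown in the signature places `%M` in the month position. In Python's `strptime`-style format codes, `%M` denotes minutes and `%m` denotes the month, so for ISO 8601 timestamps it may be safer to pass an explicit format. The following minimal sketch uses only the standard library to show the two codes side by side. The sample timestamp is illustrative, and how LogAI applies the format internally is not covered here.

```python
from datetime import datetime

# Illustrative ISO 8601 timestamp (not taken from any LogAI dataset).
raw = "2022-05-26T21:28:29Z"

# %m is the zero-padded month; %M is the zero-padded minute.
explicit_format = "%Y-%m-%dT%H:%M:%SZ"
parsed = datetime.strptime(raw, explicit_format)

print(parsed.month, parsed.minute)
```

A format string like `explicit_format` could then be passed as `datetime_format` when constructing `DataLoaderConfig`.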
class logai.dataloader.data_loader.DefaultDataLoader

Bases: object

class logai.dataloader.data_loader.FileDataLoader(config: DataLoaderConfig)

Bases: object

Implementation of file data loader, reading log record objects from local files.

load_data() LogRecordObject

Loads log data with the given configuration. Currently supported file formats:
  • csv

  • tsv

  • other plain-text formats, such as .log, with proper parsing configurations

Returns:

The logs read from log files and converted into LogRecordObject.
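
A minimal usage sketch, based only on the signatures listed above. It assumes that `dimensions` maps `LogRecordObject` field names to lists of column names, and the column name `message` is hypothetical. The import is placed inside the function so the snippet can be loaded without the `logai` package installed.

```python
def load_csv_logs(path):
    """Load a CSV log file into a LogRecordObject (requires the logai package)."""
    # Imported lazily so this module can be imported without logai installed.
    from logai.dataloader.data_loader import DataLoaderConfig, FileDataLoader

    config = DataLoaderConfig(
        filepath=path,
        log_type="csv",
        # Assumption: maps LogRecordObject fields to lists of CSV column names.
        # "message" is a hypothetical column name.
        dimensions={"body": ["message"]},
    )
    return FileDataLoader(config).load_data()
```

Calling `load_csv_logs("path/to/logs.csv")` would then return the loaded log record object, provided the file contains the assumed column.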

logai.dataloader.data_loader_utils module

logai.dataloader.data_loader_utils.generate_logformat_regex(log_format)

Generates a regular expression to split log messages.

Returns:

The headers and the regex.
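
To illustrate the idea, here is a hedged sketch of how a format string with `<Field>` placeholders could be turned into a list of headers and a regular expression with named groups. The function `sketch_logformat_regex` and the placeholder syntax are assumptions for illustration and are not the library's implementation.

```python
import re


def sketch_logformat_regex(log_format):
    """Hypothetical helper: turn '<Field>' placeholders into named groups."""
    headers = []
    pattern = ""
    # Splitting on a capturing group keeps the '<Field>' tokens at odd positions.
    parts = re.split(r"(<[^<>]+>)", log_format)
    for i, part in enumerate(parts):
        if i % 2 == 1:
            name = part[1:-1]  # Strip the surrounding angle brackets.
            headers.append(name)
            pattern += "(?P<%s>.*?)" % name
        else:
            # Literal text: escape it, and let a run of spaces match any whitespace run.
            pattern += r"\s+".join(re.escape(p) for p in re.split(r" +", part))
    return headers, re.compile("^" + pattern + "$")


headers, regex = sketch_logformat_regex("<Date> <Time> <Level>: <Content>")
match = regex.match("2024-01-02 03:04:05 INFO: service started ok")
fields = match.groupdict()
```

In this sketch the field names must be valid Python identifiers, because they are used as regex group names.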

logai.dataloader.data_loader_utils.load_data(filename, log_format)

Loads logs from the given file using the given format.

Parameters:
  • filename – The file to read.

  • log_format – Target log format.

Returns:

The loaded log data.

logai.dataloader.data_loader_utils.log_to_dataframe(log_file, regex, headers)

Transforms a log file into a dataframe.

Returns:

The log dataframe.
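
The transformation can be pictured as matching each line against the regex and collecting the named groups as one row per line. The sketch below uses only the standard library and a hypothetical helper, `sketch_parse_lines`, that returns a list of dictionaries. It is not the library's implementation, and the sample log lines are made up.

```python
import io
import re


def sketch_parse_lines(lines, regex, headers):
    """Hypothetical helper: keep lines that match `regex`, one dict per log line."""
    rows = []
    for line in lines:
        match = regex.search(line.strip())
        if match is None:
            continue  # Skip lines that do not fit the expected format.
        rows.append({h: match.group(h) for h in headers})
    return rows


headers = ["Level", "Content"]
regex = re.compile(r"^(?P<Level>\S+)\s+(?P<Content>.*)$")
# An in-memory stand-in for an open log file; the blank line is skipped.
log_file = io.StringIO("INFO service started\n\nWARN disk usage at 91%\n")
rows = sketch_parse_lines(log_file, regex, headers)
```

Rows in this shape can be passed to `pandas.DataFrame(rows, columns=headers)` to obtain a tabular form.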

logai.dataloader.data_model module

class logai.dataloader.data_model.LogRecordObject(timestamp: ~pandas.core.frame.DataFrame = Empty DataFrame Columns: [] Index: [], attributes: ~pandas.core.frame.DataFrame = Empty DataFrame Columns: [] Index: [], resource: ~pandas.core.frame.DataFrame = Empty DataFrame Columns: [] Index: [], trace_id: ~pandas.core.frame.DataFrame = Empty DataFrame Columns: [] Index: [], span_id: ~pandas.core.frame.DataFrame = Empty DataFrame Columns: [] Index: [], severity_text: ~pandas.core.frame.DataFrame = Empty DataFrame Columns: [] Index: [], severity_number: ~pandas.core.frame.DataFrame = Empty DataFrame Columns: [] Index: [], body: ~pandas.core.frame.DataFrame = Empty DataFrame Columns: [] Index: [], labels: ~pandas.core.frame.DataFrame = Empty DataFrame Columns: [] Index: [])

Bases: object

Log record object data model, compatible with the log and event record definition in OpenTelemetry: https://opentelemetry.io/docs/reference/specification/logs/data-model/#log-and-event-record-definition

Parameters:
  • timestamp – The timestamp information of the log data.

  • attributes – The attributes of the log data (typically structured data with quantitative or categorical fields).

  • resource – The field denoting data source information generating the log data.

  • trace_id – The request trace id associated with the log data, if any.

  • span_id – The request span id associated with the log data, if any.

  • severity_text – The severity description or log level information.

  • severity_number – The severity number indicating log level.

  • body – The body of the log record, which contains the main information of the log. It can consist of unstructured, semi-structured, or structured information.

  • labels – Any label information associated with the log (e.g., a binary anomaly label indicating whether each line is anomalous).

  • _index – The indices of the log data.

attributes: DataFrame = Empty DataFrame Columns: [] Index: []
body: DataFrame = Empty DataFrame Columns: [] Index: []
dropna()

Drops entries containing NaN or null values from the log record object.

Returns:

The modified log record object after removing entries with NaN or null values.

filter_by_index(indices: list, inplace: bool = False)

Selects a subset of a log record object by removing the given indices.

Parameters:
  • indices – A list of indices to remove.

  • inplace (bool, optional) – Whether to perform the operation in place.

Returns:

The resulting log record object created after removing the indices.

classmethod from_dataframe(data: DataFrame, meta_data: dict | None = None)

Converts pandas.DataFrame to log record object.

Parameters:
  • data – The log data in pandas dataframe.

  • meta_data – A dictionary that maps data.columns to fields of LogRecordObject.

Returns:

A LogRecordObject object.
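
The role of `meta_data` can be sketched with plain dictionaries. The example assumes that `meta_data` maps each log record field name to a list of column names. The helper `sketch_split_columns`, the column names, and the sample values are all hypothetical, and the real method operates on a pandas.DataFrame rather than a dictionary of lists.

```python
def sketch_split_columns(table, meta_data):
    """Hypothetical helper: group columns of `table` under log record fields."""
    record = {}
    for field, columns in meta_data.items():
        # Each field receives only the columns assigned to it.
        record[field] = {col: table[col] for col in columns}
    return record


# A column-oriented stand-in for a dataframe (made-up values).
table = {
    "ts": ["2024-01-02T03:04:05Z", "2024-01-02T03:04:06Z"],
    "message": ["service started", "disk usage at 91%"],
    "is_anomaly": [0, 1],
}
# Assumption: field name -> list of column names.
meta_data = {"timestamp": ["ts"], "body": ["message"], "labels": ["is_anomaly"]}
record = sketch_split_columns(table, meta_data)
```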

labels: DataFrame = Empty DataFrame Columns: [] Index: []
classmethod load_from_csv(filepath)
resource: DataFrame = Empty DataFrame Columns: [] Index: []
save_to_csv(filepath: str)

Saves a log record object to file.

Parameters:

filepath – The absolute path of the file where the log record object will be saved.

select_by_index(indices: list, inplace: bool = False)

Selects a subset of a log record object based on the selected indices.

Parameters:
  • indices – A list of indices to select.

  • inplace (bool, optional) – Whether to perform the operation in place.

Returns:

LogRecordObject: The resulting log record object created from the selected indices.
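
`select_by_index` and `filter_by_index` are complementary: one keeps the listed indices, the other removes them. The sketch below shows that relationship on a plain dictionary keyed by index. The helpers `sketch_select` and `sketch_filter` are hypothetical stand-ins and do not call the library.

```python
def sketch_select(rows, indices):
    """Keep only the rows whose index is listed."""
    wanted = set(indices)
    return {i: row for i, row in rows.items() if i in wanted}


def sketch_filter(rows, indices):
    """Drop the rows whose index is listed."""
    unwanted = set(indices)
    return {i: row for i, row in rows.items() if i not in unwanted}


rows = {0: "a", 1: "b", 2: "c", 3: "d"}
selected = sketch_select(rows, [1, 3])
remaining = sketch_filter(rows, [1, 3])
```

Taken together, `selected` and `remaining` partition the original rows.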

severity_number: DataFrame = Empty DataFrame Columns: [] Index: []
severity_text: DataFrame = Empty DataFrame Columns: [] Index: []
span_id: DataFrame = Empty DataFrame Columns: [] Index: []
timestamp: DataFrame = Empty DataFrame Columns: [] Index: []
to_dataframe()

Generates a pandas.DataFrame from the LogRecordObject.

trace_id: DataFrame = Empty DataFrame Columns: [] Index: []

logai.dataloader.openset_data_loader module

class logai.dataloader.openset_data_loader.OpenSetDataLoader(config: OpenSetDataLoaderConfig)

Bases: FileDataLoader

property dl_config
load_data()

Loads log data with the given configuration. Currently supported file formats:
  • csv

  • tsv

  • other plain-text formats, such as .log, with proper parsing configurations

Returns:

The logs read from log files and converted into LogRecordObject.

class logai.dataloader.openset_data_loader.OpenSetDataLoaderConfig(dataset_name: str | None = None, filepath: str | None = None)

Bases: Config

dataset_name: str
filepath: str
logai.dataloader.openset_data_loader.get_config(dataset_name, filepath) DataLoaderConfig

Retrieves the configuration of open log datasets to load data.

Parameters:
  • dataset_name – The name of a supported log dataset, one of “hdfs”, “bgl”, or “HealthApp”.

  • filepath – The log file path.

Returns:

The configuration to load open log datasets.

Module contents