logai.dataloader package
Submodules
logai.dataloader.data_loader module
- class logai.dataloader.data_loader.DataLoaderConfig(filepath: str = '', log_type: str = 'csv', dimensions: dict = {}, reader_args: dict = {}, infer_datetime: bool = False, datetime_format: str = '%Y-%M-%dT%H:%M:%SZ', open_dataset: str | None = None)
Bases:
Config
The configuration class of data loader.
- datetime_format: str
- dimensions: dict
- filepath: str
- infer_datetime: bool
- log_type: str
- open_dataset: str
- reader_args: dict
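As an illustration of what a datetime_format string looks like, the sketch below parses an ISO 8601 style timestamp with the standard library. It assumes that datetime_format follows Python's strptime-style directives; the sample timestamp is hypothetical. In these directives %m is the month and %M is the minute, whereas the documented default uses %M in both positions, so it may be safer to pass an explicit format.

```python
from datetime import datetime

# Hypothetical timestamp string, used only for illustration.
raw = "2022-05-26T21:29:09Z"

# %m is the month directive; %M is the minute directive.
fmt = "%Y-%m-%dT%H:%M:%SZ"

parsed = datetime.strptime(raw, fmt)
print(parsed.isoformat())
```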
- class logai.dataloader.data_loader.DefaultDataLoader
Bases:
object
- class logai.dataloader.data_loader.FileDataLoader(config: DataLoaderConfig)
Bases:
object
Implementation of file data loader, reading log record objects from local files.
- load_data() LogRecordObject
Loads log data with the given configuration. Currently supported file formats are csv, tsv, and other plain-text formats such as .log, given proper parsing configurations.
- Returns:
The logs read from log files and converted into LogRecordObject.
logai.dataloader.data_loader_utils module
- logai.dataloader.data_loader_utils.generate_logformat_regex(log_format)
Generates a regular expression to split log messages.
- Returns:
The headers and the regex.
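One way such a function could work is sketched below; this is not necessarily the library's implementation. It assumes a log format written as field names in angle brackets separated by literal text, for example "<Date> <Time> <Level>: <Content>", and that every field name is a valid Python identifier. The name sketch_logformat_regex and the sample log line are hypothetical.

```python
import re

def sketch_logformat_regex(log_format):
    """Build (headers, compiled regex) from a format like '<Level>: <Content>'."""
    headers = []
    pattern = ""
    # Splitting on a capturing group keeps the <Field> tokens in the result:
    # odd positions hold tokens, even positions hold the literal text between them.
    parts = re.split(r"(<[^<>]+>)", log_format)
    for i, part in enumerate(parts):
        if i % 2 == 1:
            name = part[1:-1]  # strip the angle brackets
            headers.append(name)
            pattern += f"(?P<{name}>.*?)"
        else:
            # Escape literal text and let runs of spaces match any whitespace.
            chunks = re.split(r" +", part)
            pattern += r"\s+".join(re.escape(c) for c in chunks)
    return headers, re.compile("^" + pattern + "$")

headers, regex = sketch_logformat_regex("<Date> <Time> <Level>: <Content>")
match = regex.match("2022-05-26 21:29:09 INFO: service started")
print(headers)
print(match.group("Content"))
```

Named groups let each captured field be looked up by its header, which keeps the headers list and the regex in step with each other.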
- logai.dataloader.data_loader_utils.load_data(filename, log_format)
Loads logs from the given file using the given format.
- Parameters:
filename – The file to read.
log_format – Target log format.
- Returns:
The loaded log data.
- logai.dataloader.data_loader_utils.log_to_dataframe(log_file, regex, headers)
Transforms a log file into a dataframe.
- Returns:
The log dataframe.
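The transformation can be sketched with the standard library alone. This is a simplified illustration, not the library's code: it returns a list of dicts instead of a pandas DataFrame, and skipping lines that do not match the regex is an assumption. The name sketch_log_to_rows, the regex, and the sample lines are hypothetical.

```python
import re

def sketch_log_to_rows(lines, regex, headers):
    """Parse each line with `regex` and collect one dict per matching line."""
    rows = []
    for line in lines:
        m = regex.search(line.strip())
        if m is None:
            # Assumption: lines that do not fit the format are skipped.
            continue
        rows.append({h: m.group(h) for h in headers})
    return rows

headers = ["Level", "Content"]
regex = re.compile(r"^(?P<Level>\w+):\s+(?P<Content>.*)$")
lines = ["INFO: service started\n", "malformed line\n", "ERROR: disk full\n"]

rows = sketch_log_to_rows(lines, regex, headers)
print(rows)
```

A list of dicts like this can be handed to pandas.DataFrame(rows, columns=headers) to obtain a dataframe with one column per header.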
logai.dataloader.data_model module
- class logai.dataloader.data_model.LogRecordObject(timestamp: pandas.core.frame.DataFrame = Empty DataFrame Columns: [] Index: [], attributes: pandas.core.frame.DataFrame = Empty DataFrame Columns: [] Index: [], resource: pandas.core.frame.DataFrame = Empty DataFrame Columns: [] Index: [], trace_id: pandas.core.frame.DataFrame = Empty DataFrame Columns: [] Index: [], span_id: pandas.core.frame.DataFrame = Empty DataFrame Columns: [] Index: [], severity_text: pandas.core.frame.DataFrame = Empty DataFrame Columns: [] Index: [], severity_number: pandas.core.frame.DataFrame = Empty DataFrame Columns: [] Index: [], body: pandas.core.frame.DataFrame = Empty DataFrame Columns: [] Index: [], labels: pandas.core.frame.DataFrame = Empty DataFrame Columns: [] Index: [])
Bases:
object
Log record object data model, compatible with the log and event record definition in OpenTelemetry (https://opentelemetry.io/docs/reference/specification/logs/data-model/#log-and-event-record-definition).
- Parameters:
timestamp – The timestamp information of the log data.
attributes – The attributes of the log data (typically structured data with quantitative or categorical fields).
resource – Information about the data source that generated the log data.
trace_id – The request trace id associated with the log data, if any.
span_id – The request span id associated with the log data, if any.
severity_text – The severity description or log level information.
severity_number – The severity number indicating log level.
body – The body of the log record, which contains the main information of the log. It can consist of unstructured, semi-structured, or structured information.
labels – Any label information associated with the log (e.g., a binary anomaly label indicating whether each line is anomalous or not).
_index – The indices of the log data.
- attributes: DataFrame = Empty DataFrame Columns: [] Index: []
- body: DataFrame = Empty DataFrame Columns: [] Index: []
- dropna()
Drops entries containing NaN or null values from the log record object.
- Returns:
The modified log record object after removing entries with NaN or null values.
- filter_by_index(indices: list, inplace: bool = False)
Selects a subset of a log record object by removing the given indices.
- Parameters:
indices – A list of indices to remove.
inplace – Whether to perform the operation in place (bool, optional).
- Returns:
The resulting log record object created after removing the indices.
- classmethod from_dataframe(data: DataFrame, meta_data: dict | None = None)
Converts a pandas.DataFrame to a log record object.
- Parameters:
data – The log data as a pandas DataFrame.
meta_data – A dictionary that maps data.columns to fields of LogRecordObject.
- Returns:
A LogRecordObject.
- labels: DataFrame = Empty DataFrame Columns: [] Index: []
- classmethod load_from_csv(filepath)
- resource: DataFrame = Empty DataFrame Columns: [] Index: []
- save_to_csv(filepath: str)
Saves a log record object to file.
- Parameters:
filepath – The absolute path of the file where the log record object will be saved.
- select_by_index(indices: list, inplace: bool = False)
Selects a subset of a log record object based on the given indices.
- Parameters:
indices – A list of indices to select.
inplace – Whether to perform the operation in place (bool, optional).
- Returns:
The resulting LogRecordObject created from the selected indices.
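The difference between select_by_index and filter_by_index can be illustrated with a minimal sketch. It uses a plain dict keyed by index in place of a LogRecordObject, the helper names are hypothetical, and the inplace option is not modeled.

```python
def sketch_select_by_index(rows, indices):
    """Keep only the entries whose index is listed."""
    wanted = set(indices)
    return {i: r for i, r in rows.items() if i in wanted}

def sketch_filter_by_index(rows, indices):
    """Remove the entries whose index is listed and keep the rest."""
    unwanted = set(indices)
    return {i: r for i, r in rows.items() if i not in unwanted}

# Hypothetical log lines keyed by their index.
rows = {0: "INFO start", 1: "ERROR disk full", 2: "INFO stop"}

selected = sketch_select_by_index(rows, [0, 2])
filtered = sketch_filter_by_index(rows, [0, 2])
print(selected)
print(filtered)
```

With the same list of indices, the two operations return complementary subsets of the original records.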
- severity_number: DataFrame = Empty DataFrame Columns: [] Index: []
- severity_text: DataFrame = Empty DataFrame Columns: [] Index: []
- span_id: DataFrame = Empty DataFrame Columns: [] Index: []
- timestamp: DataFrame = Empty DataFrame Columns: [] Index: []
- to_dataframe()
Generates a pandas.DataFrame from the LogRecordObject.
- trace_id: DataFrame = Empty DataFrame Columns: [] Index: []
logai.dataloader.openset_data_loader module
- class logai.dataloader.openset_data_loader.OpenSetDataLoader(config: OpenSetDataLoaderConfig)
Bases:
FileDataLoader
- property dl_config
- load_data()
Loads log data with the given configuration. Currently supported file formats are csv, tsv, and other plain-text formats such as .log, given proper parsing configurations.
- Returns:
The logs read from log files and converted into LogRecordObject.
- class logai.dataloader.openset_data_loader.OpenSetDataLoaderConfig(dataset_name: str | None = None, filepath: str | None = None)
Bases:
Config
- dataset_name: str
- filepath: str
- logai.dataloader.openset_data_loader.get_config(dataset_name, filepath) DataLoaderConfig
Retrieves the configuration of open log datasets to load data.
- Parameters:
dataset_name – The supported log dataset name from (“hdfs”, “bgl”, “HealthApp”).
filepath – The log file path.
- Returns:
The configuration to load open log datasets.