logai.algorithms.anomaly_detection_algo package

Submodules

logai.algorithms.anomaly_detection_algo.anomaly_detector_het module

class logai.algorithms.anomaly_detection_algo.anomaly_detector_het.HetAnomalyDetectionConfig(algo_name: str = 'one_class_svm', algo_params: object | None = None, custom_params: object | None = None)

Bases: AnomalyDetectionConfig

Heterogeneous Anomaly Detector Parameters.

Parameters:

train_test_ratio – The ratio between test and training splits.

train_test_ratio: float = 0.3
class logai.algorithms.anomaly_detection_algo.anomaly_detector_het.HetAnomalyDetector(config: HetAnomalyDetectionConfig)

Bases: AnomalyDetector

Anomaly detector wrapper for heterogeneous log feature dataframes, which include various log attributes. For each attribute, a dedicated anomaly detector is built if the data satisfies the requirement. The current version only supports anomaly detection on the constants.LOGLINE_COUNTS field (i.e. the frequency count of log events).

fit_predict(log_feature: DataFrame) DataFrame

Trains a model and predicts anomaly scores.

Parameters:

log_feature – A log feature dataframe that must contain at least two columns [‘timestamp’: datetime, constants.LOGLINE_COUNTS: int]. Each combination of values in the remaining columns is treated as a log attribute ID.

Returns:

The predicted anomaly scores.

preprocess(counter_df: DataFrame)

Splits raw log feature dataframe by unique attribute ID.

Parameters:

counter_df – A log feature dataframe that must contain at least two columns [‘timestamp’: datetime, constants.LOGLINE_COUNTS: int]. Each combination of values in the remaining columns is treated as a log attribute ID.

Returns:

The processed log feature dataframe.
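The splitting step can be pictured with a small standard-library sketch. It is not LogAI's implementation: rows are plain dicts rather than a DataFrame, the count column is assumed to be named "counts", and split_by_attribute is a hypothetical helper introduced only for illustration.

```python
from collections import defaultdict

TIMESTAMP = "timestamp"
COUNTS = "counts"  # assumed stand-in for constants.LOGLINE_COUNTS


def split_by_attribute(rows):
    """Group rows by the combination of all non-timestamp, non-count columns."""
    groups = defaultdict(list)
    for row in rows:
        # Every remaining column contributes to the attribute ID.
        attr_id = tuple(
            (key, row[key]) for key in sorted(row) if key not in (TIMESTAMP, COUNTS)
        )
        groups[attr_id].append({TIMESTAMP: row[TIMESTAMP], COUNTS: row[COUNTS]})
    return dict(groups)


rows = [
    {"timestamp": "2023-01-01T00:00", "counts": 5, "host": "a", "level": "INFO"},
    {"timestamp": "2023-01-01T00:01", "counts": 7, "host": "a", "level": "INFO"},
    {"timestamp": "2023-01-01T00:00", "counts": 1, "host": "b", "level": "ERROR"},
]
series_by_attr = split_by_attribute(rows)
```

Each value in series_by_attr is a two-column time series that a per-attribute detector could be trained on.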

logai.algorithms.anomaly_detection_algo.dbl module

class logai.algorithms.anomaly_detection_algo.dbl.DBLDetector(params: DBLDetectorParams)

Bases: AnomalyDetectionAlgo

Dynamic Baseline based time series anomaly detection. This is a wrapper class for the Dynamic Baseline anomaly detection model from the Merlion library: https://opensource.salesforce.com/Merlion/v1.3.1/merlion.models.anomaly.html#module-merlion.models.anomaly.dbl. The current implementation only supports anomaly detection on the constants.LOGLINE_COUNTS field (the frequency count of log events).

fit(log_features: DataFrame)

Training method of the Dynamic Baseline model.

Parameters:

log_features – A log feature dataframe that must only contain two columns [‘timestamp’: datetime, constants.LOGLINE_COUNTS: int].

predict(log_features: DataFrame)

Predicts anomaly scores for log_features[“timestamp”, constants.LOGLINE_COUNTS].

Parameters:

log_features – A log feature dataframe that must only contain two columns [‘timestamp’: datetime, constants.LOGLINE_COUNTS: int].

Returns:

A dataframe of the predicted anomaly scores, indexed by log_features.index. Each value is an anomaly score indicating whether the corresponding point is anomalous.

class logai.algorithms.anomaly_detection_algo.dbl.DBLDetectorParams(threshold: float = 0.0, fixed_period: Tuple[str, str] | None = None, train_window: str | None = None, wind_sz: str = '1h', trends: List[str] | None = None, kwargs: dict = {})

Bases: Config

Dynamic Baseline Parameters. For more details on the parameters, see https://opensource.salesforce.com/Merlion/v1.3.1/merlion.models.anomaly.html#module-merlion.models.anomaly.dbl.

Parameters:
  • threshold – The rule to use for thresholding anomaly scores.

  • fixed_period – (t0, tf); train the model on all datapoints occurring between t0 and tf (inclusive).

  • train_window – A string representing a duration of time to serve as the scope for a rolling dynamic baseline model.

  • wind_sz – The window size in minutes to bucket times of day. This parameter applies only if a daily trend is one of the trends used.

  • trends – The list of trends to use. Supported trends are “daily”, “weekly” and “monthly”.

fixed_period: Tuple[str, str]
kwargs: dict
threshold: float
train_window: str
trends: List[str]
wind_sz: str
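To make the idea of a daily trend bucketed by wind_sz concrete, here is a minimal sketch assuming one-hour buckets and a z-score against the per-bucket mean. It only illustrates the concept; Merlion's actual scoring may differ, and hourly_baseline and baseline_score are hypothetical helper names.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev


def hourly_baseline(points):
    """Map hour of day -> (mean, std) of the counts observed in that hour."""
    buckets = defaultdict(list)
    for ts, count in points:
        buckets[ts.hour].append(count)
    return {hour: (mean(vals), pstdev(vals)) for hour, vals in buckets.items()}


def baseline_score(baseline, ts, count):
    """Signed z-score of a count against the baseline for its hour of day."""
    mu, sigma = baseline[ts.hour]
    if sigma == 0:
        # Flat baseline: no scale to compare against, so this sketch reports 0.
        return 0.0
    return (count - mu) / sigma


# Three days of training data, all observed at 09:00.
start = datetime(2023, 1, 1, 9, 0)
train = [(start + timedelta(days=d), c) for d, c in enumerate([10, 12, 14])]
baseline = hourly_baseline(train)

# A count of 20 at 09:30 on the fourth day is compared with the 09:00 bucket.
score = baseline_score(baseline, datetime(2023, 1, 4, 9, 30), 20)
```

A score far from zero means the count deviates strongly from what is typical for that time of day. A threshold on the score then decides what is reported as an anomaly.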

logai.algorithms.anomaly_detection_algo.distribution_divergence module

class logai.algorithms.anomaly_detection_algo.distribution_divergence.DistributionDivergence(params: DistributionDivergenceParams)

Bases: AnomalyDetectionAlgo

Class for Distribution Divergence based Anomaly Detection. During both training and testing, it takes log features as input and constructs a parametric distribution over them. For the test data, it reports the distribution divergence from the training data as the anomaly score.

fit(log_features: DataFrame)

Fit method of the distribution divergence based anomaly detector. Since it is a non-parametric model, there is no training required.

Parameters:

log_features – Log features as a pandas DataFrame object.

predict(log_features: DataFrame) list

Predict method of distribution divergence based anomaly detector. It computes the distribution divergence between the training distribution and the test distribution provided in predict method.

Parameters:

log_features – The test distribution as pandas DataFrame object.

Returns:

A list of scalar anomaly scores.

class logai.algorithms.anomaly_detection_algo.distribution_divergence.DistributionDivergenceParams(n_bins: int = 100, type: list = ['KL'])

Bases: Config

Parameters for distribution divergence based anomaly detector.

Parameters:
  • n_bins – The number of bins to use to discretize the continuous distribution into a discrete distribution

  • type – A list of types of distribution divergences. The allowed types are Kullback–Leibler (“KL”), Jensen–Shannon (“JS”). It also allows a comma separated list of metrics like (“KL,JS” or “JS,KL”).

n_bins: int
type: list
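As a rough illustration of how n_bins and the divergence type interact, the sketch below discretizes two samples into equal-width bins and computes KL and JS divergence with the standard library. It is not the LogAI implementation: the helper names, the fixed bin range, and the eps smoothing are assumptions made for this example.

```python
import math


def histogram(values, n_bins, lo, hi):
    """Discretize values into n_bins equal-width bins, normalized to sum to 1."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # clamp v == hi into last bin
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against division by zero for empty bins."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0
    )


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by log(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


train = [1, 2, 2, 3, 3, 3, 4, 4, 5]
test = [4, 5, 5, 5, 6, 6, 7, 8, 8]
p = histogram(train, n_bins=4, lo=0, hi=8)
q = histogram(test, n_bins=4, lo=0, hi=8)
scores = {"KL": kl_divergence(p, q), "JS": js_divergence(p, q)}
```

KL grows very large when the test data falls into bins the training data never occupied, whereas JS stays bounded, which is one reason to request both metrics.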

logai.algorithms.anomaly_detection_algo.ets module

class logai.algorithms.anomaly_detection_algo.ets.ETSDetector(params: ETSDetectorParams)

Bases: AnomalyDetectionAlgo

ETS Anomaly Detector. This is a wrapper for the ETS based Anomaly Detector from the Merlion library: https://opensource.salesforce.com/Merlion/v1.0.2/merlion.models.forecast.html#module-merlion.models.forecast.ets. The current version only supports anomaly detection on constants.LOGLINE_COUNTS (i.e. the frequency count of log events).

fit(log_features: DataFrame)

Fit method to train ETS Anomaly Detector.

Parameters:

log_features – A log feature dataframe that must only contain two columns [‘timestamp’: datetime, constants.LOGLINE_COUNTS: int].

Returns:

train_scores: The anomaly scores dataframe [‘index’: log_features.index, ‘timestamps’: datetime, ‘anom_score’: scores, ‘trainval’: whether it is in the training set].

predict(log_features: DataFrame)

Predicts anomaly scores for log_features[“timestamp”, constants.LOGLINE_COUNTS].

Parameters:

log_features – A log feature dataframe that must only contain two columns [‘timestamp’: datetime, constants.LOGLINE_COUNTS: int].

Returns:

test_scores: The anomaly scores dataframe [‘index’: log_features.index, ‘timestamps’: datetime, ‘anom_score’: scores, ‘trainval’: whether it is in the training set].

class logai.algorithms.anomaly_detection_algo.ets.ETSDetectorParams(max_forecast_steps: int | None = None, target_seq_index: int | None = None, error: str = 'add', trend: str = 'add', damped_trend: bool = True, seasonal: str = 'add', seasonal_periods: str | None = None, refit: bool = True, kwargs: dict = {})

Bases: Config

ETS Anomaly Detector Parameters. For more details of ETS parameters see https://opensource.salesforce.com/Merlion/v1.0.2/merlion.models.forecast.html#module-merlion.models.forecast.ets.

Parameters:
  • max_forecast_steps – Number of steps we would like to forecast for.

  • target_seq_index – The index of the univariate (amongst all univariates in a general multivariate time series) whose value we would like to forecast.

  • error – The error term. “add” or “mul”.

  • trend – The trend component. “add”, “mul” or None.

  • damped_trend – Whether or not an included trend component is damped.

  • seasonal – The seasonal component. “add”, “mul” or None.

  • seasonal_periods – The length of the seasonality cycle. ‘auto’ indicates automatically select the seasonality cycle. If no seasonality exists, change seasonal to None.

  • refit – If True, refit the full ETS model when time_series_prev is given to the forecast method (slower). If False, simply perform exponential smoothing (faster).

damped_trend: bool
error: str
kwargs: dict
max_forecast_steps: int
refit: bool
seasonal: str
seasonal_periods: str
target_seq_index: int
trend: str
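The role of the additive error term can be seen in a deliberately reduced sketch: simple exponential smoothing with no trend and no seasonality, where the one-step-ahead forecast error serves as a raw anomaly signal. The fixed smoothing factor alpha and the function name ses_residuals are assumptions for illustration; Merlion fits its model parameters rather than fixing them.

```python
def ses_residuals(series, alpha=0.5):
    """One-step-ahead forecast errors from simple exponential smoothing."""
    level = series[0]  # initialize the level with the first observation
    residuals = []
    for y in series[1:]:
        # Additive error: actual value minus the forecast (the current level).
        residuals.append(y - level)
        # Move the level toward the new observation.
        level = alpha * y + (1 - alpha) * level
    return residuals


counts = [10, 10, 10, 10, 30, 10]
residuals = ses_residuals(counts)
```

The spike at 30 produces the largest residual, and the following point shows a negative residual because the level has been pulled upward by the spike.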

logai.algorithms.anomaly_detection_algo.forecast_nn module

class logai.algorithms.anomaly_detection_algo.forecast_nn.ForcastBasedNeuralAD(config: ForecastBasedNNParams)

Bases: NNAnomalyDetectionAlgo

Forecasting based neural anomaly detection models taken from the deep-loglizer paper (https://arxiv.org/pdf/2107.05908.pdf).

Parameters:

config – The parameters of general forecasting based neural anomaly detection models.

fit(train_data: ForecastNNVectorizedDataset, dev_data: ForecastNNVectorizedDataset)

The fit method to train forecasting based neural anomaly detection models.

Parameters:
  • train_data – The training dataset of type ForecastNNVectorizedDataset (consisting of session_idx, features, window_anomalies and window_labels).

  • dev_data – The development dataset of type ForecastNNVectorizedDataset (consisting of session_idx, features, window_anomalies and window_labels).

predict(test_data: ForecastNNVectorizedDataset)

The predict method to run inference of forecasting based neural anomaly detection model on test dataset.

Parameters:

test_data – The test dataset of type ForecastNNVectorizedDataset (consisting of session_idx, features, window_anomalies and window_labels).

Returns:

A dict containing overall evaluation results.

class logai.algorithms.anomaly_detection_algo.forecast_nn.ForecastBasedCNN(config: CNNParams)

Bases: ForcastBasedNeuralAD

Forecasting based CNN model for log anomaly detection.

Parameters:

config – A config object containing parameters for CNN based anomaly detection model.

class logai.algorithms.anomaly_detection_algo.forecast_nn.ForecastBasedLSTM(config: LSTMParams)

Bases: ForcastBasedNeuralAD

Forecasting based LSTM model for log anomaly detection.

Parameters:

config – A config object containing parameters for LSTM based anomaly detection model.

class logai.algorithms.anomaly_detection_algo.forecast_nn.ForecastBasedTransformer(config: TransformerParams)

Bases: ForcastBasedNeuralAD

Forecasting based Transformer model for log anomaly detection.

Parameters:

config – A config object containing parameters for Transformer based anomaly detection model.

logai.algorithms.anomaly_detection_algo.isolation_forest module

class logai.algorithms.anomaly_detection_algo.isolation_forest.IsolationForestDetector(params: IsolationForestParams)

Bases: AnomalyDetectionAlgo

Isolation Forest based Anomaly Detector. This is a wrapper for the Isolation Forest implementation in the scikit-learn library.

fit(log_features: DataFrame)

Fits an isolation forest model.

Parameters:

log_features – The input for model training.

Returns:

The scores of the training dataset.

predict(log_features: DataFrame) Series

Predicts anomalies.

Parameters:

log_features – The input for inference.

Returns:

A pandas dataframe of the predicted anomaly scores.

class logai.algorithms.anomaly_detection_algo.isolation_forest.IsolationForestParams(n_estimators: int = 100, max_samples: str = 'auto', contamination: str = 'auto', max_features: float = 1.0, bootstrap: bool = False, n_jobs: int | None = None, random_state: object | None = None, verbose: int = 0, warm_start: bool = False)

Bases: Config

Parameters for isolation forest based anomaly detection. For more explanation of the parameters see the documentation page in https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html.

Parameters:
  • n_estimators – The number of base estimators in the ensemble.

  • max_samples – The number of samples to draw from X to train each base estimator.

  • contamination – The amount of contamination of the data set, i.e. the proportion of outliers in the data set.

  • max_features – The number of features to draw from X to train each base estimator.

  • bootstrap – If True, individual trees are fit on random subsets of the training data sampled with replacement. If False, sampling without replacement is performed.

  • n_jobs – The number of jobs to run in parallel for both fit and predict.

  • random_state – Controls the pseudo-randomness of the selection of the feature and split values for each branching step and each tree in the forest.

  • verbose – Controls the verbosity of the tree building process.

  • warm_start – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest.

bootstrap: bool
contamination: str
max_features: float
max_samples: str
n_estimators: int
n_jobs: int
random_state: object
verbose: int
warm_start: bool
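The meaning of contamination as "the proportion of outliers" can be sketched as follows. The example assumes a numeric contamination value (the signature above declares the default as the string 'auto') and follows the scikit-learn convention that lower scores are more abnormal. flag_outliers is a hypothetical helper and a simplification of the percentile-based threshold scikit-learn uses.

```python
import math


def flag_outliers(scores, contamination):
    """Flag the `contamination` fraction of samples with the lowest scores."""
    if not 0 < contamination <= 0.5:
        raise ValueError("contamination must be in (0, 0.5]")
    n_outliers = math.floor(contamination * len(scores))
    # Rank sample indices from most abnormal (lowest score) to most normal.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    outlier_idx = set(ranked[:n_outliers])
    return [i in outlier_idx for i in range(len(scores))]


scores = [0.12, 0.10, -0.35, 0.08, 0.11, -0.02, 0.09, 0.13, 0.10, 0.07]
flags = flag_outliers(scores, contamination=0.2)
```

With ten samples and contamination=0.2, exactly the two lowest-scoring samples are flagged, regardless of how far their scores are from the rest.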

logai.algorithms.anomaly_detection_algo.local_outlier_factor module

class logai.algorithms.anomaly_detection_algo.local_outlier_factor.LOFDetector(params: LOFParams)

Bases: AnomalyDetectionAlgo

Local Outlier Factor based Anomaly Detector. This is a wrapper for the LOF based detector in the scikit-learn library. See https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.LocalOutlierFactor.html for more details.

fit(log_features: DataFrame)

Fits a LOF model.

Parameters:

log_features – The input for model training.

Returns:

pandas.DataFrame: The scores of the training dataset.

predict(log_features: DataFrame) Series

Predicts anomaly scores.

Parameters:

log_features – The input for inference.

Returns:

A pandas dataframe of the predicted anomaly scores.

class logai.algorithms.anomaly_detection_algo.local_outlier_factor.LOFParams(n_neighbors: int = 20, algorithm: str = 'auto', leaf_size: int = 30, metric: callable = 'minkowski', p: int = 2, metric_params: dict | None = None, contamination: str = 'auto', novelty: bool = True, n_jobs: int | None = None)

Bases: Config

Parameters of the Local Outlier Factor based Anomaly Detector. For more explanation of the parameters, see https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.LocalOutlierFactor.html.

Parameters:
  • n_neighbors – Number of neighbors to use by default for kneighbors queries.

  • algorithm – Algorithm used to compute the nearest neighbors, e.g., {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}.

  • leaf_size – Leaf size passed to BallTree or KDTree.

  • metric – Metric to use for distance computation.

  • p – Parameter for the Minkowski metric from sklearn.metrics.pairwise.pairwise_distances.

  • metric_params – Additional keyword arguments for the metric function.

  • contamination – The amount of contamination of the data set, i.e. the proportion of outliers in the data set.

  • novelty – By default, LocalOutlierFactor is only meant to be used for outlier detection (novelty=False). Set novelty to True if you want to use LocalOutlierFactor for novelty detection.

  • n_jobs – The number of parallel jobs to run for neighbors search.

algorithm: str
contamination: str
leaf_size: int
metric: callable
metric_params: dict
n_jobs: int
n_neighbors: int
novelty: bool
p: int
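For readers unfamiliar with the p parameter, the following sketch shows the Minkowski distance it controls: p=1 gives the Manhattan distance and p=2 (the default above) gives the Euclidean distance. The function is a plain-Python illustration, not the scikit-learn routine.

```python
def minkowski(x, y, p=2):
    """Minkowski distance of order p between two equal-length vectors."""
    if p < 1:
        raise ValueError("p must be >= 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


a, b = (0.0, 0.0), (3.0, 4.0)
manhattan = minkowski(a, b, p=1)
euclidean = minkowski(a, b, p=2)
```

Larger values of p give more weight to the coordinate with the largest difference, which changes which points count as nearest neighbors.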

logai.algorithms.anomaly_detection_algo.logbert module

class logai.algorithms.anomaly_detection_algo.logbert.LogBERT(config: LogBERTConfig)

Bases: NNAnomalyDetectionAlgo

LogBERT model for anomaly detection of logs.

Parameters:

config – A config object for the LogBERT model.

fit(train_data: Dataset, dev_data: Dataset)

Fit method for training logBERT model.

Parameters:
  • train_data – The training dataset of type huggingface Dataset object.

  • dev_data – The development dataset of type huggingface Dataset object.

predict(test_data: Dataset) DataFrame

Predict method for running inference on logBERT model.

Parameters:

test_data – The test dataset of type huggingface Dataset object.

Returns:

A pandas dataframe object containing the evaluation results for each type of metric.

logai.algorithms.anomaly_detection_algo.one_class_svm module

class logai.algorithms.anomaly_detection_algo.one_class_svm.OneClassSVMDetector(params: OneClassSVMParams)

Bases: AnomalyDetectionAlgo

fit(log_features: DataFrame)

Fit method to train the OneClassSVM on log data.

Parameters:

log_features – Training log features as pandas DataFrame object.

Returns:

The scores of the training dataset.

predict(log_features: DataFrame) Series

Predict method to detect anomalies using OneClassSVM model on test log data.

Parameters:

log_features – Test log features data as pandas DataFrame object.

Returns:

A pandas dataframe of the predicted anomaly scores.

class logai.algorithms.anomaly_detection_algo.one_class_svm.OneClassSVMParams(kernel: str = 'linear', degree: int = 3, gamma: str = 'auto', coef0: float = 0.0, tol: float = 0.001, nu: float = 0.5, shrinking: bool = True, cache_size: float = 200, verbose: bool = False)

Bases: Config

Parameters for OneClass SVM based Anomaly Detector. For more explanations about the parameters see https://scikit-learn.org/stable/modules/generated/sklearn.svm.OneClassSVM.html.

Parameters:
  • kernel – Specifies the kernel type to be used in the algorithm, i.e., {‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’}.

  • degree – Degree of the polynomial kernel function (‘poly’).

  • gamma – Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.

  • coef0 – Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’.

  • tol – Tolerance for stopping criterion.

  • nu – An upper bound on the fraction of training errors and a lower bound of the fraction of support vectors.

  • shrinking – Whether to use the shrinking heuristic.

  • cache_size – Specify the size of the kernel cache (in MB).

  • verbose – Enable verbose output.

cache_size: float
coef0: float
degree: int
gamma: str
kernel: str
nu: float
shrinking: bool
tol: float
verbose: bool
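The way kernel, degree, gamma and coef0 combine can be shown with the usual kernel formulas. This sketch assumes a numeric gamma (the signature above declares the default as the string 'auto') and omits the 'precomputed' option. The kernel helper is illustrative and not part of LogAI or scikit-learn.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def kernel(x, y, kind="linear", gamma=1.0, degree=3, coef0=0.0):
    """Evaluate one of the standard SVM kernels on a pair of vectors."""
    if kind == "linear":
        return dot(x, y)
    if kind == "poly":
        return (gamma * dot(x, y) + coef0) ** degree
    if kind == "rbf":
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
        return math.exp(-gamma * sq_dist)
    if kind == "sigmoid":
        return math.tanh(gamma * dot(x, y) + coef0)
    raise ValueError(f"unsupported kernel: {kind}")


x, y = (1.0, 2.0), (2.0, 0.5)
values = {
    k: kernel(x, y, kind=k, gamma=0.5, coef0=1.0)
    for k in ("linear", "poly", "rbf", "sigmoid")
}
```

Note that degree only affects the 'poly' kernel and coef0 only affects 'poly' and 'sigmoid', matching the parameter descriptions above.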

Module contents

class logai.algorithms.anomaly_detection_algo.DBLDetector(params: DBLDetectorParams)

Bases: AnomalyDetectionAlgo

Dynamic Baseline based time series anomaly detection. This is a wrapper class for the Dynamic Baseline anomaly detection model from the Merlion library: https://opensource.salesforce.com/Merlion/v1.3.1/merlion.models.anomaly.html#module-merlion.models.anomaly.dbl. The current implementation only supports anomaly detection on the constants.LOGLINE_COUNTS field (the frequency count of log events).

fit(log_features: DataFrame)

Training method of the Dynamic Baseline model.

Parameters:

log_features – A log feature dataframe that must only contain two columns [‘timestamp’: datetime, constants.LOGLINE_COUNTS: int].

predict(log_features: DataFrame)

Predicts anomaly scores for log_features[“timestamp”, constants.LOGLINE_COUNTS].

Parameters:

log_features – A log feature dataframe that must only contain two columns [‘timestamp’: datetime, constants.LOGLINE_COUNTS: int].

Returns:

A dataframe of the predicted anomaly scores, indexed by log_features.index. Each value is an anomaly score indicating whether the corresponding point is anomalous.

class logai.algorithms.anomaly_detection_algo.DistributionDivergence(params: DistributionDivergenceParams)

Bases: AnomalyDetectionAlgo

Class for Distribution Divergence based Anomaly Detection. During both training and testing, it takes log features as input and constructs a parametric distribution over them. For the test data, it reports the distribution divergence from the training data as the anomaly score.

fit(log_features: DataFrame)

Fit method of the distribution divergence based anomaly detector. Since it is a non-parametric model, there is no training required.

Parameters:

log_features – Log features as a pandas DataFrame object.

predict(log_features: DataFrame) list

Predict method of distribution divergence based anomaly detector. It computes the distribution divergence between the training distribution and the test distribution provided in predict method.

Parameters:

log_features – The test distribution as pandas DataFrame object.

Returns:

A list of scalar anomaly scores.

class logai.algorithms.anomaly_detection_algo.ETSDetector(params: ETSDetectorParams)

Bases: AnomalyDetectionAlgo

ETS Anomaly Detector. This is a wrapper for the ETS based Anomaly Detector from the Merlion library: https://opensource.salesforce.com/Merlion/v1.0.2/merlion.models.forecast.html#module-merlion.models.forecast.ets. The current version only supports anomaly detection on constants.LOGLINE_COUNTS (i.e. the frequency count of log events).

fit(log_features: DataFrame)

Fit method to train ETS Anomaly Detector.

Parameters:

log_features – A log feature dataframe that must only contain two columns [‘timestamp’: datetime, constants.LOGLINE_COUNTS: int].

Returns:

train_scores: The anomaly scores dataframe [‘index’: log_features.index, ‘timestamps’: datetime, ‘anom_score’: scores, ‘trainval’: whether it is in the training set].

predict(log_features: DataFrame)

Predicts anomaly scores for log_features[“timestamp”, constants.LOGLINE_COUNTS].

Parameters:

log_features – A log feature dataframe that must only contain two columns [‘timestamp’: datetime, constants.LOGLINE_COUNTS: int].

Returns:

test_scores: The anomaly scores dataframe [‘index’: log_features.index, ‘timestamps’: datetime, ‘anom_score’: scores, ‘trainval’: whether it is in the training set].

class logai.algorithms.anomaly_detection_algo.ForecastBasedCNN(config: CNNParams)

Bases: ForcastBasedNeuralAD

Forecasting based CNN model for log anomaly detection.

Parameters:

config – A config object containing parameters for CNN based anomaly detection model.

class logai.algorithms.anomaly_detection_algo.ForecastBasedLSTM(config: LSTMParams)

Bases: ForcastBasedNeuralAD

Forecasting based LSTM model for log anomaly detection.

Parameters:

config – A config object containing parameters for LSTM based anomaly detection model.

class logai.algorithms.anomaly_detection_algo.ForecastBasedTransformer(config: TransformerParams)

Bases: ForcastBasedNeuralAD

Forecasting based Transformer model for log anomaly detection.

Parameters:

config – A config object containing parameters for Transformer based anomaly detection model.

class logai.algorithms.anomaly_detection_algo.IsolationForestDetector(params: IsolationForestParams)

Bases: AnomalyDetectionAlgo

Isolation Forest based Anomaly Detector. This is a wrapper for the Isolation Forest implementation in the scikit-learn library.

fit(log_features: DataFrame)

Fits an isolation forest model.

Parameters:

log_features – The input for model training.

Returns:

The scores of the training dataset.

predict(log_features: DataFrame) Series

Predicts anomalies.

Parameters:

log_features – The input for inference.

Returns:

A pandas dataframe of the predicted anomaly scores.

class logai.algorithms.anomaly_detection_algo.LOFDetector(params: LOFParams)

Bases: AnomalyDetectionAlgo

Local Outlier Factor based Anomaly Detector. This is a wrapper for the LOF based detector in the scikit-learn library. See https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.LocalOutlierFactor.html for more details.

fit(log_features: DataFrame)

Fits a LOF model.

Parameters:

log_features – The input for model training.

Returns:

pandas.DataFrame: The scores of the training dataset.

predict(log_features: DataFrame) Series

Predicts anomaly scores.

Parameters:

log_features – The input for inference.

Returns:

A pandas dataframe of the predicted anomaly scores.

class logai.algorithms.anomaly_detection_algo.LogBERT(config: LogBERTConfig)

Bases: NNAnomalyDetectionAlgo

LogBERT model for anomaly detection of logs.

Parameters:

config – A config object for the LogBERT model.

fit(train_data: Dataset, dev_data: Dataset)

Fit method for training logBERT model.

Parameters:
  • train_data – The training dataset of type huggingface Dataset object.

  • dev_data – The development dataset of type huggingface Dataset object.

predict(test_data: Dataset) DataFrame

Predict method for running inference on logBERT model.

Parameters:

test_data – The test dataset of type huggingface Dataset object.

Returns:

A pandas dataframe object containing the evaluation results for each type of metric.

class logai.algorithms.anomaly_detection_algo.OneClassSVMDetector(params: OneClassSVMParams)

Bases: AnomalyDetectionAlgo

fit(log_features: DataFrame)

Fit method to train the OneClassSVM on log data.

Parameters:

log_features – Training log features as pandas DataFrame object.

Returns:

The scores of the training dataset.

predict(log_features: DataFrame) Series

Predict method to detect anomalies using OneClassSVM model on test log data.

Parameters:

log_features – Test log features data as pandas DataFrame object.

Returns:

A pandas dataframe of the predicted anomaly scores.