logai.algorithms.clustering_algo package

Submodules

logai.algorithms.clustering_algo.birch module

class logai.algorithms.clustering_algo.birch.BirchAlgo(params: BirchParams)

Bases: ClusteringAlgo

BIRCH algorithm for log clustering. This is a wrapper class for the BIRCH clustering algorithm in scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.Birch.html.

fit(log_features: DataFrame)

Trains a BIRCH model.

Parameters:

log_features – The log features for training.

predict(log_features: DataFrame) → Series

Predicts cluster labels using the trained BIRCH model.

Parameters:

log_features – The log features for inference.

Returns:

A pandas series of log cluster labels.
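The fit/predict contract above can be sketched without logai by calling scikit-learn's Birch directly. This is not logai's implementation: the class name MiniBirchWrapper is hypothetical, and the sketch assumes only what the docstrings state (features arrive as a pandas DataFrame, labels come back as a pandas Series). Aligning the returned Series to the input index is an assumption of this sketch.

```python
import pandas as pd
from sklearn.cluster import Birch


class MiniBirchWrapper:
    """Hypothetical stand-in for BirchAlgo: fit on a feature DataFrame and
    return a pandas Series of cluster labels from predict."""

    def __init__(self, branching_factor=50, n_clusters=None, threshold=1.5):
        # Same defaults as BirchParams documented in this module.
        self.model = Birch(
            branching_factor=branching_factor,
            n_clusters=n_clusters,
            threshold=threshold,
        )

    def fit(self, log_features: pd.DataFrame) -> None:
        # Train the CF tree on the raw feature matrix.
        self.model.fit(log_features.to_numpy())

    def predict(self, log_features: pd.DataFrame) -> pd.Series:
        # Assign each row to its nearest subcluster and keep the input index.
        labels = self.model.predict(log_features.to_numpy())
        return pd.Series(labels, index=log_features.index, name="cluster")


# Toy feature vectors: two tight groups that are far apart.
features = pd.DataFrame(
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]],
    columns=["f0", "f1"],
)
algo = MiniBirchWrapper()  # n_clusters=None: each leaf subcluster is a cluster
algo.fit(features)
labels = algo.predict(features)
print(labels.tolist())
```

With n_clusters left at None, scikit-learn skips the final global clustering step, so the two well-separated groups simply become two subclusters under the default threshold of 1.5.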

class logai.algorithms.clustering_algo.birch.BirchParams(branching_factor: int = 50, n_clusters: int | None = None, threshold: float = 1.5)

Bases: Config

Parameters for the BIRCH clustering algorithm. For more details on the parameters, see https://scikit-learn.org/stable/modules/generated/sklearn.cluster.Birch.html.

Parameters:
  • branching_factor – Maximum number of CF subclusters in each node.

  • n_clusters – Number of clusters after the final clustering step, which treats the subclusters from the leaves as new samples.

  • threshold – The radius of the subcluster obtained by merging a new sample and the closest subcluster should be less than the threshold; otherwise, a new subcluster is started.

branching_factor: int
n_clusters: int
threshold: float
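The threshold rule can be illustrated with BIRCH's clustering-feature (CF) triple: the sample count N, the linear sum LS, and the sum of squared norms SS, from which a subcluster's radius is sqrt(SS/N - ||LS/N||^2). The plain-Python sketch below checks a single subcluster only; the full algorithm first descends the CF tree to find the closest subcluster and splits nodes that exceed branching_factor, both omitted here. The function names are hypothetical and this is not scikit-learn's code.

```python
import math


def cf_add(cf, point):
    """Add one sample to a clustering-feature triple (N, LS, SS)."""
    n, ls, ss = cf
    new_ls = [a + b for a, b in zip(ls, point)]
    return n + 1, new_ls, ss + sum(x * x for x in point)


def cf_radius(cf):
    """Radius of a subcluster from its CF: sqrt(SS/N - ||LS/N||^2)."""
    n, ls, ss = cf
    centroid = [v / n for v in ls]
    sq_radius = ss / n - sum(c * c for c in centroid)
    return math.sqrt(max(sq_radius, 0.0))  # clamp tiny negative rounding error


def try_merge(cf, point, threshold):
    """Return (merged_cf, True) if absorbing `point` keeps the radius below
    `threshold`; otherwise (cf, False), i.e. a new subcluster is needed."""
    candidate = cf_add(cf, point)
    if cf_radius(candidate) < threshold:
        return candidate, True
    return cf, False


# Start a subcluster from the point (0, 0), then offer it two new samples.
cf = cf_add((0, [0.0, 0.0], 0.0), [0.0, 0.0])
cf, merged_near = try_merge(cf, [1.0, 0.0], threshold=1.5)
cf, merged_far = try_merge(cf, [10.0, 0.0], threshold=1.5)
print(merged_near, merged_far, cf[0])  # prints: True False 2
```

The nearby sample yields a merged radius of 0.5 and is absorbed; the distant one would push the radius to about 4.5, so it is rejected and would start its own subcluster.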

logai.algorithms.clustering_algo.dbscan module

class logai.algorithms.clustering_algo.dbscan.DbScanAlgo(params: DbScanParams)

Bases: ClusteringAlgo

DBSCAN algorithm for log clustering. This is a wrapper class for the DBSCAN implementation in the scikit-learn library: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html.

fit(log_features: DataFrame)

Trains a DBSCAN model.

Parameters:

log_features – The log features as training data.

predict(log_features: DataFrame) → Series

Predicts using the trained DBSCAN model.

Parameters:

log_features – The log features for inference.

Returns:

A pandas series of cluster labels.
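For intuition about what the wrapped model produces, the sketch below calls scikit-learn's DBSCAN directly with the same keyword names as DbScanParams. scikit-learn's DBSCAN has no separate predict method, so the sketch uses fit_predict on the training rows; how DbScanAlgo.predict handles unseen rows is not described in this reference and is not shown. This is an illustration, not logai's implementation.

```python
import pandas as pd
from sklearn.cluster import DBSCAN

# Two dense groups plus one isolated row.
features = pd.DataFrame(
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
     [20.0, 20.0]],
    columns=["f0", "f1"],
)

# Keyword names mirror DbScanParams; min_samples is lowered from the
# default of 10 because this toy dataset has only 9 rows.
model = DBSCAN(eps=0.3, min_samples=3, metric="euclidean", metric_params=None,
               algorithm="auto", leaf_size=30, p=None, n_jobs=None)
labels = pd.Series(model.fit_predict(features.to_numpy()), index=features.index)
print(labels.tolist())  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

scikit-learn marks noise points, i.e. rows that are neither core points nor within eps of one, with the label -1, so downstream code should not treat -1 as an ordinary cluster.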

class logai.algorithms.clustering_algo.dbscan.DbScanParams(eps: float = 0.3, min_samples: int = 10, metric: str = 'euclidean', metric_params: object | None = None, algorithm: str = 'auto', leaf_size: int = 30, p: float | None = None, n_jobs: int | None = None)

Bases: Config

Parameters for the DBSCAN-based clustering algorithm. For more details on the parameters, see https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html.

Parameters:
  • eps – The maximum distance between two samples for one to be considered as in the neighborhood of the other.

  • min_samples – The number of samples (or total weight) in a neighborhood for a point to be considered as a core point.

  • metric – The metric to use when calculating distance between instances in a feature array.

  • metric_params – Additional keyword arguments for the metric function.

  • algorithm – The algorithm used by the NearestNeighbors module to compute pointwise distances and find nearest neighbors; one of {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}.

  • leaf_size – Leaf size passed to BallTree or cKDTree.

  • p – The power of the Minkowski metric to be used to calculate distance between points.

  • n_jobs – The number of parallel jobs to run.

algorithm: str
eps: float
leaf_size: int
metric: str
metric_params: object
min_samples: int
n_jobs: int
p: float
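The interplay of eps and min_samples is easiest to see in the definition of a core point. The plain-Python sketch below (function name hypothetical, Euclidean metric assumed) counts, for each sample, the samples within distance eps, including the sample itself as scikit-learn does, and keeps those with at least min_samples neighbors.

```python
import math


def core_points(points, eps, min_samples):
    """Indices of DBSCAN core points: samples with at least `min_samples`
    neighbors within distance `eps`, counting the sample itself."""
    cores = []
    for i, p in enumerate(points):
        neighbors = sum(1 for q in points if math.dist(p, q) <= eps)
        if neighbors >= min_samples:
            cores.append(i)
    return cores


points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (1.0, 1.0)]
print(core_points(points, eps=0.3, min_samples=3))  # [0, 1, 2]
```

With the DbScanParams default of min_samples=10, none of these four points would qualify as a core point, which is why small or sparse feature sets can end up labeled entirely as noise.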

logai.algorithms.clustering_algo.kmeans module

class logai.algorithms.clustering_algo.kmeans.KMeansAlgo(params: KMeansParams)

Bases: ClusteringAlgo

K-means algorithm for log clustering. This is a wrapper class for the K-Means clustering method in the scikit-learn library: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html.

fit(log_features: DataFrame)

Fits a K-means model.

Parameters:

log_features – The log features for training.

predict(log_features: DataFrame) → Series

Predicts using the trained K-means model.

Parameters:

log_features – The log features for inference.

Returns:

A pandas series of cluster labels.
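A comparable sketch for K-means calls scikit-learn's KMeans directly with KMeansParams-style values, fits on one DataFrame, and predicts on new rows. Unlike DBSCAN, scikit-learn's KMeans does provide predict, which assigns each row to its nearest learned centroid. The algorithm keyword is left at scikit-learn's default here because its accepted values have changed across scikit-learn releases. This is an illustration, not logai's implementation.

```python
import pandas as pd
from sklearn.cluster import KMeans

train = pd.DataFrame(
    [[0.0, 0.0], [0.2, 0.1], [0.1, 0.2], [8.0, 8.0], [8.2, 8.1], [8.1, 8.2]],
    columns=["f0", "f1"],
)
new_logs = pd.DataFrame([[0.05, 0.05], [7.9, 8.0]], columns=["f0", "f1"])

# Values mirror KMeansParams, except n_clusters (2 instead of 8) and a fixed
# random_state so that repeated runs give the same centroids.
model = KMeans(n_clusters=2, init="k-means++", n_init=10, max_iter=300,
               tol=1e-4, random_state=0)
model.fit(train.to_numpy())
labels = pd.Series(model.predict(new_logs.to_numpy()), index=new_logs.index)
print(labels.tolist())
```

Which integer each group receives depends on initialization, so only the grouping, not the label values, should be relied on.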

class logai.algorithms.clustering_algo.kmeans.KMeansParams(n_clusters: int = 8, init: str = 'k-means++', n_init: int = 10, max_iter: int = 300, tol: float = 0.0001, verbose: int = 0, random_state: int | None = None, copy_x: bool = True, algorithm: str = 'auto')

Bases: Config

Parameters of the K-means clustering algorithm. For more details on the parameters, see https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html.

Parameters:
  • n_clusters – The number of clusters to form as well as the number of centroids to generate.

  • init – Method for initialization; one of {‘k-means++’, ‘random’}.

  • n_init – Number of times the k-means algorithm is run with different centroid seeds.

  • max_iter – Maximum number of iterations of the k-means algorithm for a single run.

  • tol – Relative tolerance with regard to the Frobenius norm of the difference in the cluster centers of two consecutive iterations, used to declare convergence.

  • verbose – Verbosity mode.

  • random_state – Determines random number generation for centroid initialization.

  • copy_x – If copy_x is True (default), then the original data is not modified. If False, the original data is modified, and put back before the function returns.

  • algorithm – K-means algorithm to use; one of {“lloyd”, “elkan”, “auto”, “full”}.

algorithm: str
copy_x: bool
init: str
max_iter: int
n_clusters: int
n_init: int
random_state: int
tol: float
verbose: int
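To make max_iter and tol concrete, here is a plain-Python sketch of Lloyd's iterations (the "lloyd" option listed above). It is a simplification under stated assumptions: it takes fixed starting centers instead of an init strategy, handles 2-D points only, and compares the total squared center shift with tol directly, whereas scikit-learn first scales tol by the mean variance of the data. The function name is hypothetical.

```python
def lloyd(points, centers, max_iter=300, tol=1e-4):
    """Plain Lloyd iterations on 2-D points. Stops after max_iter rounds or
    once the total squared center shift is at most tol (absolute tolerance)."""
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for each point.
        labels = [
            min(range(len(centers)),
                key=lambda k: (p[0] - centers[k][0]) ** 2 + (p[1] - centers[k][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for k, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append((sum(p[0] for p in members) / len(members),
                                    sum(p[1] for p in members) / len(members)))
            else:
                new_centers.append(c)  # keep an empty cluster's center in place
        # Squared Frobenius norm of the change in the center matrix.
        shift = sum((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
                    for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return labels, centers


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels, centers = lloyd(points, centers=[(0.0, 0.0), (10.0, 0.0)])
print(labels, centers)  # [0, 0, 1, 1] [(0.0, 0.5), (10.0, 0.5)]
```

Here the first round moves each center by 0.5 vertically (total squared shift 0.5, above tol), and the second round produces no movement, so the loop stops well before max_iter.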

Module contents

class logai.algorithms.clustering_algo.BirchAlgo(params: BirchParams)

Bases: ClusteringAlgo

Package-level export of logai.algorithms.clustering_algo.birch.BirchAlgo; its fit and predict methods are documented under the birch module above.

class logai.algorithms.clustering_algo.DbScanAlgo(params: DbScanParams)

Bases: ClusteringAlgo

Package-level export of logai.algorithms.clustering_algo.dbscan.DbScanAlgo; its fit and predict methods are documented under the dbscan module above.

class logai.algorithms.clustering_algo.KMeansAlgo(params: KMeansParams)

Bases: ClusteringAlgo

Package-level export of logai.algorithms.clustering_algo.kmeans.KMeansAlgo; its fit and predict methods are documented under the kmeans module above.