pyrca.analyzers package
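- pyrca.analyzers.base: Base classes for all RCA algorithms.
- pyrca.analyzers.bayesian: The RCA method based on Bayesian inference.
- pyrca.analyzers.random_walk: The RCA method based on random walk.
- pyrca.analyzers.epsilon_diagnosis: The epsilon-Diagnosis algorithm.
- pyrca.analyzers.rcd: The Root Cause Discovery (RCD) algorithm.
- pyrca.analyzers.ht: The Hypothesis Testing (HT) algorithm.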
pyrca.analyzers.base module
Base classes for all RCA algorithms
- class pyrca.analyzers.base.RCAResults(root_cause_nodes=<factory>, root_cause_paths=<factory>)
Bases: object
The class for storing root cause analysis results.
- Parameters
root_cause_nodes (list) – A list of potential root causes, e.g., [(metric_a, score_a), (metric_b, score_b), …].
root_cause_paths (dict) – A dict of root cause paths, where the key is the metric name and the value is a list of paths. Each path has the following format: (path_score, [(path_node_a, score_a), (path_node_b, score_b), …]). If path_node_a has no score, score_a is set to None.
- root_cause_nodes: list
- root_cause_paths: dict
- to_dict()
Converts the RCA results into a dict.
- Return type
dict
- to_list()
Converts the RCA results into a list.
- Return type
list
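The two result fields are easier to read with a concrete value in hand. The following is a minimal sketch that uses a hypothetical stand-in dataclass (RCAResultsSketch, not the library class) and made-up metric names; it only mirrors the field formats described above, and the node order inside the example path is illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class RCAResultsSketch:
    """Hypothetical stand-in that mirrors the two documented fields."""

    # [(metric_name, score), ...]
    root_cause_nodes: list = field(default_factory=list)
    # {metric_name: [(path_score, [(node, score_or_None), ...]), ...]}
    root_cause_paths: dict = field(default_factory=dict)


results = RCAResultsSketch(
    root_cause_nodes=[("db_latency", 0.8), ("cache_miss_rate", 0.3)],
    root_cause_paths={
        "db_latency": [
            # (path_score, [(path_node, node_score), ...]); None marks a node without a score.
            (0.8, [("db_latency", 0.8), ("api_latency", None)]),
        ]
    },
)

# Pick the highest-scoring candidate and its best-scoring path.
top_metric, top_score = max(results.root_cause_nodes, key=lambda item: item[1])
best_path_score, best_path = max(results.root_cause_paths[top_metric], key=lambda p: p[0])
print(top_metric, top_score, [node for node, _ in best_path])
```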
- class pyrca.analyzers.base.BaseRCA
Bases: BaseModel
Base class for RCA algorithms. Do not use this class directly; use a derived class instead.
- abstract train(**kwargs)
The training procedure for learning model parameters.
- Parameters
kwargs – Parameters needed for training.
- abstract find_root_causes(**kwargs)
Finds the root causes given the observed anomalous metrics.
- Parameters
kwargs – Additional parameters.
- Return type
pyrca.analyzers.bayesian module
The RCA method based on Bayesian inference.
- class pyrca.analyzers.bayesian.BayesianNetworkConfig(graph, sigmas=None, default_sigma=4.0, thres_win_size=5, thres_reduce_func='mean', infer_method='posterior', root_cause_top_k=3)
Bases: BaseConfig
The configuration class for the Bayesian network based RCA.
- Parameters
graph (Union[DataFrame, str]) – The adjacency matrix of the causal graph, which can be a pandas dataframe or the file path of a CSV or pickled file.
sigmas (Optional[Dict]) – Specific sigmas other than default_sigma for certain variables. This parameter is for constructing training data (only used when detector in the train function is not set).
default_sigma (float) – The default sigma value for computing the detection threshold. This parameter is for constructing training data (only used when detector in the train function is not set).
thres_win_size (int) – The size of the smoothing window for computing the detection threshold. This parameter is for constructing training data (only used when detector in the train function is not set).
thres_reduce_func (str) – The reduction function for the threshold: “mean” uses the mean and standard deviation, while “median” uses the median and median absolute deviation. This parameter is for constructing training data (only used when detector in the train function is not set).
infer_method (str) – Use “posterior” or “likelihood” when doing Bayesian inference.
root_cause_top_k (int) – The maximum number of root causes in the results.
- graph: Union[DataFrame, str]
- sigmas: Dict = None
- default_sigma: float = 4.0
- thres_win_size: int = 5
- thres_reduce_func: str = 'mean'
- infer_method: str = 'posterior'
- root_cause_top_k: int = 3
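To make the difference between the two thres_reduce_func options concrete, here is a minimal sketch of sigma-based bounds computed both ways. The function name detection_bounds and the sample values are hypothetical, the smoothing window (thres_win_size) is left out, and this is not the library's detector, only an illustration of mean/standard deviation versus median/median absolute deviation.

```python
import statistics


def detection_bounds(values, sigma=4.0, reduce_func="mean"):
    """Return (lower, upper) bounds as center -/+ sigma * spread."""
    if reduce_func == "mean":
        center = statistics.fmean(values)
        spread = statistics.pstdev(values)
    elif reduce_func == "median":
        center = statistics.median(values)
        # Median absolute deviation around the median.
        spread = statistics.median([abs(v - center) for v in values])
    else:
        raise ValueError(f"unsupported reduce_func: {reduce_func!r}")
    return center - sigma * spread, center + sigma * spread


samples = [10, 10, 10, 10, 100]  # one large outlier
mean_bounds = detection_bounds(samples, sigma=2.0, reduce_func="mean")
median_bounds = detection_bounds(samples, sigma=2.0, reduce_func="median")
print(mean_bounds, median_bounds)
```

In this toy series the outlier inflates the mean and standard deviation, so the mean-based upper bound is stretched up to the outlier itself, while the median-based bounds stay at the typical value.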
- class pyrca.analyzers.bayesian.BayesianNetwork(config)
Bases: BaseRCA
The RCA method based on Bayesian inference.
- config_class
alias of BayesianNetworkConfig
- train(dfs, detector=None, **kwargs)
Estimates Bayesian network parameters given the training time series.
- Parameters
dfs (Union[DataFrame, List[DataFrame]]) – One or multiple training time series.
detector (Optional[BaseModel]) – The detector used to construct the training dataset, i.e., the training dataset includes the anomaly labels detected by detector. If detector is None, a default stats-based detector will be applied.
- add_root_causes(root_causes)
Adds additional root cause nodes into the graph based on domain knowledge.
- Parameters
root_causes (List) – A list of root causes.
- update_probability(target_node, parent_nodes, prob)
Updates the Bayesian network parameters. For example, if we want to set P(A=1 | B=0, C=1, D=1) = p, we set target_node = A, parent_nodes = [C, D] and prob = p.
- Parameters
target_node (str) – The child/effect node to be modified.
parent_nodes (List) – The parent nodes whose values are ones.
prob (float) – The probability that the value of target node is one given these parent nodes.
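The convention that parent_nodes lists only the parents set to one can be illustrated with a toy conditional probability table. This is a sketch under assumptions: the helper set_conditional_probability and the tuple-keyed dict are hypothetical and say nothing about how the library stores its parameters.

```python
def set_conditional_probability(cpt, all_parents, parent_nodes, prob):
    """Store P(target=1 | parents) in a toy table keyed by parent values."""
    unknown = set(parent_nodes) - set(all_parents)
    if unknown:
        raise ValueError(f"not parents of the target: {sorted(unknown)}")
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must lie in [0, 1]")
    active = set(parent_nodes)
    # Parents listed in parent_nodes take value 1; every other parent takes value 0.
    key = tuple(1 if parent in active else 0 for parent in all_parents)
    cpt[key] = prob
    return key


# P(A=1 | B=0, C=1, D=1) = 0.9, expressed as parent_nodes = [C, D].
cpt_for_a = {}
key = set_conditional_probability(cpt_for_a, ["B", "C", "D"], ["C", "D"], 0.9)
print(key, cpt_for_a[key])
```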
- find_root_causes(anomalous_metrics, set_zero_path_score_for_normal_metrics=False, remove_zero_score_node_in_path=True, **kwargs)
Finds the root causes given the observed anomalous metrics.
- Parameters
anomalous_metrics (Union[List, Dict]) – The anomalous metrics, given either as a list [‘metric_A’, ‘metric_B’, …] or as a dict {‘metric_A’: 1, ‘metric_B’: 1}.
set_zero_path_score_for_normal_metrics (bool) – Whether to set the scores of normal metrics (metrics that are not in anomalous_metrics) to zeros when computing root cause path scores.
remove_zero_score_node_in_path (bool) – Whether to remove the nodes with zero scores from the paths.
- Return type
List
- Returns
A list of the found root causes.
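The two accepted shapes of anomalous_metrics carry the same information. A minimal sketch of bringing them to one form is shown below; the helper normalize_anomalous_metrics is hypothetical, and whether the library performs an equivalent conversion internally is an assumption.

```python
def normalize_anomalous_metrics(anomalous_metrics):
    """Return a {metric_name: flag} dict for either accepted input shape."""
    if isinstance(anomalous_metrics, dict):
        return dict(anomalous_metrics)
    # A plain list marks every listed metric as anomalous (flag 1).
    return {name: 1 for name in anomalous_metrics}


from_list = normalize_anomalous_metrics(["metric_A", "metric_B"])
from_dict = normalize_anomalous_metrics({"metric_A": 1, "metric_B": 1})
print(from_list == from_dict)
```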
- save(directory, filename='bn', **kwargs)
Saves the initialized model.
- Parameters
directory – The folder for the dumped model.
filename – The filename (the model class name if it is None).
- classmethod load(directory, filename='bn', **kwargs)
Loads the dumped model.
- Parameters
directory – The folder for the dumped model.
filename – The filename (the model class name if it is None).
- print_probabilities()
pyrca.analyzers.random_walk module
The RCA method based on random walk
- class pyrca.analyzers.random_walk.RandomWalkConfig(graph, use_partial_corr=False, rho=0.1, num_steps=10, num_repeats=1000, root_cause_top_k=5)
Bases: BaseConfig
The configuration class for the random walk based RCA.
- Parameters
graph (Union[DataFrame, str]) – The adjacency matrix of the causal graph, which can be a pandas dataframe or the file path of a CSV or pickled file.
use_partial_corr (bool) – Whether to use partial correlation when computing edge weights.
rho (float) – The weight from a “cause” node to a “result” node.
num_steps (int) – The number of random walk steps in each run.
num_repeats (int) – The number of random walk runs.
root_cause_top_k (int) – The maximum number of root causes in the results.
- graph: Union[DataFrame, str]
- use_partial_corr: bool = False
- rho: float = 0.1
- num_steps: int = 10
- num_repeats: int = 1000
- root_cause_top_k: int = 5
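How rho, num_steps, num_repeats and root_cause_top_k interact can be seen in a simplified walk over a toy graph. This sketch rests on several assumptions that are not taken from the library: result-to-cause moves get a fixed weight of 1.0 instead of correlation-based weights, cause-to-result moves get weight rho, a node without parents gets a self-loop of weight 1.0, and candidates are ranked by visit frequency. The function name and the node names are hypothetical.

```python
import random
from collections import Counter


def random_walk_rank(parents, start_nodes, rho=0.1, num_steps=10,
                     num_repeats=1000, top_k=5, seed=0):
    """Rank nodes by how often a simplified random walk visits them."""
    rng = random.Random(seed)
    # Derive child links from the parent links.
    children = {}
    for node, node_parents in parents.items():
        children.setdefault(node, [])
        for parent in node_parents:
            children.setdefault(parent, []).append(node)

    visits = Counter()
    for _ in range(num_repeats):
        node = rng.choice(start_nodes)
        for _ in range(num_steps):
            moves, weights = [], []
            for parent in parents.get(node, []):
                moves.append(parent)
                weights.append(1.0)  # result -> cause (assumed fixed weight)
            for child in children.get(node, []):
                moves.append(child)
                weights.append(rho)  # cause -> result
            if not parents.get(node):
                moves.append(node)
                weights.append(1.0)  # assumed self-loop at a node without causes
            node = rng.choices(moves, weights=weights, k=1)[0]
            visits[node] += 1
    total = sum(visits.values())
    return [(name, count / total) for name, count in visits.most_common(top_k)]


# Toy chain: root causes mid, mid causes leaf; the anomaly is observed at leaf.
toy_parents = {"leaf": ["mid"], "mid": ["root"], "root": []}
ranked = random_walk_rank(toy_parents, ["leaf"], top_k=2)
print(ranked)
```

With a small rho the walker rarely drifts back toward effects, so visits pile up at nodes that have no further causes.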
- class pyrca.analyzers.random_walk.RandomWalk(config)
Bases: BaseRCA
The RCA method based on random walk on the topology/causal graph.
- config_class
alias of RandomWalkConfig
- train(**kwargs)
Random walk needs no training.
- find_root_causes(anomalous_metrics, df, **kwargs)
Finds the root causes given the observed anomalous metrics.
- Parameters
anomalous_metrics (Union[List, Dict]) – The anomalous metrics, given either as a list [‘metric_A’, ‘metric_B’, …] or as a dict {‘metric_A’: 1, ‘metric_B’: 1}.
df (DataFrame) – The time series dataframe in the incident window.
- Return type
- Returns
A list of the found root causes.
pyrca.analyzers.epsilon_diagnosis module
The epsilon-Diagnosis algorithm.
- class pyrca.analyzers.epsilon_diagnosis.EpsilonDiagnosisConfig(alpha=0.05, bootstrap_time=200, root_cause_top_k=3)
Bases: BaseConfig
The configuration class for the epsilon-diagnosis algorithm for Root Cause Analysis.
- Parameters
alpha (float) – The desired significance level (float) in (0, 1). Default: 0.05.
bootstrap_time (int) – The number of bootstrap rounds.
root_cause_top_k (int) – The maximum number of root causes in the results.
- alpha: float = 0.05
- bootstrap_time: int = 200
- root_cause_top_k: int = 3
- class pyrca.analyzers.epsilon_diagnosis.EpsilonDiagnosis(config)
Bases: BaseRCA
The epsilon-diagnosis method for Root Cause Analysis. If using this method, please cite the original work: epsilon-Diagnosis: Unsupervised and Real-time Diagnosis of Small window Long-tail Latency in Large-scale Microservice Platforms.
- config_class
alias of EpsilonDiagnosisConfig
- train(normal_df, **kwargs)
Performs two-variable correlation analysis on the training time series.
- Parameters
normal_df (DataFrame) – A pandas dataframe of normal data.
- find_root_causes(abnormal_df, **kwargs)
Finds the root causes given the abnormal dataset.
- Parameters
abnormal_df (DataFrame) – A pandas dataframe of abnormal data.
- Returns
A list of the found root causes.
pyrca.analyzers.rcd module
The Root Cause Discovery (RCD) algorithm
- class pyrca.analyzers.rcd.RCDConfig(start_alpha=0.01, alpha_step=0.1, alpha_limit=1, localized=True, gamma=5, bins=5, k=3, f_node='F-node', verbose=False, ci_test='chisq')
Bases: BaseConfig
The configuration class for the RCD algorithm for Root Cause Analysis.
- Parameters
start_alpha (float) – The desired start significance level (float) in (0, 1) for search.
alpha_step (float) – The search step for alpha.
alpha_limit (float) – The maximum alpha for search.
localized (bool) – Whether to use the local method.
gamma (int) – Chunk size.
bins (int) – The number of bins to discretize data.
k (int) – The number of top root causes to return.
f_node (str) – The name of the anomaly variable.
verbose (bool) – True iff verbose output should be printed. Default: False.
- start_alpha: float = 0.01
- alpha_step: float = 0.1
- alpha_limit: float = 1
- localized: bool = True
- gamma: int = 5
- bins: int = 5
- k: int = 3
- f_node: str = 'F-node'
- verbose: bool = False
- ci_test: CIT = 'chisq'
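The three alpha parameters describe a search over significance levels. As a rough sketch, the candidate levels can be enumerated as below; the helper alpha_schedule is hypothetical, and the assumption that the search stops strictly before alpha_limit is not confirmed by this reference.

```python
def alpha_schedule(start_alpha=0.01, alpha_step=0.1, alpha_limit=1.0):
    """List the significance levels start, start + step, ... below the limit."""
    if alpha_step <= 0:
        raise ValueError("alpha_step must be positive")
    alphas = []
    i = 0
    while True:
        # Multiply instead of accumulating to limit floating-point drift.
        alpha = round(start_alpha + i * alpha_step, 10)
        if alpha >= alpha_limit:  # assumed stopping rule
            break
        alphas.append(alpha)
        i += 1
    return alphas


alphas = alpha_schedule()
print(alphas)
```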
- class pyrca.analyzers.rcd.RCD(config)
Bases: BaseRCA
The RCD algorithm for Root Cause Analysis. If using this explainer, please cite the original work: Root Cause Analysis of Failures in Microservices through Causal Discovery.
- train(**kwargs)
Model training is performed inside the find_root_causes function.
- find_root_causes(normal_df, abnormal_df, **kwargs)
Finds the root causes given the abnormal dataset.
- Returns
A list of the found root causes.
pyrca.analyzers.ht module
The Hypothesis Testing (HT) algorithm
- class pyrca.analyzers.ht.HTConfig(graph, aggregator='max', root_cause_top_k=3)
Bases: BaseConfig
The configuration class of the HT method for Root Cause Analysis.
- Parameters
graph (Union[DataFrame, str]) – The adjacency matrix of the causal graph, which can be a pandas dataframe or the file path of a CSV or pickled file.
aggregator (str) – The function for aggregating the node score from all the abnormal data.
root_cause_top_k (int) – The maximum number of root causes in the results.
- graph: Union[DataFrame, str]
- aggregator: str = 'max'
- root_cause_top_k: int = 3
- class pyrca.analyzers.ht.HT(config)
Bases: BaseRCA
Regression-based Hypothesis Testing method for Root Cause Analysis. If using this explainer, please cite the original work: Causal Inference-Based Root Cause Analysis for Online Service Systems with Intervention Recognition.
- train(normal_df, **kwargs)
Train regression model for each node based on its parents. Build the score functions.
- Parameters
normal_df (DataFrame) – A pandas dataframe of normal data.
- find_root_causes(abnormal_df, anomalous_metrics=None, adjustment=False, **kwargs)
Finds the root causes given the abnormal dataset.
- Parameters
abnormal_df (DataFrame) – A pandas dataframe of abnormal data.
anomalous_metrics (Optional[str]) – The name of the detected anomalous metric, used to print the path from the root nodes.
adjustment (bool) – Whether to perform descendant adjustment.
- Returns
A list of the found root causes.
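The train and find_root_causes steps of a regression-based hypothesis test can be sketched on a two-node graph. Everything here is an assumption made for illustration: each node has at most one parent, the regression is a plain least-squares line, the score is the absolute residual divided by the residual spread seen in normal data, and the function and metric names are hypothetical. The library's actual regression model, score function and descendant adjustment are not reproduced.

```python
import statistics


def fit_line(xs, ys):
    """Least-squares fit y ~ slope * x + intercept, plus the residual spread."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return slope, intercept, statistics.pstdev(residuals)


def train(normal, parent_of):
    """Fit one model per node on normal data (assumes a nonzero spread)."""
    models = {}
    for node, values in normal.items():
        parent = parent_of.get(node)
        if parent is None:
            # No parent: compare against the node's own normal mean.
            models[node] = (0.0, statistics.fmean(values), statistics.pstdev(values))
        else:
            models[node] = fit_line(normal[parent], values)
    return models


def find_root_causes(models, abnormal, parent_of, aggregator=max, top_k=3):
    """Score each node by its aggregated deviation from the trained model."""
    scores = {}
    for node, (slope, intercept, spread) in models.items():
        parent = parent_of.get(node)
        inputs = abnormal[parent] if parent is not None else [0.0] * len(abnormal[node])
        deviations = [abs(y - (slope * x + intercept)) / spread
                      for x, y in zip(inputs, abnormal[node])]
        scores[node] = aggregator(deviations)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]


parent_of = {"api": "db"}  # db is the only parent of api
normal = {"db": [1, 2, 3, 4, 5], "api": [2.1, 3.9, 6.2, 7.8, 10.0]}
abnormal = {"db": [9.0, 10.0], "api": [17.8, 19.8]}

models = train(normal, parent_of)
ranking = find_root_causes(models, abnormal, parent_of, aggregator=max)
print(ranking)
```

In this toy incident api still follows its learned relationship with db, so its residual score stays small, while db departs from its own normal range and ranks first.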