Root Cause Detector module

This module contains two application APIs built on causal discovery algorithms:

  • RootCauseDetector: API for root cause detection in time series data.

  • TabularDistributionShiftDetector: API for root cause detection in tabular data.

causalai.application.root_cause_detection

RootCauseDetector detects the root cause of an anomaly in continuous time series data with the help of a higher-level context variable. It uses the PC algorithm to estimate the causal graph and performs root cause analysis by treating the failure, represented by the higher-level metrics, as an intervention on the root cause node. PC can then use conditional independence tests to quickly detect which lower-level metric the failure node points to, and that metric is reported as the root cause of the anomaly.
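The idea can be illustrated with a minimal, dependency-free Python sketch. It is not the library's implementation and rests on several assumptions. First, the data are linear Gaussian, so a Fisher z-test on partial correlations serves as the conditional independence test. Second, the context variable is exogenous. Third, instead of running the full PC skeleton search, the sketch tests only the edges incident to the context node, checking each against every conditioning set up to max_condition_set_size. The helpers find_root_causes and simulate and the synthetic metrics a, b, c are hypothetical.

```python
import itertools
import math
import random
import statistics


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(r, i, j, cond):
    """Partial correlation of variables i and j given the tuple cond,
    using the standard recursive formula on the correlation matrix r."""
    if not cond:
        return r[i][j]
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(r, i, j, rest)
    r_ik = partial_corr(r, i, k, rest)
    r_jk = partial_corr(r, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik**2) * (1 - r_jk**2))


def fisher_z_pvalue(rho, n, cond_size):
    """Two-sided p-value of H0: partial correlation == 0 (Fisher z-test)."""
    rho = max(min(rho, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(rho) * math.sqrt(n - cond_size - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def find_root_causes(columns, names, context, pvalue_thres=0.05, max_condition_set_size=4):
    """Hypothetical helper: return the variables that stay dependent on the
    context variable under every conditioning set of size <= max_condition_set_size."""
    n = len(columns[0])
    r = [[pearson(a, b) for b in columns] for a in columns]
    c = names.index(context)
    others = [i for i in range(len(names)) if i != c]
    root_causes = []
    for x in others:
        rest = [i for i in others if i != x]
        separated = False
        for size in range(min(max_condition_set_size, len(rest)) + 1):
            for cond in itertools.combinations(rest, size):
                p = fisher_z_pvalue(partial_corr(r, x, c, cond), n, size)
                if p > pvalue_thres:  # independence accepted: drop edge context - x
                    separated = True
                    break
            if separated:
                break
        if not separated:
            # The context node is assumed exogenous, so a surviving edge points context -> x.
            root_causes.append(names[x])
    return root_causes


def simulate(n=2000, seed=0):
    """Hypothetical metrics: a failure in the second half of the series shifts
    'a', which propagates to 'b'; 'c' is unrelated noise."""
    rng = random.Random(seed)
    time, a, b, c = [], [], [], []
    for t in range(n):
        failure = 1.0 if t >= n // 2 else 0.0
        a_t = 2.0 * failure + rng.gauss(0, 1)
        time.append(float(t))
        a.append(a_t)
        b.append(1.5 * a_t + rng.gauss(0, 1))
        c.append(rng.gauss(0, 1))
    return [time, a, b, c], ["time", "a", "b", "c"]


columns, names = simulate()
print(find_root_causes(columns, names, "time", pvalue_thres=1e-4))
```

In the synthetic data, a is shifted directly by the failure, while b inherits the shift only through a. Conditioning on a should therefore separate b from time, leaving only a reported. The usage line passes a stricter threshold than the 0.05 default to make spurious edges unlikely on a single sample.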

This algorithm makes the following assumptions:

  1. Observational samples conditioned on the higher-level context variable (e.g., time index) are i.i.d.

  2. Relationships between variables are linear, with Gaussian noise terms.

  3. Causal Markov condition: variables that are d-separated by a set Z in the causal graph are conditionally independent given Z.

  4. Faithfulness: every conditional independence that holds in the data is implied by d-separation in the causal graph.

  5. No hidden confounders.
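Assumption 2 is what makes a simple conditional independence test usable. For jointly Gaussian variables, X is independent of Y given Z exactly when their partial correlation is zero. One common test, assumed here because this section does not name the one the module uses, is the Fisher z-test on the sample partial correlation. In the block below, n is the number of samples, |Z| is the size of the conditioning set, Z = Z' ∪ {W}, and Φ is the standard normal CDF. Independence is accepted when p exceeds pvalue_thres.

```latex
\hat\rho_{XY\cdot Z} = \frac{\hat\rho_{XY\cdot Z'} - \hat\rho_{XW\cdot Z'}\,\hat\rho_{YW\cdot Z'}}
  {\sqrt{\bigl(1 - \hat\rho_{XW\cdot Z'}^{2}\bigr)\bigl(1 - \hat\rho_{YW\cdot Z'}^{2}\bigr)}},
\qquad
z = \sqrt{n - |Z| - 3}\;\tfrac{1}{2}\ln\frac{1 + \hat\rho_{XY\cdot Z}}{1 - \hat\rho_{XY\cdot Z}},
\qquad
p = 2\bigl(1 - \Phi(|z|)\bigr)
```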

class causalai.application.root_cause_detection.RootCauseDetector(data_obj: TabularData, var_names: List[str], time_metric_name: str = 'time', prior_knowledge: PriorKnowledge | None = None)

Detects the root cause of a distribution shift in time series data.

Reference: Ikram, Azam, et al. "Root Cause Analysis of Failures in Microservices through Causal Discovery." Advances in Neural Information Processing Systems 35 (2022): 31158-31170.

Reference: Huang, Biwei, et al. "Causal discovery from heterogeneous/nonstationary data." The Journal of Machine Learning Research 21.1 (2020): 3482-3534.

__init__(data_obj: TabularData, var_names: List[str], time_metric_name: str = 'time', prior_knowledge: PriorKnowledge | None = None)

PC algorithm for root cause detection in time-varying data settings.

Parameters:
  • data_obj (TabularData) -- pre-processed TabularData object

  • var_names (List[str]) -- list of variable names

  • time_metric_name (str) -- name of the metric that represents the time-varying context (e.g., time index). Defaults to 'time'.

  • prior_knowledge (Optional[PriorKnowledge]) -- prior knowledge about the causal graph

run(pvalue_thres: float = 0.05, max_condition_set_size: int = 4, return_graph: bool = False)

Run the PC algorithm for root cause detection in microservice metrics.

Parameters:
  • pvalue_thres (float) -- p-value threshold for the conditional independence test. Defaults to 0.05.

  • max_condition_set_size (int) -- maximum size of the condition set. Defaults to 4.

  • return_graph (bool) -- whether to return the estimated causal graph. Defaults to False.

Returns:

the root cause(s) of the incident and, if return_graph is True, also the estimated causal graph

Return type:

Union[List[str], Dict[str, List[str]]]
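Reading the root cause off an estimated graph can be sketched as follows. The sketch assumes the graph maps each variable to the list of its parents, which is one plausible reading of the Dict[str, List[str]] return type. The helper root_causes_from_graph and the example metric names are hypothetical, not part of the library.

```python
from typing import Dict, List


def root_causes_from_graph(graph: Dict[str, List[str]], time_metric_name: str = "time") -> List[str]:
    """Hypothetical helper: the root causes are the metrics whose parents
    include the context node, i.e. the nodes the failure node points to."""
    return sorted(
        node
        for node, parents in graph.items()
        if node != time_metric_name and time_metric_name in parents
    )


# Hypothetical estimated graph: time -> latency_db -> latency_api; cpu is isolated.
graph = {
    "time": [],
    "latency_db": ["time"],
    "latency_api": ["latency_db"],
    "cpu": [],
}
print(root_causes_from_graph(graph))  # ['latency_db']
```

Downstream metrics such as latency_api are not reported because their parents do not include the context node, even though they also shift during the incident.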