Root Cause Detector module
This module contains two application APIs built on causal discovery algorithms:
RootCauseDetector: API for root cause detection in time series data.
TabularDistributionShiftDetector: API for root cause detection in tabular data.
causalai.application.root_cause_detection
RootCauseDetector detects the root cause of an anomaly in continuous time series data with the help of a higher-level context variable. It uses the PC algorithm to estimate the causal graph and treats the failure, represented by the higher-level metric, as an intervention on the root cause node. PC's conditional independence tests can then quickly identify which lower-level metric the failure node points to. That metric is reported as the root cause of the anomaly.
This algorithm makes the following assumptions:
1. Observational samples conditioned on the higher-level context variable (e.g., time index) are i.i.d.
2. Relationships between variables are linear with Gaussian noise terms.
3. Causal Markov condition: two variables that are d-separated in the causal graph are probabilistically independent.
4. Faithfulness: no conditional independence holds in the data unless it is implied by d-separation in the causal graph (the converse of the Causal Markov condition).
5. No hidden confounders.
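Assumption 2 (linear relationships, Gaussian noise) makes a partial-correlation test a natural conditional independence test. The following is a minimal, stdlib-only sketch of a Fisher-z test on partial correlations. It is not the library's implementation. The function names and the dict-of-columns data layout are assumptions made for this illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(data, i, j, cond):
    """Partial correlation of columns i and j given the columns in cond,
    computed with the standard recursive formula."""
    if not cond:
        return pearson(data[i], data[j])
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(data, i, j, rest)
    r_ik = partial_corr(data, i, k, rest)
    r_jk = partial_corr(data, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def fisher_z_pvalue(data, i, j, cond=()):
    """Two-sided p-value for H0: column i is independent of column j given
    the columns in cond (valid under the linear-Gaussian assumption)."""
    n = len(data[i])
    r = partial_corr(data, i, j, tuple(cond))
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - len(cond) - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Chain x -> z -> y: x and y are dependent, but independent given z.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]
z = [v + rng.gauss(0, 1) for v in x]
y = [v + rng.gauss(0, 1) for v in z]
data = {"x": x, "z": z, "y": y}
p_marginal = fisher_z_pvalue(data, "x", "y")         # tiny: strong correlation
p_given_z = fisher_z_pvalue(data, "x", "y", ("z",))  # typically large: z blocks the path
```

Under the faithfulness assumption, a small p-value is read as an edge or open path, and a large one as evidence of d-separation.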
- class causalai.application.root_cause_detection.RootCauseDetector(data_obj: TabularData, var_names: List[str], time_metric_name: str = 'time', prior_knowledge: PriorKnowledge | None = None)
Detects the root cause of a distribution shift in time series data.
Reference: Ikram, Azam, et al. "Root Cause Analysis of Failures in Microservices through Causal Discovery." Advances in Neural Information Processing Systems 35 (2022): 31158-31170.
Reference: Huang, Biwei, et al. "Causal discovery from heterogeneous/nonstationary data." The Journal of Machine Learning Research 21.1 (2020): 3482-3534.
- __init__(data_obj: TabularData, var_names: List[str], time_metric_name: str = 'time', prior_knowledge: PriorKnowledge | None = None)
PC algorithm for root cause detection in time-varying data settings.
- Parameters:
data_obj (TabularData) -- pre-processed TabularData object
var_names (List[str]) -- list of variable names
time_metric_name (str) -- name of the metric that represents the time-varying context (e.g., time index). Defaults to 'time'.
prior_knowledge (Optional[PriorKnowledge]) -- prior knowledge about the causal graph
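The parameter list suggests that the context metric is one of the named variables, stored alongside the lower-level metrics. The hypothetical helper below lays the columns out under that assumption. It also assumes that "pre-processed" means per-column standardization, which the documentation does not specify. `build_columns` is not part of causalai. The step that wraps the columns in a TabularData object is omitted.

```python
from statistics import fmean, pstdev


def build_columns(metrics, context, time_metric_name="time"):
    """Hypothetical pre-processing helper (not part of causalai).

    Z-standardizes every metric column and appends the context column under
    time_metric_name, so var_names lists the metrics followed by the context.
    """
    n = len(context)
    if any(len(v) != n for v in metrics.values()):
        raise ValueError("all columns must have the same length")
    columns = {}
    for name, values in metrics.items():
        mu, sd = fmean(values), pstdev(values)
        # A constant column carries no information; map it to zeros.
        columns[name] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    columns[time_metric_name] = [float(t) for t in context]
    return list(columns), columns


# Two metrics observed over a normal period (context 0) and an incident (context 1).
var_names, columns = build_columns(
    {"latency": [10.0, 12.0, 30.0, 31.0], "cpu": [0.2, 0.3, 0.9, 0.8]},
    context=[0, 0, 1, 1],
)
```

The context column is left unscaled so that its values keep their meaning as period labels or time indices.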
- run(pvalue_thres: float = 0.05, max_condition_set_size: int = 4, return_graph: bool = False)
Run the PC algorithm for root cause detection in microservice metrics.
- Parameters:
pvalue_thres (float) -- p-value threshold for the conditional independence tests. Defaults to 0.05.
max_condition_set_size (int) -- maximum size of the conditioning set. Defaults to 4.
return_graph (bool) -- whether to return the estimated causal graph. Defaults to False.
- Returns:
root cause(s) of the incident, plus the estimated causal graph if return_graph is True
- Return type:
Union[List[str], Dict[str, List[str]]]
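The search that run performs can be sketched as PC's edge-removal phase, restricted to the edges that touch the context node. The sketch is a simplification under stated assumptions, not the library's code. First, it tries every subset of the other metrics up to max_condition_set_size as a conditioning set, rather than only PC's current adjacency sets. Second, it skips edge orientation: it assumes the context node has no parents, so every metric still adjacent to it is treated as a child, that is, a root-cause candidate. `find_root_causes` and its helpers are hypothetical names. `pvalue_thres` and `max_condition_set_size` mirror the parameters of run.

```python
import itertools
import math
import random
from statistics import NormalDist, fmean


def _pcorr(data, i, j, cond):
    """Recursive partial correlation of columns i and j given cond."""
    if not cond:
        xi, xj = data[i], data[j]
        mi, mj = fmean(xi), fmean(xj)
        sij = sum((a - mi) * (b - mj) for a, b in zip(xi, xj))
        sii = sum((a - mi) ** 2 for a in xi)
        sjj = sum((b - mj) ** 2 for b in xj)
        return sij / math.sqrt(sii * sjj)
    k, rest = cond[0], cond[1:]
    r_ij = _pcorr(data, i, j, rest)
    r_ik = _pcorr(data, i, k, rest)
    r_jk = _pcorr(data, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def _pvalue(data, i, j, cond):
    """Fisher-z p-value for 'i independent of j given cond' (linear-Gaussian)."""
    r = max(min(_pcorr(data, i, j, cond), 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(len(data[i]) - len(cond) - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def find_root_causes(data, context, pvalue_thres=0.05, max_condition_set_size=4):
    """Return the metrics that remain adjacent to the context node.

    The edge metric--context is removed as soon as some conditioning set S of
    other metrics, |S| <= max_condition_set_size, yields p > pvalue_thres.
    """
    metrics = [v for v in data if v != context]
    causes = []
    for m in metrics:
        others = [v for v in metrics if v != m]
        separated = any(
            _pvalue(data, m, context, cond) > pvalue_thres
            for size in range(min(max_condition_set_size, len(others)) + 1)
            for cond in itertools.combinations(others, size)
        )
        if not separated:
            causes.append(m)
    return causes


# Hypothetical incident: the context flips from 0 (normal) to 1 (anomalous)
# halfway through. Only metric "a" shifts; "b" inherits the shift through "a";
# "c" is unrelated noise.
rng = random.Random(0)
n = 400
ctx = [0.0] * (n // 2) + [1.0] * (n // 2)
a = [2.0 * t + rng.gauss(0, 1) for t in ctx]
b = [x + rng.gauss(0, 1) for x in a]
c = [rng.gauss(0, 1) for _ in range(n)]
data = {"time": ctx, "a": a, "b": b, "c": c}
print(find_root_causes(data, "time", pvalue_thres=0.001))
```

The example passes a small pvalue_thres so that a chance rejection in the truly independent tests (for example, "b" versus the context given "a") is unlikely to keep a spurious edge. Metric "b" is separated from the context by {"a"}, and "c" by the empty set. Only "a" stays adjacent to the context.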