Tabular Distribution Shift Detector module
This module contains two application APIs built on causal discovery algorithms:
RootCauseDetection: API for root cause detection.
TabularDistributionShiftDetector: API for root cause detection in tabular data.
causalai.application.distribution_shift_detection
TabularDistributionShiftDetector detects the origins of distribution shifts in tabular continuous or discrete data with the help of a domain index variable. It treats a distribution shift as an intervention by the domain index on the root-cause node, and uses the PC algorithm to estimate the causal graph. Because PC relies on conditional independence (CI) tests, it can quickly recover the causal graph and identify the root cause of the anomaly as the node on which the domain index intervenes. The algorithm supports both discrete and continuous variables. To handle nonlinear relationships, it converts continuous variables into discrete ones using K-means clustering and then runs the discrete version of the PC algorithm for CI testing and causal discovery.
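The discretization step can be pictured with a minimal, pure-Python sketch of one-dimensional K-means (Lloyd's algorithm) applied to a single continuous column. The function name `kmeans_1d`, the spread-out initialization over the sorted values, and the iteration cap are illustrative assumptions; the library's actual discretizer and its choice of the number of clusters may differ.

```python
from typing import List


def kmeans_1d(values: List[float], k: int, n_iter: int = 100) -> List[int]:
    """Map each continuous value to one of k cluster labels (illustrative sketch)."""
    if k < 1 or k > len(values):
        raise ValueError("k must be between 1 and len(values)")
    ordered = sorted(values)
    # Deterministic initialization: spread the k initial centers over the sorted values,
    # so cluster 0 holds the smallest values and cluster k-1 the largest.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: each value goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: move each center to the mean of the values assigned to it.
        for c in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Two well-separated groups become two discrete states that a discrete CI test can use.
print(kmeans_1d([0.1, 0.2, 0.15, 5.0, 5.2, 4.9], k=2))
```

The resulting integer labels replace the original column, which is what lets a discrete CI test capture nonlinear dependence without assuming a functional form.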
This algorithm makes the following assumptions:
1. Observational samples conditioned on the domain index are i.i.d.
2. The relationships between variables can be arbitrary (not necessarily linear).
3. Causal Markov condition: two variables that are d-separated by a set of variables in the causal graph are conditionally independent given that set.
4. Faithfulness: every conditional independence in the data is implied by d-separation in the causal graph, i.e., no conditional independence holds unless the Causal Markov condition entails it.
5. No hidden confounders.
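The "domain index as intervention" view and the Causal Markov condition can be checked on a toy example. The sketch below is an illustration, not library code: it builds an exact joint distribution over a hypothetical domain index D, a root-cause variable X whose mechanism changes with D, and a downstream variable Y whose mechanism does not. It then computes (conditional) mutual information, the population quantity that a discrete CI test estimates from samples. The probabilities and the helper name `mutual_information` are assumptions chosen for illustration.

```python
from itertools import product
from math import log


def mutual_information(joint, a, b, given=()):
    """I(A; B | given) in nats. `joint` maps full assignments (tuples) to probabilities;
    a, b and the entries of `given` are positions within those tuples."""
    def marginal(keep):
        out = {}
        for assignment, p in joint.items():
            key = tuple(assignment[i] for i in keep)
            out[key] = out.get(key, 0.0) + p
        return out

    g = tuple(given)
    p_abg, p_ag = marginal((a, b) + g), marginal((a,) + g)
    p_bg, p_g = marginal((b,) + g), marginal(g)
    total = 0.0
    for (va, vb, *vg), p in p_abg.items():
        if p > 0:
            vg = tuple(vg)
            total += p * log(p * p_g[vg] / (p_ag[(va,) + vg] * p_bg[(vb,) + vg]))
    return total


# Toy model D -> X -> Y: the domain index changes only the mechanism of X (the root cause).
p_d = {0: 0.5, 1: 0.5}
p_x_given_d = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}
joint = {
    (d, x, y): p_d[d] * p_x_given_d[d][x] * p_y_given_x[x][y]
    for d, x, y in product((0, 1), repeat=3)
}
D, X, Y = 0, 1, 2
print(mutual_information(joint, D, X))        # > 0: D and X are dependent
print(mutual_information(joint, D, Y))        # > 0: the shift propagates to Y
print(mutual_information(joint, D, Y, (X,)))  # ~0: D is independent of Y given X
```

Y's marginal distribution shifts across domains, but the shift vanishes once X is conditioned on. That is why a CI-based search keeps the edge D - X and removes D - Y, singling out X as the root cause.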
- class causalai.application.distribution_shift_detection.TabularDistributionShiftDetector(data_obj: TabularData, var_names: List[str], domain_index_name: str = 'domain_index', prior_knowledge: PriorKnowledge | None = None)
Detects the root causes of distribution shift in tabular data.
Reference: Ikram, Azam, et al. "Root Cause Analysis of Failures in Microservices through Causal Discovery." Advances in Neural Information Processing Systems 35 (2022): 31158-31170.
- __init__(data_obj: TabularData, var_names: List[str], domain_index_name: str = 'domain_index', prior_knowledge: PriorKnowledge | None = None)
PC algorithm for root cause detection in domain-varying data settings.
:param data_obj: tabular data object
:type data_obj: TabularData
:param var_names: list of variable names
:type var_names: List[str]
:param domain_index_name: name of the domain index column
:type domain_index_name: str
:param prior_knowledge: prior knowledge about the causal graph
:type prior_knowledge: Optional[PriorKnowledge]
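The detector needs the samples from all domains in one table together with a domain index column. A minimal pure-Python sketch of that preparation step follows. The helper `stack_domains` is hypothetical, and the layout (domain index appended as the last column, one integer per domain) is an assumption. The library's own examples determine the exact array and `var_names` layout that `TabularData` and the detector expect.

```python
from typing import List, Sequence, Tuple


def stack_domains(
    domains: Sequence[Sequence[Sequence[float]]],
    var_names: List[str],
    domain_index_name: str = "domain_index",
) -> Tuple[List[List[float]], List[str]]:
    """Concatenate per-domain samples and append an integer domain index column
    (hypothetical helper, not part of the documented API)."""
    rows: List[List[float]] = []
    for idx, samples in enumerate(domains):
        for sample in samples:
            if len(sample) != len(var_names):
                raise ValueError("each sample must have one value per variable")
            # Domain 0, 1, 2, ... is recorded as the last column of every row.
            rows.append(list(sample) + [idx])
    return rows, var_names + [domain_index_name]


# Usage: one "normal" domain and one "shifted" domain over variables a and b.
normal = [[1.0, 2.0], [1.1, 2.1]]
shifted = [[3.0, 2.2]]
rows, names = stack_domains([normal, shifted], ["a", "b"])
print(names)
```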
- run(pvalue_thres: float = 0.01, max_condition_set_size: int = 4, return_graph: bool = False)
Run the algorithm for root cause detection in tabular data.
:param pvalue_thres: p-value threshold for the conditional independence test
:type pvalue_thres: float
:param max_condition_set_size: maximum size of the conditioning set
:type max_condition_set_size: int
:param return_graph: whether to return the estimated causal graph
:type return_graph: bool
:return: root causes of the incident and/or the estimated causal graph
:rtype: Union[List[str], Dict[str, List[str]]]
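Because the domain index is modeled as intervening on the root-cause node, root causes correspond to the children of the domain index in the estimated graph. The sketch below reads them off a graph dict. It assumes the graph maps each variable to the list of its parents. That is one reading of the `Dict[str, List[str]]` return type above, not something this page confirms. `root_causes_from_graph` is a hypothetical helper, not part of the documented API.

```python
from typing import Dict, List


def root_causes_from_graph(
    graph: Dict[str, List[str]], domain_index_name: str = "domain_index"
) -> List[str]:
    """Return the variables whose parent list contains the domain index.

    Assumes `graph` maps each variable to its parents (hypothetical helper).
    """
    return sorted(
        node
        for node, parents in graph.items()
        if node != domain_index_name and domain_index_name in parents
    )


# Usage: X and Z are directly affected by the domain index; Y only inherits the shift via X.
graph = {
    "domain_index": [],
    "X": ["domain_index"],
    "Y": ["X"],
    "Z": ["Y", "domain_index"],
}
print(root_causes_from_graph(graph))
```

Y is excluded even though its distribution also changes across domains. Its shift is explained by its parent X, which matches the conditional independence reasoning behind the PC-based approach.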