GES Tabular module

causalai.models.tabular.ges

Greedy Equivalence Search (GES) heuristically searches the space of causal Bayesian networks and returns the model with the highest Bayesian score it finds. Specifically, GES starts its search with the empty graph. It then performs a forward search in which edges are added between nodes, one at a time, to increase the Bayesian score. This is repeated until no single edge addition increases the score. Finally, it performs a backward search that removes edges until no single edge removal increases the score.
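The forward and backward phases described above can be sketched with a toy hill-climbing search. This is a simplified illustration, not the wrapped library's implementation: it searches over DAGs rather than equivalence classes, uses a Gaussian BIC as the score, and the names node_bic, is_ancestor and greedy_search are hypothetical helpers introduced only for this example.

```python
import itertools
import numpy as np

def node_bic(X, j, parents):
    """Gaussian BIC term for node j given a parent set (higher is better)."""
    n = X.shape[0]
    cols = np.array(sorted(parents), dtype=int)
    # Regress node j on its parents plus an intercept.
    A = np.column_stack([X[:, cols], np.ones(n)])
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    sigma2 = resid @ resid / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    # Penalize coefficients, intercept and noise variance.
    return loglik - 0.5 * np.log(n) * (len(parents) + 2)

def is_ancestor(parents, a, b):
    """True if a can be reached from b by following parent links."""
    stack, seen = [b], set()
    while stack:
        v = stack.pop()
        if v == a:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return False

def greedy_search(X, eps=1e-9):
    d = X.shape[1]
    parents = {j: set() for j in range(d)}  # start from the empty graph
    score = {j: node_bic(X, j, parents[j]) for j in range(d)}

    # Forward phase: add the single best edge until nothing improves the score.
    while True:
        best_gain, best = eps, None
        for i, j in itertools.permutations(range(d), 2):
            # Skip existing edges and edges i -> j that would close a cycle.
            if i in parents[j] or is_ancestor(parents, j, i):
                continue
            gain = node_bic(X, j, parents[j] | {i}) - score[j]
            if gain > best_gain:
                best_gain, best = gain, (i, j)
        if best is None:
            break
        i, j = best
        parents[j].add(i)
        score[j] += best_gain

    # Backward phase: remove the single best edge until nothing improves.
    while True:
        best_gain, best = eps, None
        for j in range(d):
            for i in list(parents[j]):
                gain = node_bic(X, j, parents[j] - {i}) - score[j]
                if gain > best_gain:
                    best_gain, best = gain, (i, j)
        if best is None:
            break
        i, j = best
        parents[j].remove(i)
        score[j] += best_gain
    return parents

# Toy data from the chain X0 -> X1 -> X2 (linear, Gaussian noise).
rng = np.random.default_rng(0)
n = 5000
x0 = rng.normal(size=n)
x1 = 2.0 * x0 + rng.normal(size=n)
x2 = -1.5 * x1 + rng.normal(size=n)
X = np.column_stack([x0, x1, x2])

parents = greedy_search(X)
# Edge directions in a chain are not identifiable from the score alone,
# so only the undirected skeleton is compared.
skeleton = {frozenset((i, j)) for j, ps in parents.items() for i in ps}
print(sorted(tuple(sorted(e)) for e in skeleton))
```

Because the chain, its reversal and the fork X0 <- X1 -> X2 receive the same score, the sketch may return any of them. GES avoids this ambiguity by searching over equivalence classes directly.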

This algorithm makes the following assumptions: 1. observational samples are i.i.d., 2. the relationships between variables are linear with Gaussian noise terms, 3. the Causal Markov condition holds, which implies that two variables that are d-separated in a causal graph are probabilistically independent, 4. faithfulness, i.e., the only conditional independences that hold are those implied by the Causal Markov condition, 5. there are no hidden confounders. Multi-processing is not supported for this algorithm.
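To make the assumptions concrete, the following sketch simulates i.i.d. samples from a linear Gaussian chain X0 -> X1 -> X2 with no hidden confounders. The coefficients, sample size and the helper partial_corr are illustrative choices for this example only. X0 and X2 are d-separated given X1, so their partial correlation should be close to zero (Causal Markov condition), while marginally they are d-connected and remain clearly correlated (consistent with faithfulness).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Linear relationships with independent Gaussian noise terms.
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + rng.normal(size=n)
x2 = -0.6 * x1 + rng.normal(size=n)

def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing c out of both."""
    C = np.column_stack([c, np.ones_like(c)])
    res_a = a - C @ np.linalg.lstsq(C, a, rcond=None)[0]
    res_b = b - C @ np.linalg.lstsq(C, b, rcond=None)[0]
    return float(np.corrcoef(res_a, res_b)[0, 1])

marginal = float(np.corrcoef(x0, x2)[0, 1])   # X0 and X2 are d-connected
conditional = partial_corr(x0, x2, x1)        # X0 and X2 are d-separated given X1

print(f"corr(X0, X2)      = {marginal:.3f}")
print(f"corr(X0, X2 | X1) = {conditional:.3f}")
```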

class causalai.models.tabular.ges.GES(data: TabularData, prior_knowledge: PriorKnowledge | None = None, use_multiprocessing: bool | None = False, **kargs)

Greedy Equivalence Search (GES) for estimating the causal graph from multivariate tabular data. This class is a wrapper around the GES library: https://github.com/juangamella/ges.

Reference: Chickering, David Maxwell. "Optimal structure identification with greedy search." Journal of Machine Learning Research 3.Nov (2002): 507-554.

__init__(data: TabularData, prior_knowledge: PriorKnowledge | None = None, use_multiprocessing: bool | None = False, **kargs)

Greedy Equivalence Search (GES) for estimating the causal graph from tabular data.

Parameters:
  • data (TabularData object) -- a TabularData object containing attributes such as data.data_arrays, which is a list of numpy arrays of shape (observations N, variables D).

  • prior_knowledge (PriorKnowledge object) -- Specifies prior knowledge for the causal discovery process, either by forbidding links that are known not to exist or by adding back links that are known to exist based on expert knowledge. See the PriorKnowledge class for more details.

  • use_multiprocessing (bool) -- Multi-processing is not supported.

run(pvalue_thres: float | None = None, A0: ndarray | None = None, phases: List[str] = ['forward', 'backward', 'turning'], debug: int = 0) -> Dict[int | str, ResultInfoTabularFull]

Runs the GES algorithm for estimating the causal graph.

Parameters:
  • pvalue_thres (float) -- Ignored in this algorithm.

  • A0 (np.array) -- the initial CPDAG on which GES will run, where A0[i,j] != 0 implies i -> j, and A0[i,j] != 0 & A0[j,i] != 0 implies i - j. Defaults to the empty graph.

  • phases (list[str]) -- this controls which phases of the GES procedure are run, and in which order. Defaults to ['forward', 'backward', 'turning'].

  • debug (int, optional) -- if larger than 0, debug traces are printed. Higher values correspond to increased verbosity.
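The A0 encoding described above can be checked with a small decoder. The helper describe_cpdag is hypothetical and written only for this illustration; it reads a square matrix using the stated convention (a one-way nonzero entry is a directed edge, a two-way nonzero pair is an undirected edge). The example graph is an arbitrary choice.

```python
import numpy as np

def describe_cpdag(A):
    """Split a CPDAG adjacency matrix into directed and undirected edges."""
    A = np.asarray(A)
    d = A.shape[0]
    directed, undirected = [], []
    for i in range(d):
        for j in range(d):
            if i == j or A[i, j] == 0:
                continue
            if A[j, i] != 0:
                # Both entries set: i - j. Record each pair once.
                if i < j:
                    undirected.append((i, j))
            else:
                # Only A[i, j] set: i -> j.
                directed.append((i, j))
    return directed, undirected

# Example initial graph on 4 variables: 0 -> 2 <- 1 and 0 - 3.
A0 = np.zeros((4, 4), dtype=int)
A0[0, 2] = 1
A0[1, 2] = 1
A0[0, 3] = A0[3, 0] = 1

directed, undirected = describe_cpdag(A0)
print("directed:", directed)      # [(0, 2), (1, 2)]
print("undirected:", undirected)  # [(0, 3)]
```

A matrix built this way has the form expected by the A0 argument in the signature above. An all-zero matrix corresponds to the default empty graph.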

Returns:

A dictionary with D keys, where D is the number of variables. The value corresponding to each key is a dictionary with three keys:

  • parents : List of estimated parents.

  • value_dict : Empty Python dictionary.

  • pvalue_dict : Empty Python dictionary.

Return type:

dict
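As a sketch of how a result with this structure might be consumed, the snippet below converts it into an edge list and an adjacency matrix. The result dictionary here is hand-written for illustration, with hypothetical variable names "a", "b" and "c"; it is not actual output of run().

```python
import numpy as np

# Hand-written stand-in with the documented structure (not real GES output).
result = {
    "a": {"parents": [], "value_dict": {}, "pvalue_dict": {}},
    "b": {"parents": ["a"], "value_dict": {}, "pvalue_dict": {}},
    "c": {"parents": ["a", "b"], "value_dict": {}, "pvalue_dict": {}},
}

names = list(result)
index = {name: k for k, name in enumerate(names)}

# Edge list as (parent, child) pairs.
edges = [(p, child) for child, info in result.items() for p in info["parents"]]

# Adjacency matrix with adj[i, j] = 1 meaning names[i] is a parent of names[j].
adj = np.zeros((len(names), len(names)), dtype=int)
for parent, child in edges:
    adj[index[parent], index[child]] = 1

print(edges)
print(adj)
```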