GIN Tabular module

causalai.models.tabular.gin

Generalized Independent Noise (GIN) is a method for causal discovery for tabular data when there are hidden confounder variables.

Let X denote the set of all observed variables and L the set of unknown ground-truth hidden variables. The algorithm makes the following assumptions:

  1. No observed variable in X is an ancestor of any latent variable in L.

  2. The noise terms are non-Gaussian.

  3. Each latent variable set L' in L, in which every latent variable directly causes the same set of observed variables, has at least 2Dim(L') pure measurement variables as children.

  4. There is no direct edge between observed variables.
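To make the assumptions concrete, the following is a minimal sketch of a synthetic dataset that satisfies them. It is illustrative only and not part of the library: the function name simulate_latent_data, the coefficients, and the choice of uniform noise are all assumptions made for this example.

```python
import numpy as np


def simulate_latent_data(n_samples=500, seed=0):
    """Hypothetical generator: two 1-dimensional latents, four observed variables."""
    rng = np.random.default_rng(seed)

    def noise(scale=1.0):
        # Uniform noise is non-Gaussian (assumption 2).
        return rng.uniform(-scale, scale, size=n_samples)

    # Latent variables: only latents cause latents, so no observed
    # variable is an ancestor of a latent (assumption 1).
    l1 = noise()
    l2 = 0.8 * l1 + noise()

    # Each latent has Dim = 1, so it needs at least 2 * 1 = 2 pure
    # measurement children (assumption 3).
    x1 = 1.0 * l1 + noise(0.5)
    x2 = 0.7 * l1 + noise(0.5)
    x3 = 1.0 * l2 + noise(0.5)
    x4 = 0.6 * l2 + noise(0.5)

    # Observed variables depend only on latents and their own noise,
    # so there are no direct edges among them (assumption 4).
    data = np.column_stack([x1, x2, x3, x4])
    var_names = ["x1", "x2", "x3", "x4"]
    return data, var_names


data, var_names = simulate_latent_data()
```

The returned array has shape (observations N, variables D), which matches the layout that the data parameter below expects for each entry of data.data_arrays.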

class causalai.models.tabular.gin.GIN(data: ~causalai.data.tabular.TabularData, prior_knowledge: ~causalai.models.common.prior_knowledge.PriorKnowledge | None = None, CI_test: ~causalai.models.common.CI_tests.partial_correlation.PartialCorrelation | ~causalai.models.common.CI_tests.kci.KCI | ~causalai.models.common.CI_tests.discrete_ci_tests.DiscreteCI_tests = <causalai.models.common.CI_tests.kci.KCI object>, use_multiprocessing: bool | None = False, **kargs)

Generalized Independent Noise (GIN) is a method for causal discovery for multivariate tabular data when there are hidden confounder variables.

References: [1] Xie, F., Cai, R., Huang, B., Glymour, C., Hao, Z., & Zhang, K. (2020). Generalized independent noise condition for estimating latent variable causal graphs. Advances in Neural Information Processing Systems, 33, 14891-14902.

__init__(data: ~causalai.data.tabular.TabularData, prior_knowledge: ~causalai.models.common.prior_knowledge.PriorKnowledge | None = None, CI_test: ~causalai.models.common.CI_tests.partial_correlation.PartialCorrelation | ~causalai.models.common.CI_tests.kci.KCI | ~causalai.models.common.CI_tests.discrete_ci_tests.DiscreteCI_tests = <causalai.models.common.CI_tests.kci.KCI object>, use_multiprocessing: bool | None = False, **kargs)

Generalized Independent Noise (GIN) is a method for causal discovery when there are hidden confounder variables.

Parameters:
  • data (TabularData object) -- A TabularData object. It exposes attributes such as data.data_arrays, which is a list of NumPy arrays, each of shape (observations N, variables D).

  • prior_knowledge (PriorKnowledge object) -- Prior knowledge is not supported for the GIN algorithm.

  • use_multiprocessing (bool) -- Multi-processing is not supported.
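A minimal usage sketch follows, using the import paths that appear in the signatures above. Two details are assumptions not documented in this section: the TabularData(data, var_names=...) constructor call and the no-argument KCI() constructor. The helper name run_gin is hypothetical. The causalai imports sit inside the function so that the definition itself loads even where the package is not installed.

```python
def run_gin(data_array, var_names, pvalue_thres=0.05):
    """Hypothetical helper: wrap an (N, D) array and run GIN on it."""
    # Imported lazily; module paths are taken from the signatures in this section.
    from causalai.data.tabular import TabularData
    from causalai.models.common.CI_tests.kci import KCI
    from causalai.models.tabular.gin import GIN

    # Assumed constructor form: an (N, D) array plus a list of D variable names.
    data_obj = TabularData(data_array, var_names=var_names)

    model = GIN(
        data=data_obj,
        prior_knowledge=None,       # prior knowledge is not supported by GIN
        CI_test=KCI(),              # assumed: KCI can be built with no arguments
        use_multiprocessing=False,  # multi-processing is not supported
    )
    return model.run(pvalue_thres=pvalue_thres)
```

Calling run_gin(data_array, var_names) with an array of shape (N, D) and a list of D names should return the dictionary described under run().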

run(pvalue_thres: float = 0.05) -> Dict[int | str, ResultInfoTabularFull]

Runs the GIN algorithm to estimate the causal graph with latent variables.

Parameters:

pvalue_thres (float) -- Significance level used for hypothesis testing (default: 0.05).

Returns:

Dictionary with D keys, where D is the number of variables. The value corresponding to each key is a dictionary with three keys:

  • parents : List of estimated parents.

  • value_dict : Dictionary of form {var3_name:float, ...} containing the test statistic of a link.

  • pvalue_dict : Dictionary of form {var3_name:float, ...} containing the p-value corresponding to the above test statistic.

Return type:

dict
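As a sketch of how the returned structure might be consumed, the snippet below flattens it into a list of (parent, child, p-value) tuples. The helper edges_from_result and the mock_result values are hypothetical and made up for illustration; they are not output from the library.

```python
def edges_from_result(result):
    """Flatten a run()-style result into (parent, child, p_value) tuples."""
    edges = []
    for child, info in result.items():
        for parent in info["parents"]:
            # .get() returns None if no p-value is stored for this parent.
            p_value = info["pvalue_dict"].get(parent)
            edges.append((parent, child, p_value))
    return edges


# Hypothetical result with the three documented keys per variable.
mock_result = {
    "a": {"parents": [], "value_dict": {}, "pvalue_dict": {}},
    "b": {"parents": ["a"], "value_dict": {"a": 1.7}, "pvalue_dict": {"a": 0.01}},
}

edges = edges_from_result(mock_result)
```

An edge list of this kind is often easier to pass to a graph library or to print than the nested dictionary.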