LINGAM Tabular module

causalai.models.tabular.lingam

LiNGAM can be used for causal discovery in tabular data. The algorithm first performs independent component analysis (ICA) on the observational data matrix X (#variables x #samples) to estimate the mixing matrix A and the independent components (noise matrix) E (same size as X), i.e. it solves X = AE. To find the causal order, the algorithm then uses the insight that each sample x can be decomposed as x = Bx + e, where B is lower triangular when the variables are arranged in causal order and e is the vector of independent noise terms. Since x = Bx + e implies x = (I - B)^-1 e, we have A = (I - B)^-1 and hence B = I - A^-1. We solve for B and find the permutation matrix P such that PBP' is as close to a lower triangular matrix as possible.
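The relationship between A and B, and the search for the permutation, can be illustrated with a small NumPy sketch. It is not the library's implementation: it skips the ICA step and starts from a known mixing matrix, and the names `B_true`, `upper_mass` and `best_causal_order` are hypothetical helpers introduced here. The brute-force search over permutations is only practical for a handful of variables.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth for 3 variables: x0 -> x1, x0 -> x2, x1 -> x2.
B_true = np.array([[0.0, 0.0, 0.0],
                   [0.8, 0.0, 0.0],
                   [0.5, -0.6, 0.0]])
d = B_true.shape[0]
I = np.eye(d)

# x = Bx + e  implies  x = (I - B)^-1 e, so the mixing matrix is A = (I - B)^-1.
A = np.linalg.inv(I - B_true)
E = rng.uniform(-1.0, 1.0, size=(d, 1000))  # non-Gaussian noise, shape (#variables, #samples)
X = A @ E

# Recover B from A (here A is known; in LiNGAM it would be estimated by ICA).
B_from_A = I - np.linalg.inv(A)

# Hide the causal order by shuffling the variables.
shuffle = np.array([2, 0, 1])
B_shuffled = B_from_A[np.ix_(shuffle, shuffle)]

def upper_mass(M):
    """Sum of squared entries on or above the diagonal."""
    return float(np.sum(np.triu(M, k=0) ** 2))

def best_causal_order(B):
    """Brute-force search for the ordering that makes B closest to lower triangular."""
    n = B.shape[0]
    best = min(itertools.permutations(range(n)),
               key=lambda p: upper_mass(B[np.ix_(p, p)]))
    return list(best)

order = best_causal_order(B_shuffled)
B_ordered = B_shuffled[np.ix_(order, order)]
```

Under these assumptions, `B_ordered` matches `B_true` and `shuffle[order]` recovers the original variable order. With an ICA estimate of A, the recovered B would only be approximately lower triangular, which is why the text speaks of "as close as possible".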

This algorithm makes the following assumptions: 1. the relationships between variables are linear, 2. the errors (regression residuals) are non-Gaussian, 3. the causal graph is a DAG, 4. there are no hidden confounders. Multi-processing is not supported for this algorithm.
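As a rough illustration of the non-Gaussianity assumption, the sketch below compares the excess kurtosis of uniform and Gaussian samples. It is a heuristic check and not part of this module; `excess_kurtosis` is a hypothetical helper. In practice one would apply it to regression residuals rather than to synthetic noise, and an excess kurtosis near zero does not by itself prove Gaussianity.

```python
import numpy as np

def excess_kurtosis(samples):
    """Sample excess kurtosis: about 0 for Gaussian data, about -1.2 for uniform data."""
    z = (samples - samples.mean()) / samples.std()
    return float(np.mean(z ** 4) - 3.0)

rng = np.random.default_rng(0)
n = 20000

uniform_noise = rng.uniform(-1.0, 1.0, size=n)   # non-Gaussian: suits LiNGAM
gaussian_noise = rng.normal(0.0, 1.0, size=n)    # Gaussian: violates assumption 2

k_uniform = excess_kurtosis(uniform_noise)
k_gaussian = excess_kurtosis(gaussian_noise)
print(f"uniform: {k_uniform:.2f}, gaussian: {k_gaussian:.2f}")
```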

class causalai.models.tabular.lingam.LINGAM(data: TabularData, prior_knowledge: PriorKnowledge | None = None, use_multiprocessing: bool | None = False, **kargs)

The LiNGAM algorithm exploits the additive non-Gaussian residuals in linear causal graphs for causal discovery on multivariate tabular data.

References: [1] Shimizu, Shohei, Patrik O. Hoyer, Aapo Hyvärinen, and Antti Kerminen. "A linear non-Gaussian acyclic model for causal discovery." Journal of Machine Learning Research 7, no. 10 (2006).

__init__(data: TabularData, prior_knowledge: PriorKnowledge | None = None, use_multiprocessing: bool | None = False, **kargs)

LiNGAM algorithm wrapper.

Parameters:
  • data (TabularData object) -- a TabularData object containing attributes such as data.data_arrays, which is a list of numpy arrays of shape (observations N, variables D).

  • prior_knowledge (PriorKnowledge object) -- Specify prior knowledge to the causal discovery process by either forbidding links that are known to not exist, or adding back links that do exist based on expert knowledge. See the PriorKnowledge class for more details.

  • use_multiprocessing (bool) -- Multi-processing is not supported.

get_parents(pvalue_thres: float = 0.05, target_var: int | str | None = None) -> Dict[int | str, Tuple[Tuple[int | str, int]]]

Assuming run() has been called, get_parents returns a dictionary whose keys are the variable names and whose values are the lists of parent names that cause each variable under the given pvalue_thres.

Parameters:
  • pvalue_thres (float) -- The significance level used for hypothesis testing (default: 0.05).

  • target_var (int or str, optional) -- If specified (must be one of the data variable names), only the parents of this variable are returned, as a list. Otherwise a dictionary is returned where each key is a target variable name and the corresponding value is the list of its parents.

Returns:

Dictionary with D keys, where D is the number of variables. The value corresponding to each key is the list of parent names that cause the target variable under the given pvalue_thres.

Return type:

dict

run(pvalue_thres: float = 0.05) -> Dict[int | str, ResultInfoTabularFull]

Runs the LiNGAM algorithm to estimate the causal graph.

Parameters:

pvalue_thres (float) -- The significance level used for hypothesis testing (default: 0.05).

Returns:

Dictionary with D keys, where D is the number of variables. The value corresponding to each key is a dictionary with three keys:

  • parents : List of estimated parents.

  • value_dict : Dictionary of form {var3_name:float, ...} containing the test statistic of a link.

  • pvalue_dict : Dictionary of form {var3_name:float, ...} containing the p-value corresponding to the above test statistic.

Return type:

dict
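The returned structure can be flattened into an edge list for inspection. The sketch below works on a hand-written dictionary in the documented shape and does not call the library: the variable names and numbers in `result` are made up for illustration, and `edges_from_result` is a hypothetical helper. Its optional re-thresholding (keep a link when its p-value is strictly below the threshold) is an assumption of this sketch, not a statement about how run() applies pvalue_thres.

```python
# Made-up result in the documented shape: one entry per variable with
# 'parents', 'value_dict' and 'pvalue_dict'.
result = {
    "a": {"parents": [],
          "value_dict": {"b": 0.0, "c": 0.0},
          "pvalue_dict": {"b": 1.0, "c": 1.0}},
    "b": {"parents": ["a"],
          "value_dict": {"a": 0.8, "c": 0.0},
          "pvalue_dict": {"a": 0.001, "c": 1.0}},
    "c": {"parents": ["a", "b"],
          "value_dict": {"a": 0.5, "b": -0.6},
          "pvalue_dict": {"a": 0.01, "b": 0.03}},
}

def edges_from_result(result, pvalue_thres=None):
    """Return (parent, child, test_statistic) triples from a run()-style dictionary.

    If pvalue_thres is given, links whose p-value is known and not strictly
    below it are dropped (an assumption of this sketch).
    """
    edges = []
    for child, info in result.items():
        for parent in info["parents"]:
            pval = info["pvalue_dict"].get(parent)
            if pvalue_thres is not None and pval is not None and pval >= pvalue_thres:
                continue
            edges.append((parent, child, info["value_dict"].get(parent)))
    return edges

all_edges = edges_from_result(result)
strict_edges = edges_from_result(result, pvalue_thres=0.02)
```

Here `all_edges` lists the three links a -> b, a -> c and b -> c, while `strict_edges` drops b -> c because its made-up p-value of 0.03 is not below 0.02.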