VARLINGAM module

causalai.models.time_series.var_lingam

VARLINGAM performs causal discovery on time series data that may contain contemporaneous (instantaneous) causal connections. The algorithm has two main steps. First, it estimates the time-lagged causal effects by fitting a vector autoregression (VAR). Second, it estimates the instantaneous causal effects by applying the LiNGAM algorithm to the residuals of the VAR fit; LiNGAM exploits the non-Gaussianity of these residuals to recover the causal order among the instantaneous variables.
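A minimal NumPy sketch of the two-step idea follows. It assumes centered data (so no intercept is fitted) and does not reflect how causalai implements the model. `fit_var` fits the VAR by ordinary least squares and returns the residuals that the LiNGAM step would consume; the LiNGAM step itself is omitted. `correct_lagged` applies the adjustment from Hyvärinen et al. (2010), B_tau = (I - B0) M_tau, which converts the VAR coefficients M_tau into structural lagged effects once the instantaneous matrix B0 is known. Both function names are hypothetical.

```python
import numpy as np


def fit_var(X: np.ndarray, max_lag: int = 1):
    """Least-squares VAR fit (step 1 of the sketch).

    X has shape (N observations, D variables) and is assumed centered.
    Returns the lag matrices M_1..M_max_lag (each D x D, in the form
    x_t = sum_tau M_tau x_{t-tau} + n_t) and the residuals n_t,
    with shape (N - max_lag, D).
    """
    N, D = X.shape
    Y = X[max_lag:]  # targets x_t for t = max_lag .. N-1
    # Design matrix: each row holds [x_{t-1}, ..., x_{t-max_lag}] side by side.
    Z = np.hstack([X[max_lag - tau:N - tau] for tau in range(1, max_lag + 1)])
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)  # shape (D * max_lag, D)
    # Block tau of coef maps x_{t-tau} (as a row) to x_t; transpose to get M_tau.
    M = [coef[(tau - 1) * D:tau * D].T for tau in range(1, max_lag + 1)]
    residuals = Y - Z @ coef
    return M, residuals


def correct_lagged(M, B0: np.ndarray):
    """Step 3 of the sketch: B_tau = (I - B0) M_tau.

    B0 is the instantaneous effect matrix that a LiNGAM step would
    estimate from the VAR residuals (that step is not implemented here).
    """
    I = np.eye(B0.shape[0])
    return [(I - B0) @ M_tau for M_tau in M]
```

In practice the residuals returned by `fit_var` would be handed to a LiNGAM estimator to obtain B0, and `correct_lagged` would then be applied to the VAR matrices.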

The algorithm makes the following assumptions: (1) linear relationships between variables, (2) non-Gaussian errors (regression residuals), (3) no cycles among the contemporaneous causal relations, and (4) no hidden confounders. Multi-processing is not supported for this algorithm.
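Assumption (2) can be roughly checked before trusting the output. The sketch below computes the excess kurtosis of each residual column (for example, residuals from a VAR fit); Gaussian noise gives values near 0, while heavy-tailed noise such as Laplace gives values near 3. This is a heuristic, not a formal test: some non-Gaussian distributions also have zero excess kurtosis. `excess_kurtosis` is a hypothetical helper, not part of causalai.

```python
import numpy as np


def excess_kurtosis(residuals: np.ndarray) -> np.ndarray:
    """Per-column excess kurtosis m4 / m2**2 - 3 (about 0 for Gaussian data)."""
    r = residuals - residuals.mean(axis=0)  # center each column
    m2 = (r ** 2).mean(axis=0)              # second central moment
    m4 = (r ** 4).mean(axis=0)              # fourth central moment
    return m4 / m2 ** 2 - 3.0
```

Columns whose value is close to 0 are a warning sign that LiNGAM may have little non-Gaussian signal to work with.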

class causalai.models.time_series.var_lingam.VARLINGAM(data: TimeSeriesData, use_multiprocessing: bool | None = False, **kargs)

VAR-LiNGAM algorithm, which combines a non-Gaussian instantaneous model with an autoregressive model for causal discovery on multivariate time series data.

References: [1] Aapo Hyvärinen, Kun Zhang, Shohei Shimizu, Patrik O. Hoyer. Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity. Journal of Machine Learning Research, 11: 1709-1731, 2010.

__init__(data: TimeSeriesData, use_multiprocessing: bool | None = False, **kargs)

VAR-LiNGAM algorithm wrapper.

Parameters:
  • data (TimeSeriesData object) -- A TimeSeriesData object containing attributes like data.data_arrays, which is a list of NumPy arrays, each of shape (observations N, variables D).

  • use_multiprocessing (bool) -- Multi-processing is not supported.

get_parents(pvalue_thres: float = 0.05, target_var: int | str | None = None) → Dict[int | str, Tuple[Tuple[int | str, int]]]

Assuming run() has already been called, get_parents returns a dictionary. Its keys are the variable names, and each value is the list of lagged parents, given as (name, lag) tuples, that cause that target variable at the given pvalue_thres.

Parameters:
  • pvalue_thres (float) -- Significance level used for hypothesis testing (default: 0.05).

  • target_var (int or str, optional) -- If specified (must be one of the data variable names), only the parents of this variable are returned, as a list; otherwise a dictionary is returned in which each key is a target variable name and the corresponding value is the list of its parents.

Returns:

Dictionary with D keys, where D is the number of variables. The value corresponding to each key is the list of parents that cause that target variable at the given pvalue_thres.

Return type:

dict
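To make the relationship between run() and get_parents concrete, here is a sketch of thresholding a run()-style result by p-value. It assumes the result layout described under run() (a "pvalue_dict" keyed by (name, lag) tuples). The helper name `parents_below`, the strict `<` comparison, and the sample values are illustrative assumptions, not causalai's implementation.

```python
from typing import Dict, Tuple, Union

Name = Union[int, str]
Link = Tuple[Name, int]  # (parent name, lag), e.g. ("B", -1)


def parents_below(results: Dict[Name, dict],
                  pvalue_thres: float = 0.05) -> Dict[Name, Tuple[Link, ...]]:
    """For each target, keep every link whose p-value is below pvalue_thres."""
    return {
        target: tuple(link for link, p in info["pvalue_dict"].items()
                      if p < pvalue_thres)
        for target, info in results.items()
    }


# Hypothetical run()-style output with made-up numbers, for illustration only.
results = {
    "A": {"parents": [], "value_dict": {("B", -1): 0.8, ("A", -1): 0.1},
          "pvalue_dict": {("B", -1): 0.001, ("A", -1): 0.6}},
    "B": {"parents": [], "value_dict": {("A", -1): 0.02},
          "pvalue_dict": {("A", -1): 0.9}},
}
print(parents_below(results))  # {'A': (('B', -1),), 'B': ()}
```

Raising the threshold admits more links; lowering it keeps only the most significant ones.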

run(pvalue_thres: float = 0.05, max_lag: int = 1) → Dict[int | str, ResultInfoTimeseriesFull]

Runs the VAR-LiNGAM algorithm to estimate the causal strength of all potential time-lagged and instantaneous causal variables.

Parameters:
  • pvalue_thres (float) -- Significance level used for hypothesis testing (default: 0.05).

  • max_lag (int, optional) -- Maximum time lag. Must be greater than or equal to 1 (default: 1).

Returns:

Dictionary with D keys, where D is the number of variables. The value corresponding to each key is a dictionary with three keys:

  • parents : List of estimated parents.

  • value_dict : Dictionary of form {(var3_name, -1):float, ...} containing the test statistic of a link.

  • pvalue_dict : Dictionary of form {(var3_name, -1):float, ...} containing the p-value corresponding to the above test statistic.

Return type:

dict
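Because value_dict and pvalue_dict share the same (name, lag) keys, they can be read side by side. The sketch below flattens a run()-style result into (target, source, lag, statistic, p-value) rows sorted by ascending p-value. The helper name `links_table` and the sample numbers are hypothetical; it assumes only the three-key layout described above.

```python
from typing import Dict, List, Tuple, Union

Name = Union[int, str]
Row = Tuple[Name, Name, int, float, float]  # (target, source, lag, statistic, p-value)


def links_table(results: Dict[Name, dict]) -> List[Row]:
    """Flatten a run()-style result into rows sorted by ascending p-value."""
    rows: List[Row] = []
    for target, info in results.items():
        for (source, lag), stat in info["value_dict"].items():
            # The p-value for a link is stored under the same (source, lag) key.
            rows.append((target, source, lag, stat, info["pvalue_dict"][(source, lag)]))
    rows.sort(key=lambda row: row[4])
    return rows


# Hypothetical run()-style output with made-up numbers, for illustration only.
results = {
    "A": {"parents": [("B", -1)], "value_dict": {("B", -1): 0.8, ("A", -1): 0.1},
          "pvalue_dict": {("B", -1): 0.001, ("A", -1): 0.6}},
    "B": {"parents": [], "value_dict": {("A", -1): 0.02},
          "pvalue_dict": {("A", -1): 0.9}},
}
for row in links_table(results):
    print(row)
```

Sorting by p-value puts the strongest evidence first, which is convenient when inspecting how a different pvalue_thres would change the selected parents.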