Granger module

causalai.models.time_series.granger

Granger causality can be used for causal discovery in time series data without contemporaneous causal connections. The intuition behind Granger causality is that for two time series random variables X and Y, if including the past values of X when predicting Y improves the prediction performance over using only the past values of Y, then X is said to Granger-cause Y. In practice, to find the causal parents of a variable, this algorithm performs a linear regression that predicts that variable from the past values of the variables, and uses the regression coefficients to determine the causal links.
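The intuition above is classically formalized as a nested-model F-test. First regress Y on its own past (the restricted model), then on the past of both Y and X (the unrestricted model), and test whether the extra X terms reduce the residual sum of squares significantly. The sketch below implements that textbook test with NumPy and SciPy. It illustrates the concept only and is not this module's implementation, which is described below as using LassoCV regression. The helper names `lagged` and `granger_f_test` are hypothetical.

```python
import numpy as np
from scipy import stats


def lagged(series: np.ndarray, max_lag: int) -> np.ndarray:
    """Columns are series[t-1], ..., series[t-max_lag] for t = max_lag, ..., T-1."""
    T = len(series)
    return np.column_stack([series[max_lag - k : T - k] for k in range(1, max_lag + 1)])


def granger_f_test(x: np.ndarray, y: np.ndarray, max_lag: int = 1):
    """F-test of 'past x improves prediction of y beyond past y'. Returns (F, p-value)."""
    target = y[max_lag:]
    n = len(target)
    ones = np.ones((n, 1))
    # Restricted model: intercept + past values of y only.
    X_r = np.hstack([ones, lagged(y, max_lag)])
    # Unrestricted model: additionally include past values of x.
    X_u = np.hstack([X_r, lagged(x, max_lag)])

    def rss(design: np.ndarray) -> float:
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(X_r), rss(X_u)
    df_num = max_lag                # number of added x-lag terms
    df_den = n - X_u.shape[1]       # residual degrees of freedom of the unrestricted model
    f_stat = ((rss_r - rss_u) / df_num) / (rss_u / df_den)
    p_value = float(stats.f.sf(f_stat, df_num, df_den))
    return f_stat, p_value


# Synthetic data in which x drives y at lag 1, but not the other way round.
rng = np.random.default_rng(0)
T = 500
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

print(granger_f_test(x, y, max_lag=1))  # very small p-value: x Granger-causes y
print(granger_f_test(y, x, max_lag=1))  # x is i.i.d. noise, so usually not significant
```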

Granger causality assumes: 1. a linear relationship between variables; 2. covariance stationarity, i.e., all random variables in the temporal sequence have the same mean, and the covariance between the variables at any two time steps depends only on the time difference between them, not on their absolute positions; and 3. no hidden confounders.
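For a linear vector autoregression of order one, x_t = A x_{t-1} + e_t with i.i.d. zero-mean noise, the process is stable and has a covariance-stationary solution when every eigenvalue of the coefficient matrix A has modulus strictly below 1. The NumPy sketch below checks that condition and simulates such data, which is linear, stationary, and has no hidden variables, so it satisfies all three assumptions. The matrices and helper names are illustrative choices, not part of this module.

```python
import numpy as np


def is_stable_var1(A: np.ndarray) -> bool:
    """True if all eigenvalues of A lie strictly inside the unit circle."""
    return bool(np.max(np.abs(np.linalg.eigvals(A))) < 1.0)


def simulate_var1(A: np.ndarray, T: int, noise_std: float = 1.0,
                  burn_in: int = 200, seed: int = 0) -> np.ndarray:
    """Simulate T observations of x_t = A x_{t-1} + e_t, shape (T, D).

    The burn-in period lets the process forget its arbitrary start x_0 = 0.
    """
    rng = np.random.default_rng(seed)
    D = A.shape[0]
    x = np.zeros(D)
    out = np.empty((T, D))
    for t in range(T + burn_in):
        x = A @ x + noise_std * rng.normal(size=D)
        if t >= burn_in:
            out[t - burn_in] = x
    return out


# Variable 0 drives variable 1 at lag 1 (A[1, 0] = 0.7); variable 1 does not drive variable 0.
A_stable = np.array([[0.5, 0.0],
                     [0.7, 0.3]])
A_explosive = np.array([[1.1, 0.0],
                        [0.7, 0.3]])

print(is_stable_var1(A_stable))     # True: eigenvalues are 0.5 and 0.3
print(is_stable_var1(A_explosive))  # False: eigenvalue 1.1 lies outside the unit circle
data = simulate_var1(A_stable, T=1000)
print(data.shape)                   # (1000, 2)
```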

Note that the Granger algorithm only discovers lagged causal relationships; instantaneous (contemporaneous) causal relationships are not supported.
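Because only lagged links are considered, the regressors for a target at time t are the values of all D variables at times t-1, ..., t-max_lag, and never the values at time t itself. The NumPy sketch below builds such a design matrix and labels each column with a (variable, -lag) pair, mirroring the key format of value_dict and pvalue_dict documented for GrangerSingle.run. The function `build_lagged_design` is hypothetical and is not this module's internal code.

```python
from typing import List, Sequence, Tuple

import numpy as np


def build_lagged_design(data: np.ndarray, var_names: Sequence[str], target: int,
                        max_lag: int = 1) -> Tuple[np.ndarray, np.ndarray, List[Tuple[str, int]]]:
    """Return (X, y, labels) for regressing data[:, target] on lagged values only.

    data has shape (N observations, D variables). Row i of X corresponds to time
    t = max_lag + i; no column uses time t itself, so no instantaneous links enter.
    """
    if max_lag < 1:
        raise ValueError("max_lag must be greater than or equal to 1")
    N, D = data.shape
    columns, labels = [], []
    for lag in range(1, max_lag + 1):
        for j in range(D):
            columns.append(data[max_lag - lag : N - lag, j])  # variable j at time t - lag
            labels.append((var_names[j], -lag))
    X = np.column_stack(columns)
    y = data[max_lag:, target]
    return X, y, labels


data = np.arange(12, dtype=float).reshape(6, 2)  # 6 time steps, 2 variables "a" and "b"
X, y, labels = build_lagged_design(data, ["a", "b"], target=1, max_lag=2)
print(labels)   # [('a', -1), ('b', -1), ('a', -2), ('b', -2)]
print(X.shape)  # (4, 4)
```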

class causalai.models.time_series.granger.GrangerSingle(data: TimeSeriesData, prior_knowledge: PriorKnowledge | None = None, max_iter: int = 1000, cv: int = 5, use_multiprocessing: bool | None = False)
__init__(data: TimeSeriesData, prior_knowledge: PriorKnowledge | None = None, max_iter: int = 1000, cv: int = 5, use_multiprocessing: bool | None = False)

Granger causality algorithm for estimating the lagged parents of a single variable.

Parameters:
  • data (TimeSeriesData object) -- A TimeSeriesData object containing attributes such as data.data_arrays, a list of NumPy arrays, each of shape (N observations, D variables).

  • prior_knowledge (PriorKnowledge object) -- Specifies prior knowledge for the causal discovery process, either by forbidding links that are known not to exist, or by adding back links that are known to exist based on expert knowledge. See the PriorKnowledge class for more details.

  • max_iter (int) -- Maximum number of iterations of the LassoCV optimization (default=1000). See https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html.

  • cv (int) -- Cross-validation splitting strategy for LassoCV; an integer specifies the number of folds (default=5). See https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html.

  • use_multiprocessing (bool) -- If True, computations are run in parallel using multiprocessing, which can make the algorithm faster (default: False).
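The max_iter and cv arguments are documented as settings of scikit-learn's LassoCV. As a rough illustration of that kind of regression, the sketch below fits LassoCV with those two arguments on lag-1 features of synthetic data and lists the lagged inputs whose coefficients are not shrunk to zero. It assumes scikit-learn is installed, uses only its public API, and does not reproduce how this module turns the fit into test statistics and p-values; the variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
N, D = 600, 3
data = rng.normal(size=(N, D))
# Variable 2 depends on variable 0 at lag 1; variable 1 is an unrelated distractor.
for t in range(1, N):
    data[t, 2] = 0.9 * data[t - 1, 0] + 0.1 * rng.normal()

# Lag-1 design: inputs are all variables at time t-1, target is variable 2 at time t.
X = data[:-1, :]
y = data[1:, 2]
labels = [("var0", -1), ("var1", -1), ("var2", -1)]

# cv and max_iter play the same roles as the constructor arguments described above.
model = LassoCV(cv=5, max_iter=1000).fit(X, y)
selected = [lab for lab, coef in zip(labels, model.coef_) if abs(coef) > 1e-8]
print(model.coef_)
print(selected)  # expected to include ('var0', -1); small spurious coefficients may also survive
```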

run(target_var: int | str, pvalue_thres: float = 0.05, max_lag: int = 1, full_cd: bool = False) ResultInfoTimeseriesSingle

Runs the Granger causality algorithm for estimating the causal strength of all potential lagged parents of a single variable.

Parameters:
  • target_var (int or str) -- Target variable index or name for which lagged parents need to be estimated.

  • pvalue_thres (float) -- Significance level used for hypothesis testing (default: 0.05). Candidate parents with p-values above pvalue_thres are discarded, and the rest are returned as causes of target_var.

  • max_lag (int, optional) -- Maximum time lag. Must be greater than or equal to 1 (default: 1).

  • full_cd (bool) -- For internal use only; used to handle multiprocessing when set to True (default: False).

Returns:

A dictionary with three keys:

  • parents : List of estimated parents.

  • value_dict : Dictionary of form {(var3_name, -1):float, ...} containing the test statistic of a link.

  • pvalue_dict : Dictionary of form {(var3_name, -1):float, ...} containing the p-value corresponding to the above test statistic.

Return type:

dict
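Both value_dict and pvalue_dict are keyed by (variable name, negative lag) tuples, and parents follows from pvalue_dict through the pvalue_thres rule stated above: a candidate is kept unless its p-value is above the threshold. The sketch below applies that rule to made-up dictionaries to show how the three keys relate. The numbers and the helper `select_parents` are hypothetical and do not come from running this module.

```python
from typing import Dict, List, Tuple, Union

Key = Tuple[Union[int, str], int]  # (variable name or index, negative lag)


def select_parents(pvalue_dict: Dict[Key, float], pvalue_thres: float = 0.05) -> List[Key]:
    """Keep every candidate whose p-value is not above pvalue_thres."""
    return [key for key, p in pvalue_dict.items() if p <= pvalue_thres]


# Hypothetical results for one target variable with max_lag=2.
value_dict = {("a", -1): 4.2, ("b", -1): 0.3, ("a", -2): 2.1, ("b", -2): 0.1}
pvalue_dict = {("a", -1): 0.001, ("b", -1): 0.64, ("a", -2): 0.04, ("b", -2): 0.91}

result = {
    "parents": select_parents(pvalue_dict, pvalue_thres=0.05),
    "value_dict": value_dict,
    "pvalue_dict": pvalue_dict,
}
print(result["parents"])  # [('a', -1), ('a', -2)]
```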

class causalai.models.time_series.granger.Granger(data: TimeSeriesData, prior_knowledge: PriorKnowledge | None = None, max_iter: int = 1000, cv: int = 5, use_multiprocessing: bool | None = False, **kargs)

Granger algorithm for estimating lagged parents of all variables.

__init__(data: TimeSeriesData, prior_knowledge: PriorKnowledge | None = None, max_iter: int = 1000, cv: int = 5, use_multiprocessing: bool | None = False, **kargs)

Granger causality algorithm for estimating lagged parents of all variables.

Parameters:
  • data (TimeSeriesData object) -- A TimeSeriesData object containing attributes such as data.data_arrays, a list of NumPy arrays, each of shape (N observations, D variables).

  • prior_knowledge (PriorKnowledge object) -- Specifies prior knowledge for the causal discovery process, either by forbidding links that are known not to exist, or by adding back links that are known to exist based on expert knowledge. See the PriorKnowledge class for more details.

  • max_iter (int) -- Maximum number of iterations of the LassoCV optimization (default=1000). See https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html.

  • cv (int) -- Cross-validation splitting strategy for LassoCV; an integer specifies the number of folds (default=5). See https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html.

  • use_multiprocessing (bool) -- If True, computations are run in parallel using multiprocessing, which can make the algorithm faster (default: False).

get_parents(pvalue_thres: float = 0.05, target_var: int | str | None = None) Dict[int | str, Tuple[Tuple[int | str, int]]]

Assuming the run() function has been called, get_parents returns a dictionary. The keys of this dictionary are the variable names, and the corresponding values are the lists of lagged parents that cause each target variable under the given pvalue_thres.

Parameters:
  • pvalue_thres (float) -- Significance level used for hypothesis testing (default: 0.05).

  • target_var (str or int, optional) -- If specified (must be one of the data variable names), only the parents of this variable are returned, as a list. Otherwise, a dictionary is returned in which each key is a target variable name and the corresponding value is the list of its parents.

Returns:

A dictionary with D keys, where D is the number of variables. The value corresponding to each key is the list of lagged parents that cause that target variable under the given pvalue_thres.

Return type:

dict
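Since get_parents is called after run(), it can be viewed as applying a (possibly different) pvalue_thres to the stored per-variable results. The sketch below mimics the documented contract on a hypothetical results dictionary shaped like the output of run(), with one entry per target variable, each holding a pvalue_dict. With target_var it returns that variable's parents as a list; otherwise it returns a dictionary over all variables. The helper `get_parents_sketch` and the data are illustrative, not the module's code.

```python
from typing import Dict, List, Optional, Tuple, Union

Name = Union[int, str]


def get_parents_sketch(results: Dict[Name, dict], pvalue_thres: float = 0.05,
                       target_var: Optional[Name] = None):
    """Re-threshold stored per-variable p-values, mimicking the documented get_parents behaviour."""
    def parents_of(name: Name) -> List[Tuple[Name, int]]:
        pvals = results[name]["pvalue_dict"]
        return [key for key, p in pvals.items() if p <= pvalue_thres]

    if target_var is not None:
        if target_var not in results:
            raise ValueError(f"{target_var!r} is not one of the data variable names")
        return parents_of(target_var)
    return {name: parents_of(name) for name in results}


# Hypothetical run() output for two variables with max_lag=1.
results = {
    "a": {"pvalue_dict": {("a", -1): 0.002, ("b", -1): 0.30}},
    "b": {"pvalue_dict": {("a", -1): 0.01, ("b", -1): 0.08}},
}
print(get_parents_sketch(results))                    # {'a': [('a', -1)], 'b': [('a', -1)]}
print(get_parents_sketch(results, pvalue_thres=0.1))  # {'a': [('a', -1)], 'b': [('a', -1), ('b', -1)]}
print(get_parents_sketch(results, target_var="b"))    # [('a', -1)]
```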

run(pvalue_thres: float = 0.05, max_lag: int = 1) Dict[int | str, ResultInfoTimeseriesFull]

Runs the Granger causality algorithm for estimating the causal strength of all potential lagged parents of all the variables.

Parameters:
  • pvalue_thres (float) -- Significance level used for hypothesis testing (default: 0.05).

  • max_lag (int, optional) -- Maximum time lag. Must be greater than or equal to 1 (default: 1).

Returns:

A dictionary with D keys, where D is the number of variables. The value corresponding to each key is the dictionary output of GrangerSingle.run for that variable.

Return type:

dict