TimeSeries Base module
causalai.models.time_series.base
- class causalai.models.time_series.base.BaseTimeSeriesAlgo(data: TimeSeriesData, prior_knowledge: PriorKnowledge | None = None, use_multiprocessing=False, **kargs)
- __init__(data: TimeSeriesData, prior_knowledge: PriorKnowledge | None = None, use_multiprocessing=False, **kargs)
- Parameters:
data (TimeSeriesData object) -- A TimeSeriesData object. Its attributes include data.data_arrays, a list of numpy arrays, each of shape (observations N, variables D).
prior_knowledge (PriorKnowledge object) -- Specify prior knowledge for the causal discovery process by either forbidding links that are known not to exist, or adding back links that do exist based on expert knowledge. See the PriorKnowledge class for more details.
- get_all_parents(target_var: int | str, max_lag: int) List[Tuple]
Returns the list of all nodes, as (variable, lag) tuples, with time lags from -max_lag to -1.
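The enumeration described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: `all_lagged_parents` is a hypothetical helper, and both the ordering of the output and the inclusion of the target's own past values are assumptions.

```python
from typing import List, Tuple, Union

Node = Tuple[Union[int, str], int]


def all_lagged_parents(var_names: List[Union[int, str]], max_lag: int) -> List[Node]:
    """Hypothetical helper: every (variable, lag) pair with lag in -max_lag..-1."""
    if max_lag < 1:
        raise ValueError("max_lag must be greater than or equal to 1")
    # Assumption: all variables are eligible at every lag, grouped by variable.
    return [(name, -lag) for name in var_names for lag in range(1, max_lag + 1)]


candidates = all_lagged_parents(["A", "B"], max_lag=2)
```

With two variables and max_lag=2, this yields four (variable, lag) pairs.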
- get_candidate_parents(target_var: int | str, max_lag: int) List[Tuple]
Returns the list of all nodes, as (variable, lag) tuples, with time lags from -max_lag to -1, keeping only the links that prior_knowledge allows.
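A rough sketch of filtering candidates by prior knowledge follows. The `forbidden_links` mapping (target variable to a list of disallowed parent variables) is a hypothetical stand-in chosen for illustration; the actual PriorKnowledge class may represent and check constraints differently.

```python
from typing import Dict, List, Optional, Tuple, Union

Var = Union[int, str]


def candidate_parents(
    var_names: List[Var],
    target_var: Var,
    max_lag: int,
    forbidden_links: Optional[Dict[Var, List[Var]]] = None,
) -> List[Tuple[Var, int]]:
    """Hypothetical helper: lagged nodes minus links forbidden for target_var."""
    # Assumption: a forbidden parent is excluded at every lag.
    forbidden = set((forbidden_links or {}).get(target_var, []))
    return [
        (name, -lag)
        for name in var_names
        if name not in forbidden
        for lag in range(1, max_lag + 1)
    ]


allowed = candidate_parents(["A", "B", "C"], "C", max_lag=1, forbidden_links={"C": ["A"]})
```

Here the link from A to C is forbidden, so only B and C's own past remain as candidates.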
- get_parents(pvalue_thres: float = 0.05) Tuple[Tuple[int | str, int]]
Assuming run() function has been called for a target_var, get_parents function returns the list of lagged parent names that cause the target_var under the given pvalue_thres.
- Parameters:
pvalue_thres (float) -- Significance level used for hypothesis testing (default: 0.05). Candidate parents with pvalues above pvalue_thres are ignored, and the rest are returned as the cause of the target_var.
- Returns:
List of estimated parents of the form [(<var5_name>, -1), (<var2_name>, -3), ...].
- Return type:
list
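The thresholding rule for get_parents can be illustrated as below. This is a sketch under stated assumptions: `select_parents` is a hypothetical function, the p-values are made-up placeholders, and treating a p-value exactly equal to the threshold as significant is an assumption (the text only says that values above the threshold are ignored).

```python
from typing import Dict, List, Tuple, Union

Node = Tuple[Union[int, str], int]


def select_parents(pvalue_dict: Dict[Node, float], pvalue_thres: float = 0.05) -> List[Node]:
    """Hypothetical helper: keep candidates whose p-value is not above the threshold."""
    return [node for node, pvalue in pvalue_dict.items() if pvalue <= pvalue_thres]


# Placeholder p-values for three candidate links into some target variable.
pvalues = {("A", -1): 0.01, ("B", -2): 0.20, ("C", -1): 0.05}
kept = select_parents(pvalues, pvalue_thres=0.05)
```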
- abstract run(target_var: int | str, pvalue_thres: float = 0.05, max_lag: int = 1) ResultInfoTimeseriesSingle
Run causal discovery using the algorithm implemented here
- Parameters:
target_var (int or str) -- Target variable index or name for which lagged parents need to be estimated.
pvalue_thres (float) -- Significance level used for hypothesis testing (default: 0.05). Candidate parents with pvalues above pvalue_thres are ignored, and the rest are returned as the cause of the target_var.
max_lag (int, optional) -- Maximum time lag (default: 1). Must be greater than or equal to 1.
- Returns:
Dictionary with three keys:
parents : List of estimated parents.
value_dict : Dictionary of form {(var3_name, -1):float, ...} containing the test statistic of a link.
pvalue_dict : Dictionary of form {(var3_name, -1):float, ...} containing the p-value corresponding to the above test statistic.
- Return type:
dict
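Because run is abstract, a concrete algorithm has to supply it. The sketch below shows that pattern with stand-in classes only: `SketchSingleTargetAlgo` and `PlaceholderAlgo` are hypothetical names that do not subclass the real base class, and the statistics are made-up placeholders rather than the output of any independence test.

```python
from abc import ABC, abstractmethod


class SketchSingleTargetAlgo(ABC):
    """Hypothetical stand-in for a base class whose run() is abstract."""

    @abstractmethod
    def run(self, target_var, pvalue_thres=0.05, max_lag=1):
        """Subclasses return a dict with parents, value_dict and pvalue_dict."""


class PlaceholderAlgo(SketchSingleTargetAlgo):
    """Toy subclass with made-up statistics; not a real independence test."""

    def run(self, target_var, pvalue_thres=0.05, max_lag=1):
        if max_lag < 1:
            raise ValueError("max_lag must be greater than or equal to 1")
        lags = range(1, max_lag + 1)
        # Placeholder numbers: the statistic shrinks and the p-value grows with lag.
        value_dict = {(target_var, -lag): 1.0 / lag for lag in lags}
        pvalue_dict = {(target_var, -lag): 0.04 * lag for lag in lags}
        parents = [node for node, p in pvalue_dict.items() if p <= pvalue_thres]
        return {"parents": parents, "value_dict": value_dict, "pvalue_dict": pvalue_dict}


result = PlaceholderAlgo().run("A", pvalue_thres=0.05, max_lag=2)
```

Instantiating the abstract stand-in directly raises TypeError, which is how Python enforces that subclasses implement run.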
- sort_parents(parents_vals: Dict) Tuple[Tuple[int | str, int]]
Sort (in descending order) parents according to test statistic values.
- Parameters:
parents_vals (dict) -- Dictionary of form {(<var_name>, <lag>):float, ...} containing the test statistic value of each causal link.
- Returns:
parents : List of the form [(<var5_name>, -1), (<var2_name>, -3), ...] containing parents sorted by their statistic.
- Return type:
list
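A descending sort of this kind might look as follows. `sort_by_statistic` is a hypothetical helper and the statistic values are placeholders. This sketch orders by the raw value; whether an implementation instead orders by magnitude (absolute value) is not stated here, so that choice is an assumption.

```python
from typing import Dict, List, Tuple, Union

Node = Tuple[Union[int, str], int]


def sort_by_statistic(parents_vals: Dict[Node, float]) -> List[Node]:
    """Hypothetical helper: nodes ordered from largest to smallest statistic."""
    ranked = sorted(parents_vals.items(), key=lambda item: item[1], reverse=True)
    return [node for node, _ in ranked]


ordered = sort_by_statistic({("A", -1): 0.2, ("B", -3): 0.9, ("C", -2): 0.5})
```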
- class causalai.models.time_series.base.BaseTimeSeriesAlgoFull(**kargs)
- __init__(**kargs)
- get_parents(pvalue_thres: float = 0.05, target_var: int | str | None = None) Dict[int | str, Tuple[Tuple[int | str, int]]]
Assuming run() function has been called, get_parents function returns a dictionary. The keys of this dictionary are the variable names, and the corresponding values are the list of lagged parent names that cause the target variable under the given pvalue_thres.
- Parameters:
pvalue_thres (float) -- Significance level used for hypothesis testing (default: 0.05).
target_var (int or str, optional) -- If specified (must be one of the data variable names), only the parents of this variable are returned, as a list. Otherwise a dictionary is returned in which each key is a target variable name and the corresponding value is the list of its parents.
- Returns:
Dictionary with D keys, where D is the number of variables. The value corresponding to each key is the list of lagged parent names that cause the target variable under the given pvalue_thres.
- Return type:
dict
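The two return shapes (a dictionary for all variables, or a list when target_var is given) can be sketched as below. `parents_for_all` is a hypothetical function operating on a plain dictionary of per-variable results with made-up p-values; how the library stores its results internally is an assumption.

```python
from typing import Dict, List, Optional, Tuple, Union

Var = Union[int, str]
Node = Tuple[Var, int]


def parents_for_all(
    results: Dict[Var, Dict[str, Dict[Node, float]]],
    pvalue_thres: float = 0.05,
    target_var: Optional[Var] = None,
) -> Union[Dict[Var, List[Node]], List[Node]]:
    """Hypothetical helper: threshold each variable's p-values, optionally for one target."""
    per_variable = {
        var: [node for node, p in res["pvalue_dict"].items() if p <= pvalue_thres]
        for var, res in results.items()
    }
    if target_var is not None:
        return per_variable[target_var]  # a list for the single requested variable
    return per_variable  # one entry per variable


# Placeholder results for two variables.
results = {
    "A": {"pvalue_dict": {("B", -1): 0.01, ("A", -1): 0.50}},
    "B": {"pvalue_dict": {("A", -1): 0.30}},
}
all_parents = parents_for_all(results)
parents_of_a = parents_for_all(results, target_var="A")
```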
- abstract run(pvalue_thres: float = 0.05, max_lag: int = 1) Dict[int | str, ResultInfoTimeseriesFull]
Run causal discovery using the algorithm implemented here to estimate the causal strength of all potential lagged parents of all the variables.
- Parameters:
pvalue_thres (float) -- Significance level used for hypothesis testing (default: 0.05). Candidate parents with pvalues above pvalue_thres are ignored, and the rest are returned as the cause of the target_var.
max_lag (int, optional) -- Maximum time lag (default: 1). Must be greater than or equal to 1.
- Returns:
Dictionary with D keys, where D is the number of variables. The value corresponding to each key is the dictionary output of BaseTimeSeriesAlgo.run.
- Return type:
dict
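One way to picture the returned structure is as the single-target result collected once per variable. The sketch below assumes exactly that; `run_for_all_targets` and `empty_single_run` are hypothetical names, and a real full-graph algorithm need not be implemented as a loop over single-target runs.

```python
from typing import Callable, Dict, List, Union

Var = Union[int, str]


def run_for_all_targets(
    var_names: List[Var],
    run_single: Callable[[Var, float, int], dict],
    pvalue_thres: float = 0.05,
    max_lag: int = 1,
) -> Dict[Var, dict]:
    """Hypothetical helper: one single-target result dictionary per variable."""
    if max_lag < 1:
        raise ValueError("max_lag must be greater than or equal to 1")
    return {var: run_single(var, pvalue_thres, max_lag) for var in var_names}


def empty_single_run(target_var: Var, pvalue_thres: float, max_lag: int) -> dict:
    """Placeholder single-target run that finds no parents."""
    return {"parents": [], "value_dict": {}, "pvalue_dict": {}}


full_result = run_for_all_targets(["A", "B", "C"], empty_single_run, max_lag=2)
```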