TimeSeries Data module

causalai.data.time_series

class causalai.data.time_series.TimeSeriesData(*data: List[ndarray], var_names: List[str] | None = None, contains_nans: bool = False)

Data object containing time series array.

__init__(*data: List[ndarray], var_names: List[str] | None = None, contains_nans: bool = False)
Parameters:
  • data (list of ndarray) -- Each ndarray is a NumPy array of shape (observations N, variables D). Passing several arrays lets the user supply multiple disjoint time series (e.g. the first series covers January to March, while the second covers July to September).

  • var_names (list) -- Names of variables. If None, range(D) is used.

  • contains_nans (bool) -- If true, NaNs will be handled automatically during causal discovery. Checking for NaNs makes the code a little slower, so set this to true only if needed.
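
As a rough illustration of the input layout described above, the sketch below builds two disjoint segments with the same number of variables and summarizes them before they would be handed to the constructor. The helper name `describe_segments` and the sample values are hypothetical, and the default integer names are an assumption; only NumPy is used, and the constructor call itself appears as a comment.

```python
import numpy as np

def describe_segments(*data, var_names=None):
    """Hypothetical helper: summarize arrays shaped (observations N, variables D)."""
    num_vars = data[0].shape[1]
    # Every segment must describe the same D variables; N may differ per segment.
    if any(arr.ndim != 2 or arr.shape[1] != num_vars for arr in data):
        raise ValueError("all segments must be 2D with the same number of columns")
    # Assumed default: integer names 0..D-1 when var_names is None.
    names = list(range(num_vars)) if var_names is None else list(var_names)
    if len(names) != num_vars:
        raise ValueError("var_names must have one entry per variable")
    has_nans = any(bool(np.isnan(arr).any()) for arr in data)
    return names, [arr.shape[0] for arr in data], has_nans

rng = np.random.default_rng(0)
jan_to_mar = rng.normal(size=(90, 3))   # first segment: N=90, D=3
jul_to_sep = rng.normal(size=(92, 3))   # second segment: N=92, D=3
jul_to_sep[5, 1] = np.nan               # one missing reading

names, lengths, has_nans = describe_segments(
    jan_to_mar, jul_to_sep, var_names=["A", "B", "C"]
)
# With the library installed, the matching call would be:
# TimeSeriesData(jan_to_mar, jul_to_sep, var_names=["A", "B", "C"], contains_nans=has_nans)
```

Computing `has_nans` once up front is one way to decide whether `contains_nans` needs to be enabled at all.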

extract_array(X: int | str, Y: Tuple[int | str, int], Z: List[Tuple], max_lag: int) -> List[ndarray]

Extract the arrays corresponding to the node names X, Y, Z from self.data_arrays (see BaseData). X and Y are individual nodes, and Z is the set of nodes used as the conditioning set.

Parameters:
  • X (int or str) -- X is the target variable index/name at the current time step, e.g. 3, or <var_name> if a variable name was specified when creating the data object.

  • Y (tuple) -- Y specifies a variable at a specific time lag, e.g. (2,-1), or (<var_name>, -1) if a variable name was specified when creating the data object. Here the time lag -1 means 1 time step before X. The time lag must be negative because: 1. a parent of X cannot be at a future time step relative to X. 2. Instantaneous causal links are not supported. Y can also be None.

  • Z (list of tuples) -- Z is a list of tuples, where each tuple has the form (2,-1) or (<var_name>, -1), if a variable name was specified when creating the data object. The time lags must be negative (same reason as that specified for Y above).

  • max_lag (int) -- Maximum time lag relative to the current time step. The Markov blanket is assumed to lie within this interval.

Returns:

x_array, y_array, z_array : Tuple of data arrays. All have 0th dimension equal to the total length of the time series. z_array.shape[1] equals the number of nodes specified in Z.

Return type:

tuple of ndarray
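
The alignment of X at the current step with lagged Y and Z nodes can be sketched in plain NumPy. This is an illustration only, not the library's implementation: the helper `lagged_columns` is hypothetical, it works on a single array with integer indices, and it drops the first max_lag rows because they lack a full history, which is an assumption of this sketch.

```python
import numpy as np

def lagged_columns(arr, X, Y, Z, max_lag):
    """Hypothetical helper: align column X at time t with lagged Y and Z columns."""
    n = arr.shape[0]

    def column(var, lag):
        # lag is negative, so row k of the result holds the value at time (max_lag + k) + lag.
        return arr[max_lag + lag : n + lag, var]

    # Rows t = max_lag .. n-1 are the ones with max_lag steps of history available.
    x_array = arr[max_lag:, X]
    y_array = column(*Y) if Y is not None else None
    z_array = np.stack([column(v, lag) for v, lag in Z], axis=1) if Z else None
    return x_array, y_array, z_array

arr = np.arange(20).reshape(10, 2)      # arr[t, v] == 2 * t + v
x, y, z = lagged_columns(arr, X=0, Y=(1, -1), Z=[(0, -2), (1, -2)], max_lag=2)
```

Each row of `z` then holds the conditioning values that belong to the same time step as the matching entries of `x` and `y`.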

get_causal_Xy(target_var: int | str, parents: Tuple[Tuple[int | str, int]]) -> Tuple[ndarray, ndarray, List[int | str]]

Given the name of target_var and the list of its parents, this method extracts a data tuple of the form (X,y), where y is a 1D ndarray containing the observations of target_var as targets, and X is a 2D ndarray of shape (num_observations, num_vars) in which each row contains the values of the variables that are parents of target_var. The pair (X,y) can be used, for instance, to train machine learning models with X as the input and y as the target.

Parameters:
  • target_var (int or str) -- Target variable index or name.

  • parents (list) -- List of estimated parents of the form [(<var5_name>, -1), (<var2_name>, -3), ...].

Returns:

X,y, column_names. X,y are as described above, and column_names is a list of names of the columns in X.

Return type:

tuple(ndarray, ndarray, List)
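
To make the (X,y) construction concrete, here is a minimal NumPy sketch that builds such a pair from a list of lagged parents and fits a linear model to it. The helper `build_causal_xy`, the synthetic data, and the column label format are all assumptions made for illustration; the library's own method may trim rows or label columns differently.

```python
import numpy as np

def build_causal_xy(arr, target_var, parents):
    """Hypothetical helper: build (X, y, column_names) from lagged parents."""
    n = arr.shape[0]
    max_lag = max(-lag for _, lag in parents)
    # Targets start at row max_lag so that every parent value exists.
    y = arr[max_lag:, target_var]
    X = np.stack([arr[max_lag + lag : n + lag, var] for var, lag in parents], axis=1)
    column_names = [var for var, _ in parents]   # assumed label format
    return X, y, column_names

# Synthetic series in which variable 0 depends on variable 1 (lag 1) and variable 2 (lag 2).
rng = np.random.default_rng(0)
arr = rng.normal(size=(50, 3))
for t in range(2, 50):
    arr[t, 0] = 2.0 * arr[t - 1, 1] + 3.0 * arr[t - 2, 2]

X, y, column_names = build_causal_xy(arr, target_var=0, parents=[(1, -1), (2, -2)])
# Ordinary least squares on the extracted pair recovers the generating coefficients.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Because the synthetic relation is noise-free, `coef` comes out close to the generating weights 2 and 3, which is a quick check that the lag alignment is right.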

get_causal_Xy_i(i: int, arr_idx: int, target_var: int | str, parents: Tuple[Tuple[int | str, int]]) -> Tuple[ndarray, ndarray, List[int | str]]

Given a time series data object, the name of target_var, and the list of its parents, this method extracts a data tuple of the form (X,y), where y is a scalar containing the observation of target_var at index i as the target, and X is a 1D ndarray containing the values of the variables that are parents of target_var. The pair (X,y) can be used, for instance, for prediction with machine learning models, with X as the input and y as the target.

Parameters:
  • i (int) -- Row index of the data array for which the target observation and its corresponding input need to be extracted.

  • arr_idx (int) -- Index of the array in self.data_arrays.

  • data (TimeSeriesData object) -- It contains the list data.data_arrays, where each item is a numpy array of shape (observations N, variables D).

  • target_var (int or str) -- Target variable index or name.

  • parents (list) -- List of estimated parents of the form [(<var5_name>, -1), (<var2_name>, -3), ...].

Returns:

X,y as described above.

Return type:

tuple(ndarray, ndarray)
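
The single-row case can be sketched the same way. The helper `causal_xy_row` below is hypothetical and assumes integer variable indices; it picks one segment by arr_idx and reads the parent values at their lags relative to row i.

```python
import numpy as np

def causal_xy_row(data_arrays, i, arr_idx, target_var, parents):
    """Hypothetical helper: one (X, y) pair for row i of segment arr_idx."""
    arr = data_arrays[arr_idx]
    max_lag = max(-lag for _, lag in parents)
    if i < max_lag:
        raise IndexError("row i needs at least max_lag earlier rows")
    # Each parent (var, lag) contributes the value observed at row i + lag.
    X = np.array([arr[i + lag, var] for var, lag in parents])
    y = arr[i, target_var]
    return X, y

segment_a = np.arange(12.0).reshape(6, 2)            # segment_a[t, v] == 2 * t + v
segment_b = 100.0 + np.arange(12.0).reshape(6, 2)    # segment_b[t, v] == 100 + 2 * t + v

X_row, y_row = causal_xy_row(
    [segment_a, segment_b], i=3, arr_idx=1, target_var=0, parents=[(1, -1), (0, -2)]
)
```

Here `X_row` holds the two parent values taken from the second segment, and `y_row` is the target value at row 3 of that segment.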

sanity_check(X: List[Tuple], Y: List | List[Tuple], Z: List[Tuple], total_num_nodes: int) -> None

Perform the following checks:

  • The variable indices are between 0 and D-1, inclusive

  • There are no duplicate entries

  • Time lags are negative

  • Tuples have length 2 (index and time lag)

Parameters:
  • X -- list

  • Y -- list

  • Z -- list

  • total_num_nodes (int) -- total number of nodes
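
The four checks listed above can be sketched as a small pure-Python function. The name `find_node_problems` is hypothetical, and unlike sanity_check, whose return type is None, this sketch collects messages and returns them so the outcome is easy to inspect. It assumes the nodes already use integer indices and are lagged nodes such as those in Z.

```python
def find_node_problems(nodes, total_num_nodes):
    """Hypothetical helper: list problems found in a list of (index, lag) nodes."""
    problems = []
    seen = set()
    for node in nodes:
        # Tuples must have length 2 (index and time lag).
        if not isinstance(node, tuple) or len(node) != 2:
            problems.append(f"{node!r}: expected an (index, lag) tuple")
            continue
        index, lag = node
        # Variable indices must lie between 0 and D-1.
        if not 0 <= index < total_num_nodes:
            problems.append(f"{node!r}: index outside 0..{total_num_nodes - 1}")
        # Time lags must be negative.
        if lag >= 0:
            problems.append(f"{node!r}: time lag must be negative")
        # No duplicate entries.
        if node in seen:
            problems.append(f"{node!r}: duplicate entry")
        seen.add(node)
    return problems

ok = find_node_problems([(0, -1), (2, -3)], total_num_nodes=3)
bad = find_node_problems(
    [(0, -1), (0, -1), (5, -1), (1, 0), (1, -1, 7)], total_num_nodes=3
)
```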

to_var_index(*args) -> List[List[Tuple[int | str, int]]] | List[Tuple[int | str, int]]

Convert variable names from string to variable index if the name is specified as a string.
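
A name-to-index conversion of this kind can be sketched as follows. The function `names_to_indices` and its handling of bare names, tuples, and lists are assumptions made for illustration; the library method reads the names from the data object, whereas this sketch takes them as an explicit argument.

```python
def _to_index(var_names, node):
    """Hypothetical helper: convert one node, leaving integer indices untouched."""
    if isinstance(node, tuple):
        var, lag = node
        return (_to_index(var_names, var), lag)
    if isinstance(node, str):
        return var_names.index(node)   # raises ValueError for unknown names
    return node

def names_to_indices(var_names, *args):
    """Convert each argument (a node or a list of nodes) from names to indices."""
    converted = [
        [_to_index(var_names, node) for node in arg]
        if isinstance(arg, list)
        else _to_index(var_names, arg)
        for arg in args
    ]
    # Assumed convention: a single argument is returned unwrapped.
    return converted[0] if len(converted) == 1 else converted

var_names = ["A", "B", "C"]
single = names_to_indices(var_names, [("B", -1), (2, -3)])
several = names_to_indices(var_names, "C", ("A", -2), [("B", -1)])
```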