Base Data module

causalai.data.base

class causalai.data.base.BaseData(*data: List[ndarray], var_names: List[str] | None = None, **kargs)

Data object for tabular or time series arrays.

__init__(*data: List[ndarray], var_names: List[str] | None = None, **kargs)
Parameters:
  • data (list[ndarray]) -- Each ndarray is a NumPy array of shape (observations N, variables D). For time series data, passing multiple ndarrays lets the user supply multiple disjoint time series (e.g., the first series covers January to March, while the second covers July to September).

  • var_names (list) -- Names of variables, one per column. If None, range(D) is used, i.e., each variable is named by its column index.
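Why pass disjoint series as separate arrays instead of concatenating them? The minimal NumPy sketch below uses a hypothetical helper, `lag_pairs`, that is not part of causalai. When lag-1 pairs are formed within each array, no pair straddles the gap between the two periods; concatenating first silently creates one spurious pair per boundary (here, end of March paired with start of July).

```python
import numpy as np


def lag_pairs(series: np.ndarray, lag: int = 1):
    """Hypothetical helper: pair row t with row t - lag of the same array (lag >= 1)."""
    return series[lag:], series[:-lag]


jan_mar = np.arange(6, dtype=float).reshape(3, 2)      # 3 observations, 2 variables
jul_sep = np.arange(6, 12, dtype=float).reshape(3, 2)  # same 2 variables, later period

# Pairing within each array: 2 + 2 = 4 valid (current, lagged) pairs.
per_array = [lag_pairs(s) for s in (jan_mar, jul_sep)]
n_valid = sum(cur.shape[0] for cur, _ in per_array)

# Pairing after concatenation: 5 pairs, one of which links March to July.
cur, prev = lag_pairs(np.vstack([jan_mar, jul_sep]))
print(n_valid, cur.shape[0])  # 4 5
```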

property dim: int

Returns the number of variables D (1st dimension, i.e., shape[1]) of the data arrays. All arrays must have the same number of variables.
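The value that `dim` reports can be reproduced with plain NumPy. The stand-alone function below is a hypothetical sketch, not the library's property; raising an error on mismatched widths is an assumption based on the requirement that all arrays share the same number of variables.

```python
import numpy as np
from typing import List


def dim(data_arrays: List[np.ndarray]) -> int:
    """Hypothetical stand-in for BaseData.dim: the shared column count D."""
    widths = {arr.shape[1] for arr in data_arrays}
    if len(widths) != 1:
        # Assumed behaviour: arrays with different D are rejected.
        raise ValueError(f"arrays must share the number of variables, got {sorted(widths)}")
    return widths.pop()


arrays = [np.zeros((100, 4)), np.zeros((30, 4))]  # different N, same D
print(dim(arrays))  # 4
```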

abstract extract_array(X: int, Y: Tuple[int | str, int] | int | str, Z: List[Tuple] | List, max_lag: int | None = None)

Extract the arrays corresponding to the nodes X, Y, and Z from self.data_arrays (see BaseData). X and Y are individual nodes, and Z is the set of nodes used as the conditioning set.

Parameters:
  • X (int) -- X is the target variable at the current time step, e.g., 3, or <var_name> if a variable name was specified when creating the data object.

  • Y (tuple or int or str) -- Y specifies a variable. For tabular data, it can be the variable index or name. For time series, it is the variable index or name at a specific time lag, e.g., (2, -1), or (<var_name>, -1) if a variable name was specified when creating the data object. Here the time lag -1 means one time step before X. The time lag must be negative because: 1. a parent of X cannot be at a future time step relative to X; 2. instantaneous causal links are not supported. Y can also be None.

  • Z (list of tuples or a list) -- For time series, Z is a list of tuples, where each tuple has the form (2, -1), or (<var_name>, -1) if a variable name was specified when creating the data object. The time lags must be negative (for the same reason as for Y above). For tabular data, Z is a list of variable indices or variable names.

  • max_lag (int) -- Maximum time lag, counted back from the current time step, within which the Markov blanket is assumed to lie.

Returns:

x_array, y_array, z_array : Tuple of data arrays. All have 0th dimension equal to the length of the time series, and z_array.shape[1] equals the number of nodes specified in Z.

Return type:

tuple of ndarray
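To make the lag convention concrete, here is a simplified sketch for a single time series with integer node indices. The function `extract_lagged` and its output shapes (1-D x and y, 2-D z) are assumptions for illustration, not causalai's implementation. Row i of every output corresponds to time step t = max_lag + i, and a node (v, -k) contributes the value of variable v at time t - k.

```python
import numpy as np
from typing import List, Optional, Tuple


def extract_lagged(
    data: np.ndarray,
    X: int,
    Y: Optional[Tuple[int, int]],
    Z: List[Tuple[int, int]],
    max_lag: int,
):
    """Hypothetical single-series sketch of lag alignment (not causalai's code)."""
    N = data.shape[0]
    T = N - max_lag  # usable time steps once max_lag past values are required

    def column(var: int, lag: int) -> np.ndarray:
        # Lags must be negative (no future or instantaneous parents) and within max_lag.
        if not -max_lag <= lag < 0:
            raise ValueError("lags must be negative and no smaller than -max_lag")
        start = max_lag + lag
        return data[start:start + T, var]

    x_array = data[max_lag:, X]  # target at the current time step t
    y_array = None if Y is None else column(*Y)
    z_array = (np.stack([column(var, lag) for var, lag in Z], axis=1)
               if Z else np.empty((T, 0)))
    return x_array, y_array, z_array


data = np.arange(20, dtype=float).reshape(10, 2)  # N = 10 observations, D = 2 variables
x, y, z = extract_lagged(data, X=0, Y=(1, -1), Z=[(0, -1), (0, -2)], max_lag=2)
print(x.shape, y.shape, z.shape)  # (8,) (8,) (8, 2)
```

Dropping the first max_lag rows guarantees that every lagged column is defined for every retained row. With several disjoint series, one would presumably apply the same slicing to each array separately and stack the results, so no lagged value crosses a gap.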

index2var_name(index: int | str)

Convert a variable index to the corresponding variable name.
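A common use of the index-to-name direction is making a conditioning set Z readable. The sketch below is hypothetical (it is not a causalai function) and simply looks each index up in a var_names list, keeping the lag unchanged.

```python
from typing import List, Tuple, Union

Name = Union[int, str]


def z_indices_to_names(var_names: List[Name], Z: List[Tuple[int, int]]) -> List[Tuple[Name, int]]:
    """Hypothetical helper: rewrite (index, lag) nodes as (name, lag) nodes."""
    return [(var_names[i], lag) for i, lag in Z]


names = ["temp", "humidity", "wind"]
print(z_indices_to_names(names, [(2, -1), (0, -3)]))
# [('wind', -1), ('temp', -3)]
```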

property length: List[int]

Returns a list with the length (0th dimension, i.e., the number of observations N) of each data array passed to the constructor.
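For illustration, the list that `length` returns can be computed directly from the arrays. The function below is a hypothetical stand-in, not the library's property; it shows that disjoint series may differ in length even though they share the same number of variables.

```python
import numpy as np
from typing import List


def lengths(data_arrays: List[np.ndarray]) -> List[int]:
    """Hypothetical stand-in for BaseData.length: row count N_i of each array."""
    return [arr.shape[0] for arr in data_arrays]


# Disjoint series may differ in length; only the column count D must agree.
arrays = [np.zeros((90, 3)), np.zeros((92, 3))]
print(lengths(arrays))       # [90, 92]
print(sum(lengths(arrays)))  # 182
```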

var_name2index(name: int | str)

Convert a variable name to the corresponding index.
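A minimal sketch of the name-to-index direction, assuming names are looked up in the var_names list. The helper `build_name_index` is hypothetical, not part of causalai: it builds a dictionary once so each lookup is O(1) rather than an O(D) list search, and it rejects duplicate names because the reverse mapping would otherwise be ambiguous.

```python
from typing import Dict, List, Union

Name = Union[int, str]


def build_name_index(var_names: List[Name]) -> Dict[Name, int]:
    """Hypothetical helper: map each variable name to its column index."""
    lookup = {name: i for i, name in enumerate(var_names)}
    if len(lookup) != len(var_names):
        raise ValueError("variable names must be unique")
    return lookup


lookup = build_name_index(["temp", "humidity", "wind"])
print(lookup["humidity"])  # 1
```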