Tabular Data module

causalai.data.tabular

class causalai.data.tabular.TabularData(data: ndarray, var_names: List[str] | None = None, contains_nans: bool = False)

Data object containing tabular array.

__init__(data: ndarray, var_names: List[str] | None = None, contains_nans: bool = False)
Parameters:
  • data (ndarray) -- data is a Numpy array of shape (observations N, variables D).

  • var_names (list) -- Names of variables. If None, range(D) is used.

  • contains_nans (bool) -- If True, NaNs are handled automatically during causal discovery. Checking for NaNs makes the code slightly slower, so set this to True only if needed.
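As an illustration of the constructor parameters above, the following sketch prepares an array in the documented (N, D) layout and passes it to TabularData. The variable names, sizes and the injected NaN are made up for the example, and the import is guarded so the snippet also runs where the library is not installed.

```python
import numpy as np

# Illustrative data: N = 100 observations of D = 3 variables.
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3))
var_names = ["a", "b", "c"]  # one name per column (hypothetical names)

# Inject one missing value so that contains_nans is actually needed.
data[5, 1] = np.nan
contains_nans = bool(np.isnan(data).any())

try:
    from causalai.data.tabular import TabularData
    data_obj = TabularData(data, var_names=var_names, contains_nans=contains_nans)
except ImportError:
    # Library not available; the data preparation above still applies.
    data_obj = None
```

Deriving contains_nans from the data, rather than hard-coding True, keeps the NaN check switched off for clean arrays.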

extract_array(X: int | str, Y: int | str, Z: List[int | str]) List[ndarray]

Extract the arrays corresponding to the node names X,Y,Z from self.data_arrays (see BaseData). X and Y are individual nodes, and Z is the set of nodes to be used as the conditional set.

Parameters:
  • X (int or str) -- X is the target variable index or name, e.g. 3 or <var_name> if variable names were specified when creating the data object.

  • Y (int or str) -- Y specifies a variable by index or name, e.g. 2 or <var_name> if variable names were specified when creating the data object.

  • Z (list of str or int) -- Z is a list in which each element has the form 2 or <var_name>, the latter if variable names were specified when creating the data object.

Returns:

x_array, y_array, z_array : Tuple of data arrays. All have 0th dimension equal to the number of observations. z_array.shape[1] equals the number of nodes specified in Z.

Return type:

tuple of ndarray
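The column selection described for extract_array can be pictured with plain NumPy indexing. The sketch below is not the library's implementation: extract_columns is a hypothetical helper, and the 1D shape of x_array and y_array is an assumption, since only the 0th dimension and z_array.shape[1] are documented.

```python
import numpy as np

def extract_columns(data, var_names, X, Y, Z):
    """Hypothetical stand-in for extract_array: pick columns by index or name."""
    def idx(v):
        # Names are mapped to their column position; integers pass through.
        return var_names.index(v) if isinstance(v, str) else v

    x_array = data[:, idx(X)]                 # assumed shape (N,)
    y_array = data[:, idx(Y)]                 # assumed shape (N,)
    z_array = data[:, [idx(z) for z in Z]]    # shape (N, len(Z))
    return x_array, y_array, z_array

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4))
var_names = ["a", "b", "c", "d"]

# Names and integer indices can be mixed, as in the parameter description.
x_array, y_array, z_array = extract_columns(data, var_names, "a", 1, ["c", 3])
```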

get_causal_Xy(target_var: int | str, parents: Tuple[int | str]) Tuple[ndarray, ndarray, List[int | str]]

Given the name of target_var and the list of parents corresponding to target_var, this method extracts a data tuple of the form (X, y), where y is a 1D ndarray containing the observations of target_var as targets, and X is a 2D ndarray of shape (num_observations, num_vars) in which each row contains the variables in data that correspond to the parents of target_var. This pair (X, y) can be useful, for instance, for training machine learning models with X as the input and y as the target.

Parameters:
  • target_var (int or str) -- Target variable index or name.

  • parents (list) -- List of estimated parents of the form [<var5_name>, <var2_name>, ...].

Returns:

X,y, column_names. X,y are as described above, and column_names is a list of names of the columns in X.

Return type:

tuple(ndarray, ndarray, List)
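To show how such an (X, y) pair might be used, the sketch below builds one with NumPy and fits a least-squares model on it. causal_xy is a hypothetical helper that mimics the documented return values, not the library method, and the data-generating equation is invented for the example.

```python
import numpy as np

def causal_xy(data, var_names, target_var, parents):
    """Hypothetical stand-in for get_causal_Xy on a single (N, D) array."""
    def idx(v):
        return var_names.index(v) if isinstance(v, str) else v

    X = data[:, [idx(p) for p in parents]]  # (num_observations, num_parents)
    y = data[:, idx(target_var)]            # (num_observations,)
    return X, y, list(parents)

# Synthetic data in which c depends linearly on its parents a and b.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = 2.0 * a - 1.0 * b + 0.01 * rng.normal(size=200)
data = np.column_stack([a, b, c])
var_names = ["a", "b", "c"]

X, y, column_names = causal_xy(data, var_names, "c", ["a", "b"])

# Ordinary least squares with X as the input and y as the target.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The fitted coefficients line up with column_names, which is why the method returns the names alongside X.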

get_causal_Xy_i(i: int, arr_idx: int, target_var: int | str, parents: Tuple[int | str]) Tuple[ndarray, ndarray, List[int | str]]

Given the name of target_var and the list of parents corresponding to target_var, this method extracts a data tuple of the form (X, y), where y is a scalar containing the observation of target_var at index i as the target, and X is an ndarray of shape (1, num_vars) whose single row contains the variables in data that correspond to the parents of target_var. This pair (X, y) can be useful, for instance, for prediction with machine learning models, with X as the input and y as the target.

Parameters:
  • i (int) -- row index of the data_array for which the target observation and its corresponding input need to be extracted

  • arr_idx (int) -- index of the array in self.data_arrays

  • target_var (int or str) -- Target variable index or name.

  • parents (list) -- List of estimated parents of the form [<var5_name>, <var2_name>, ...].

Returns:

X,y, column_names. X,y are as described above, and column_names is a list of names of the columns in X.

Return type:

tuple(ndarray, ndarray, List)
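The single-row variant can be sketched in the same way. causal_xy_row below is a hypothetical helper, and data_arrays is modelled as a plain list of (N, D) arrays, which is an assumption about how self.data_arrays is organised.

```python
import numpy as np

def causal_xy_row(data_arrays, var_names, i, arr_idx, target_var, parents):
    """Hypothetical stand-in for get_causal_Xy_i: one observation only."""
    def idx(v):
        return var_names.index(v) if isinstance(v, str) else v

    arr = data_arrays[arr_idx]                              # pick one array
    X = arr[i, [idx(p) for p in parents]].reshape(1, -1)    # (1, num_parents)
    y = arr[i, idx(target_var)]                             # scalar target
    return X, y, list(parents)

# One small array whose rows are [0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11].
data_arrays = [np.arange(12, dtype=float).reshape(4, 3)]
var_names = ["a", "b", "c"]

X, y, column_names = causal_xy_row(
    data_arrays, var_names, i=2, arr_idx=0, target_var="c", parents=["a", "b"]
)
# Row 2 is [6, 7, 8], so X holds the parent values 6 and 7 and y is 8.
```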

sanity_check(X: List[Tuple], Y: List | List[Tuple], Z: List[Tuple], total_num_nodes: int) None

Perform the following checks:

  • The variable indices are between 0 and D-1

  • There are no duplicate entries

  • Time lags are negative

  • Tuples have length 2 (index and time lag)

Parameters:
  • X -- list

  • Y -- list

  • Z -- list

  • total_num_nodes (int) -- total number of nodes
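A rough sketch of checks in this spirit is given below. check_nodes is a hypothetical function, not the library's code. It assumes every entry is an (index, lag) tuple and that duplicates are checked across X, Y and Z combined. The time-lag check is left out because the text does not say how lags apply to tabular data.

```python
def check_nodes(X, Y, Z, total_num_nodes):
    """Hypothetical validation of (index, lag) tuples; raises ValueError."""
    nodes = list(X) + list(Y) + list(Z)
    for node in nodes:
        # Each entry is expected to be a tuple of length 2 (index and time lag).
        if not (isinstance(node, tuple) and len(node) == 2):
            raise ValueError(f"{node!r} is not a tuple of length 2")
        index, _lag = node
        # Variable indices must lie between 0 and D-1.
        if not 0 <= index < total_num_nodes:
            raise ValueError(f"variable index {index} out of range")
    # Assumption: duplicates are checked across all three lists together.
    if len(set(nodes)) != len(nodes):
        raise ValueError("duplicate entries found")

# A valid specification passes silently.
check_nodes([(0, 0)], [(1, 0)], [(2, 0)], total_num_nodes=3)

# An index outside 0..D-1 is rejected.
error_message = None
try:
    check_nodes([(0, 0)], [(5, 0)], [], total_num_nodes=3)
except ValueError as err:
    error_message = str(err)
```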

to_var_index(*args) List[List[int | str]] | List[int | str]

Convert variable names from string to variable index if the name is specified as a string.
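The conversion can be pictured as a lookup in the list of variable names. to_index below is a hypothetical helper. It assumes that each argument is either a single name or index, or a list of them, and that one result is returned per argument. The exact return shape of the library method is not spelled out here.

```python
def to_index(var_names, *args):
    """Hypothetical name-to-index conversion; integers pass through unchanged."""
    def convert(v):
        return var_names.index(v) if isinstance(v, str) else v

    result = []
    for arg in args:
        if isinstance(arg, (list, tuple)):
            result.append([convert(v) for v in arg])  # list of names/indices
        else:
            result.append(convert(arg))               # single name/index
    return result

var_names = ["a", "b", "c"]
indices = to_index(var_names, "c", ["a", 1])
```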