Tabular Data module
causalai.data.tabular
- class causalai.data.tabular.TabularData(data: ndarray, var_names: List[str] | None = None, contains_nans: bool = False)
Data object containing tabular array.
- __init__(data: ndarray, var_names: List[str] | None = None, contains_nans: bool = False)
- Parameters:
data (ndarray) -- data is a Numpy array of shape (observations N, variables D).
var_names (list) -- Names of variables. If None, range(D) is used.
contains_nans (bool) -- If True, NaNs will be handled automatically during causal discovery. Checking for NaNs makes the code a little slower, so set this to True only if needed.
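The constructor arguments can be prepared along the following lines. This is a minimal NumPy-only sketch that does not call the library; the toy array is hypothetical, and it assumes the default names index the variables (columns).

```python
import numpy as np

# Hypothetical toy data: N = 5 observations (rows) of D = 3 variables (columns).
data = np.arange(15, dtype=float).reshape(5, 3)
data[2, 1] = np.nan  # inject one missing value for illustration

num_obs, num_vars = data.shape

# Assumed default when var_names is None: one integer name per variable.
var_names = list(range(num_vars))

# Request NaN handling only when the array actually contains NaNs.
contains_nans = bool(np.isnan(data).any())
```

The resulting data, var_names and contains_nans values are what would be passed to TabularData.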
- extract_array(X: int | str, Y: int | str, Z: List[int | str]) List[ndarray]
Extract the arrays corresponding to the node names X,Y,Z from self.data_arrays (see BaseData). X and Y are individual nodes, and Z is the set of nodes to be used as the conditional set.
- Parameters:
X (int or str) -- X is the target variable index or name, e.g. 3 or <var_name>, if a variable name was specified when creating the data object.
Y (int or str) -- Y is a variable index or name, e.g. 2 or <var_name>, if a variable name was specified when creating the data object.
Z (list of str or int) -- Z is a list in which each element is a variable index or name, e.g. 2 or <var_name>, if a variable name was specified when creating the data object.
- Returns:
x_array, y_array, z_array : Tuple of data arrays. All have 0th dimension equal to the number of observations. z_array.shape[1] is equal to the number of nodes specified in Z.
- Return type:
tuple of ndarray
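The column selection described above can be illustrated as follows. This is a sketch under stated assumptions, not the library's implementation: extract_columns is a hypothetical helper, and it operates on a single 2D array rather than on self.data_arrays.

```python
import numpy as np

def extract_columns(data, var_names, X, Y, Z):
    """Hypothetical stand-in: pick the columns for X, Y and the set Z."""
    def to_index(v):
        # Names are looked up in var_names; integers are used as-is.
        return var_names.index(v) if isinstance(v, str) else v

    x_array = data[:, to_index(X)]               # shape (N,)
    y_array = data[:, to_index(Y)]               # shape (N,)
    z_array = data[:, [to_index(z) for z in Z]]  # shape (N, len(Z))
    return x_array, y_array, z_array

# Hypothetical data: 4 observations of variables "a", "b", "c".
data = np.arange(12).reshape(4, 3)
x, y, z = extract_columns(data, ["a", "b", "c"], "a", 2, ["b"])
```

Here X is given by name, Y by index, and Z as a one-element list, so z keeps a second dimension of size 1.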
- get_causal_Xy(target_var: int | str, parents: Tuple[int | str]) Tuple[ndarray, ndarray, List[int | str]]
Given the name of target_var and the list of its parents, this method extracts the data tuple (X, y), where y is a 1D ndarray containing the observations of target_var as targets, and X is a 2D ndarray of shape (num_observations, num_vars) where each row contains the values of the variables that are parents of target_var. This pair (X, y) can be useful, for instance, for training machine learning models where X is the input and y is the target.
- Parameters:
target_var (int or str) -- Target variable index or name.
parents (list) -- List of estimated parents of the form [<var5_name>, <var2_name>, ...].
- Returns:
X,y, column_names. X,y are as described above, and column_names is a list of names of the columns in X.
- Return type:
tuple(ndarray, ndarray, List)
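As an illustration of how such an (X, y) pair might be used, the sketch below builds the pair by hand with NumPy and fits an ordinary least squares model. It does not call the library; the synthetic data, variable names, and the choice of least squares are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: columns "a", "b", "c", where c = 2*a - b exactly.
a = rng.normal(size=100)
b = rng.normal(size=100)
data = np.column_stack([a, b, 2 * a - b])
var_names = ["a", "b", "c"]

target_var, parents = "c", ["a", "b"]

# Mirror the documented return values: X, y and the column names of X.
column_names = list(parents)
X = data[:, [var_names.index(p) for p in parents]]  # shape (100, 2)
y = data[:, var_names.index(target_var)]            # shape (100,)

# Ordinary least squares fit of y on X (no intercept).
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Because the target is an exact linear function of its parents in this toy setup, the fitted coefficients recover the generating weights.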
- get_causal_Xy_i(i: int, arr_idx: int, target_var: int | str, parents: Tuple[int | str]) Tuple[ndarray, ndarray, List[int | str]]
Given the name of target_var and the list of its parents, this method extracts the data tuple (X, y), where y is a scalar containing the observation of target_var at row index i, and X is an ndarray of shape (1, num_vars) whose single row contains the values of the parents of target_var at that row. This pair (X, y) can be useful, for instance, for prediction with machine learning models where X is the input and y is the target.
- Parameters:
i (int) -- Row index of the data array for which the target observation and its corresponding input need to be extracted.
arr_idx (int) -- index of the array in self.data_arrays
target_var (int or str) -- Target variable index or name.
parents (list) -- List of estimated parents of the form [<var5_name>, <var2_name>, ...].
- Returns:
X,y, column_names. X,y are as described above, and column_names is a list of names of the columns in X.
- Return type:
tuple(ndarray, ndarray, List)
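The single-row case can be pictured with the NumPy-only sketch below. It is not the library's code; the list data_arrays stands in for self.data_arrays, and the array contents and variable names are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for self.data_arrays: one array, 4 rows, 3 variables.
data_arrays = [np.arange(12, dtype=float).reshape(4, 3)]
var_names = ["a", "b", "c"]

i, arr_idx = 2, 0
target_var, parents = "c", ["a", "b"]

arr = data_arrays[arr_idx]
parent_idx = [var_names.index(p) for p in parents]

X_i = arr[i, parent_idx].reshape(1, -1)    # shape (1, 2): parent values at row i
y_i = arr[i, var_names.index(target_var)]  # scalar: target value at row i
column_names = list(parents)
```

Keeping X_i two-dimensional means it can be passed to a model that expects a batch of inputs, even though it holds a single observation.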
- sanity_check(X: List[Tuple], Y: List | List[Tuple], Z: List[Tuple], total_num_nodes: int) None
Perform the following checks:
The variable indices are between 0 and D-1
There are no duplicate entries
Time lags are negative
Tuples have length 2 (index and time lag)
- Parameters:
X -- list
Y -- list
Z -- list
total_num_nodes (int) -- total number of nodes
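The four checks listed above could look roughly like the sketch below. check_nodes is a hypothetical function written for illustration; it validates one list of (index, time lag) tuples, and applying every check uniformly to that list is an assumption, since the source does not say which checks apply to X, Y and Z individually.

```python
def check_nodes(nodes, total_num_nodes):
    """Hypothetical validator for a list of (variable index, time lag) tuples."""
    for node in nodes:
        # Tuples have length 2 (index and time lag).
        if len(node) != 2:
            raise ValueError(f"{node!r} must have length 2")
        index, lag = node
        # The variable indices are between 0 and D-1.
        if not 0 <= index < total_num_nodes:
            raise ValueError(f"index {index} is outside 0..{total_num_nodes - 1}")
        # Time lags are negative.
        if lag >= 0:
            raise ValueError(f"time lag {lag} must be negative")
    # There are no duplicate entries.
    if len(set(nodes)) != len(nodes):
        raise ValueError("duplicate entries found")

# A valid list passes silently and returns None.
result = check_nodes([(0, -1), (2, -3)], total_num_nodes=3)
```

Raising ValueError is a design choice of this sketch; the source does not state how the library reports a failed check.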
- to_var_index(*args) List[List[int | str]] | List[int | str]
Convert variable names from string to variable index if the name is specified as a string.
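The conversion can be sketched in plain Python as below. names_to_indices is a hypothetical helper that handles a single list; it does not model the *args handling of the real method, and looking names up by their position in var_names is an assumption.

```python
def names_to_indices(var_names, names):
    """Hypothetical helper: map string names to their position in var_names."""
    return [var_names.index(n) if isinstance(n, str) else n for n in names]

# Mixed input: names are converted, integer indices pass through unchanged.
indices = names_to_indices(["a", "b", "c"], ["c", 0, "b"])
```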