Continuous Tabular Benchmarking module

causalai.benchmark.tabular.continuous

This is the benchmarking module for continuous tabular data. It provides methods that evaluate causal discovery algorithms against various challenges, such as sample complexity, variable complexity, etc. Users can either use synthetically generated data or provide their own data for benchmarking.

The default evaluation metrics supported by this module are Precision, Recall, F1 Score, and Time Taken by the algorithm. Users can also include their own custom metrics when calling the benchmarking module.
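The following is a minimal sketch of how Precision, Recall, and F1 can be computed by comparing an estimated graph with the ground truth graph. It assumes the metrics are computed over directed (parent, child) edges and that an empty edge set scores 0; the module's own implementation may treat these cases differently. The names `edge_set` and `precision_recall_f1` are hypothetical and are not part of causalai.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # child name -> list of parent names


def edge_set(graph: Graph) -> Set[Tuple[str, str]]:
    """Flatten a parent dictionary into a set of directed (parent, child) edges."""
    return {(parent, child) for child, parents in graph.items() for parent in parents}


def precision_recall_f1(graph_est: Graph, graph_gt: Graph) -> Tuple[float, float, float]:
    """Edge-level precision, recall and F1 (assumption: only directed edges are compared)."""
    est, gt = edge_set(graph_est), edge_set(graph_gt)
    tp = len(est & gt)  # edges recovered with the correct direction
    precision = tp / len(est) if est else 0.0
    recall = tp / len(gt) if gt else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1


graph_gt = {"a": [], "b": ["a"], "c": ["b"]}
graph_est = {"a": [], "b": ["a"], "c": ["a"]}  # one correct edge, one wrong edge
print(precision_recall_f1(graph_est, graph_gt))  # (0.5, 0.5, 0.5)
```

Time Taken, the fourth default metric, is simply the wall-clock duration of the algorithm's run and needs no graph comparison.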

We provide support for a default set of causal discovery algorithms. Users can also include their own algorithms when calling the benchmarking module.

Data:

1. Synthetic data: this module randomly generates both the causal graph (and the corresponding structural equation model) and the associated data. It supports several benchmarking methods that evaluate causal discovery algorithms on aspects such as sample complexity, variable complexity, and graph sparsity. Depending on what is being evaluated, each method generates the graphs and data accordingly. Synthetic data evaluation serves two purposes:

compare the performance of each causal discovery algorithm across different values of a variant (e.g., an increasing number of samples), and

compare the performance of different causal discovery algorithms for any given value of a variant.

2. User-provided data: since the data is fixed, this module evaluates the performance of one or more causal discovery algorithms on the user-provided data. Because the data is not synthetically generated, computing evaluation metrics such as Precision and Recall requires the ground truth causal graph, so the user-provided data must include it. Specifically, the data must be a list of tuples, each containing the triplet (data_array, var_names, graph_gt), where data_array is a 2D NumPy array of shape (samples x variables), var_names is a list of variable names, and graph_gt is the ground truth causal graph as a Python dictionary whose keys are the variable names and whose values are lists of parent names.
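The sketch below builds a one-entry dataset in the triplet layout just described and checks it before benchmarking. `validate_user_data` is a hypothetical helper written for illustration, not a causalai function. The linear chain a -> b -> c with coefficient 0.8 is arbitrary toy data.

```python
from typing import Dict, List, Tuple

import numpy as np

DataTriplet = Tuple[np.ndarray, List[str], Dict[str, List[str]]]


def validate_user_data(data: List[DataTriplet]) -> None:
    """Raise ValueError if an entry does not match the (data_array, var_names, graph_gt) layout."""
    for idx, (data_array, var_names, graph_gt) in enumerate(data):
        if data_array.ndim != 2:
            raise ValueError(f"entry {idx}: data_array must be 2D (samples x variables)")
        if data_array.shape[1] != len(var_names):
            raise ValueError(f"entry {idx}: {data_array.shape[1]} columns but {len(var_names)} names")
        known = set(var_names)
        for child, parents in graph_gt.items():
            # Every child and every parent must be one of the declared variable names.
            if child not in known or not set(parents) <= known:
                raise ValueError(f"entry {idx}: graph_gt refers to an unknown variable near {child!r}")


# Toy dataset with the ground-truth chain a -> b -> c.
rng = np.random.RandomState(0)
a = rng.randn(500)
b = 0.8 * a + rng.randn(500)
c = 0.8 * b + rng.randn(500)
user_data = [(np.stack([a, b, c], axis=1), ["a", "b", "c"], {"a": [], "b": ["a"], "c": ["b"]})]
validate_user_data(user_data)  # passes silently
```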

class causalai.benchmark.tabular.continuous.BenchmarkContinuousTabular(algo_dict: Dict | None = None, kargs_dict: Dict | None = None, num_exp: int = 20, custom_metric_dict: Dict | None = {}, **kargs)

Continuous tabular data benchmarking module. This class inherits the methods and variables of BenchmarkTabularBase and BenchmarkContinuousTabularBase, and defines benchmarking methods that evaluate causal discovery algorithms against various challenges, such as sample complexity, variable complexity, etc.

__init__(algo_dict: Dict | None = None, kargs_dict: Dict | None = None, num_exp: int = 20, custom_metric_dict: Dict | None = {}, **kargs)

Continuous tabular data benchmarking module

Parameters:

algo_dict (Dict) --
A Python dictionary where keys are names of causal discovery algorithms, and values are the uninstantiated classes implementing the corresponding algorithms. Each class must inherit from the BaseTabularAlgoFull class found in causalai.models.tabular.base. Crucially, its constructor must take a TabularData object (found in causalai.data.tabular) as input, and it must have a run method that performs causal discovery and returns a Python dictionary of the form:

{
    var_name1: {'parents': [par(var_name1)]},
    var_name2: {'parents': [par(var_name2)]},
}

where par(.) denotes the names of the parent variables of its argument.
kargs_dict (Dict) -- A Python dictionary where keys are names of causal discovery algorithms (same as algo_dict), and the corresponding values contain any arguments to be passed to the run method of the class object specified in algo_dict.
num_exp (int) -- The number of independent runs to perform per experiment, each with a different random seed. A different random seed generates a different synthetic graph and dataset for any given configuration. Note that num_exp is not used for user-provided data.
custom_metric_dict (Dict) -- A Python dictionary for specifying custom metrics in addition to the default evaluation metrics calculated for each experiment (precision, recall, F1 score, and time taken). The keys of this dictionary are the names of the user-specified metrics, and the corresponding values are callable functions that take (graph_est, graph_gt) as input, where graph_est and graph_gt are the estimated and ground truth causal graphs, respectively. Both graphs are Python dictionaries whose keys are the child variable names and whose values are lists of parent variable names.
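As an illustration of the parameter descriptions above, the sketch reduces a run result of the documented form to a {child: [parents]} graph and registers one custom metric. That metric is a structural Hamming distance variant which counts variable pairs whose edge status differs. `to_parent_graph`, `structural_hamming_distance`, the key "SHD", and the extra "note" field in the sample result are all hypothetical. The source does not say whether the module expects higher or lower metric values to be better.

```python
from typing import Any, Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # child name -> list of parent names


def to_parent_graph(run_result: Dict[str, Dict[str, Any]]) -> Graph:
    """Keep only the 'parents' entry of each variable from a run() result."""
    return {var: list(info.get("parents", [])) for var, info in run_result.items()}


def _directed_edges(graph: Graph) -> Set[Tuple[str, str]]:
    return {(parent, child) for child, parents in graph.items() for parent in parents}


def structural_hamming_distance(graph_est: Graph, graph_gt: Graph) -> int:
    """Count variable pairs whose edge status (absent, a->b, b->a, or both) differs."""
    est, gt = _directed_edges(graph_est), _directed_edges(graph_gt)
    distance = 0
    for pair in {frozenset(edge) for edge in est | gt}:
        est_dirs = {edge for edge in est if frozenset(edge) == pair}
        gt_dirs = {edge for edge in gt if frozenset(edge) == pair}
        if est_dirs != gt_dirs:
            distance += 1  # a missing, extra, or differently oriented edge costs 1
    return distance


custom_metric_dict = {"SHD": structural_hamming_distance}

# Hypothetical run() output; the 'note' key stands in for any additional fields.
run_result = {
    "a": {"parents": []},
    "b": {"parents": ["a", "c"], "note": "illustrative extra field"},
    "c": {"parents": []},
}
graph_gt = {"a": [], "b": ["a"], "c": ["b"]}
graph_est = to_parent_graph(run_result)
for name, metric in custom_metric_dict.items():
    print(name, metric(graph_est, graph_gt))  # SHD 1 (edge b-c is reversed)
```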

benchmark_graph_density(graph_density_list: ~typing.List[float] = [0.05, 0.1, 0.2, 0.5], num_vars: int = 20, T: int = 1000, fn: ~typing.Callable = <function BenchmarkContinuousTabular.<lambda>>, coef: float = 0.1, noise_fn: ~typing.Callable = <built-in method randn of numpy.random.mtrand.RandomState object>)

Graph density: Benchmark algorithms on synthetic data with different graph densities. The synthetic data for any variable is generated using a structural equation model (SEM) of the form: