Prior Knowledge module

causalai.models.common.prior_knowledge

This class adds prior knowledge to the causal discovery process, either by forbidding links that are known not to exist or by adding back links that are known to exist based on expert knowledge. It can be used to specify prior knowledge for both tabular and time series data. For both data types, prior knowledge is specified only in terms of variable names; no time steps need to be specified for time series.

Parameters:

forbidden_links (dict, optional) -- Dictionary of the form {'var3_name': ['var6_name', 'var2_name',...], 'var2_name': ['var4_name',...]}. Each item in the dictionary denotes that the listed variables cannot be parents of the key variable. In the above example, the first item specifies that var6_name and var2_name cannot be parents (causes) of var3_name.
existing_links (dict, optional) -- Dictionary of the form {'var4_name': ['var1_name', 'var2_name',...], 'var6_name': ['var4_name',...]}. Each item in the dictionary denotes that the listed variables must be parents of the key variable. In the above example, the first item specifies that var1_name and var2_name must be parents (causes) of var4_name.
root_variables (list, optional) -- List of the form ['var7_name',...]. Any variable in this list is declared to have no parents (incoming causal links). Note that this information can alternatively be specified in the forbidden_links argument by listing all the variables in the dataset for the key variable var7_name.
leaf_variables (list, optional) -- List of the form ['var7_name',...]. Any variable in this list is declared to have no children (outgoing causal links).
forbidden_co_parents (dict, optional) -- Dictionary of the form {'var3_name': ['var6_name', 'var2_name',...], 'var2_name': ['var4_name',...]}. Each item in the dictionary denotes that the listed variables cannot be co-parents of the key variable. In the above example, the first item specifies that var6_name and var2_name cannot be co-parents of var3_name. If the dictionary is not symmetric, it is expanded to be symmetric. Currently only used for Markov Blanket discovery.
existing_co_parents (dict, optional) -- Dictionary of the form {'var4_name': ['var1_name', 'var2_name',...], 'var6_name': ['var4_name',...]}. Each item in the dictionary denotes that the listed variables must be co-parents of the key variable. In the above example, the first item specifies that var1_name and var2_name must be co-parents of var4_name. If the dictionary is not symmetric, it is expanded to be symmetric. Currently only used for Markov Blanket discovery.
fix_co_parents (bool) -- If True, adds to the existing/forbidden co-parents those that are implied by the existing/forbidden links and by known leaf nodes. Should usually be kept True except in special cases.
var_names (list) -- Only used if fix_co_parents == True. For instance, var2 is added to forbidden_co_parents[var1] whenever var1 is in var_names and var2 is in leaf_nodes.
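
The argument shapes described above can be illustrated with a small, library-independent sketch. The variable names and the helpers roots_as_forbidden_links and symmetrize are hypothetical and are not part of causalai; they only mirror two statements from the parameter descriptions: that a root variable can equivalently be expressed through forbidden_links, and that a non-symmetric co-parent dictionary is expanded to be symmetric.

```python
# Library-independent sketch of the argument shapes (all names are illustrative).
var_names = ["A", "B", "C", "D"]

forbidden_links = {"C": ["D"]}       # D cannot be a parent of C
existing_links = {"D": ["A", "B"]}   # A and B must be parents of D
root_variables = ["A"]               # A has no parents
leaf_variables = ["D"]               # D has no children


def roots_as_forbidden_links(roots, names):
    """Hypothetical helper: express each root variable as a forbidden_links
    entry that lists every variable in the dataset."""
    return {root: list(names) for root in roots}


def symmetrize(co_parents):
    """Hypothetical helper: if X lists Y as a co-parent, make Y list X too."""
    out = {key: list(values) for key, values in co_parents.items()}
    for key, others in co_parents.items():
        for other in others:
            partners = out.setdefault(other, [])
            if key not in partners:
                partners.append(key)
    return out


print(roots_as_forbidden_links(root_variables, var_names))
print(symmetrize({"C": ["A", "B"]}))
```

How the class performs these expansions internally is not specified here, so the sketch should be read only as a picture of the documented semantics.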

collect_children(target_var: int | str, type: str = 'included')

Returns a list of nodes that must be included as, or excluded from, the children of target_var in the graph (this 'transposes' the existing_links or forbidden_links dictionaries for the target variable).

Parameters:

target_var (int or str) -- target variable name
type (str) -- 'included' for variables required to be children of target_var, 'excluded' for variables required not to be.
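
The 'transposition' mentioned above can be sketched as follows. The function collect_children_sketch is a hypothetical stand-in, not the library's implementation: it only inverts a child-to-parents dictionary for one target variable, and it ignores any additional information (such as root or leaf variables) that the real method may take into account.

```python
def collect_children_sketch(links, target_var):
    """Return the keys (children) whose parent list contains target_var."""
    return [child for child, parents in links.items() if target_var in parents]


existing_links = {"D": ["A", "B"], "C": ["A"]}
forbidden_links = {"B": ["A"]}

# Analogue of type='included': children that A is required to have.
print(collect_children_sketch(existing_links, "A"))
# Analogue of type='excluded': children that A is not allowed to have.
print(collect_children_sketch(forbidden_links, "A"))
```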

isValid(parent: int | str, child: int | str) → bool

Checks whether a pair of nodes specified as parent-child is valid under the given prior knowledge. Does not take co-parents into consideration.

Parameters:

parent (int or str) -- Parent variable name
child (int or str) -- Child variable name

Returns:

True or False

Return type:

bool
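
A minimal sketch of what such a validity check might look like is given below. It assumes that a parent-child pair is invalid when the link is listed in forbidden_links, when the child is a root variable, or when the parent is a leaf variable; this is an interpretation of the parameter descriptions, and is_valid_sketch is a hypothetical function rather than the library's code.

```python
def is_valid_sketch(parent, child, forbidden_links=None,
                    root_variables=None, leaf_variables=None):
    """Assumed rules: reject forbidden links, parents of roots, children of leaves."""
    forbidden_links = forbidden_links or {}
    if child in (root_variables or []):
        return False  # a root variable has no incoming links
    if parent in (leaf_variables or []):
        return False  # a leaf variable has no outgoing links
    if parent in forbidden_links.get(child, []):
        return False  # explicitly forbidden parent -> child link
    return True


knowledge = dict(forbidden_links={"C": ["D"]},
                 root_variables=["A"], leaf_variables=["D"])
print(is_valid_sketch("A", "B", **knowledge))  # no rule applies
print(is_valid_sketch("D", "C", **knowledge))  # forbidden, and D is a leaf
print(is_valid_sketch("B", "A", **knowledge))  # A is a root
```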

isValid_co_parent(first_co_parent: int | str, second_co_parent: int | str) → bool

Checks whether a pair of nodes specified as co-parents is valid under the given prior knowledge.

Parameters:

first_co_parent (int or str) -- First co-parent variable name
second_co_parent (int or str) -- Second co-parent variable name

Returns:

True or False

Return type:

bool
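
Because co-parent dictionaries are treated as symmetric, a co-parent check has to look in both directions. The sketch below illustrates this under the assumption that only forbidden_co_parents determines validity; is_valid_co_parent_sketch is a hypothetical function, and the real method may consider further constraints.

```python
def is_valid_co_parent_sketch(first, second, forbidden_co_parents=None):
    """Reject the pair if either variable lists the other as a forbidden co-parent."""
    forbidden = forbidden_co_parents or {}
    return (second not in forbidden.get(first, [])
            and first not in forbidden.get(second, []))


forbidden_co_parents = {"C": ["A"]}  # only one direction is given
print(is_valid_co_parent_sketch("C", "A", forbidden_co_parents))
print(is_valid_co_parent_sketch("A", "C", forbidden_co_parents))
print(is_valid_co_parent_sketch("A", "B", forbidden_co_parents))
```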

post_process_tabular(graph)

Given a causal graph dictionary for tabular data, where keys are children and values are parents, this method removes and adds edges depending on the specified prior knowledge in case there are conflicts between the graph and the prior knowledge. Does not take co-parents into consideration.

Parameters:

graph (dict) -- causal graph dictionary as explained above

Returns:

causal graph dictionary with prior knowledge enforced

Return type:

dict
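
The effect of this post-processing step on a graph dictionary can be sketched as below. The function post_process_sketch is hypothetical and assumes that the prior knowledge is self-consistent: parents of root variables are dropped, forbidden parents and leaf variables are removed from parent lists, and required parents are appended. The order in which the library applies these rules, and how it resolves contradictory prior knowledge, is not specified here.

```python
def post_process_sketch(graph, forbidden_links=None, existing_links=None,
                        root_variables=None, leaf_variables=None):
    """Enforce prior knowledge on a {child: [parents]} dictionary (illustrative)."""
    forbidden = forbidden_links or {}
    existing = existing_links or {}
    roots = root_variables or []
    leaves = leaf_variables or []

    # Copy so the input graph is left untouched.
    out = {child: list(parents) for child, parents in graph.items()}
    for child in out:
        if child in roots:
            out[child] = []  # root variables keep no parents
            continue
        # Drop forbidden parents and any parent declared to be a leaf.
        out[child] = [p for p in out[child]
                      if p not in forbidden.get(child, []) and p not in leaves]
        # Add back parents that expert knowledge says must exist.
        for p in existing.get(child, []):
            if p not in out[child]:
                out[child].append(p)
    return out


graph = {"A": ["C"], "B": ["A"], "C": ["B", "D"], "D": []}
print(post_process_sketch(graph,
                          forbidden_links={"C": ["D"]},
                          existing_links={"D": ["A"]},
                          root_variables=["A"]))
```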

required(target_var: int | str, type: str = 'included')

Returns a list of variables that must be included in, or excluded from, the Markov blanket of target_var.

Parameters:

target_var (int or str) -- target variable name
type (str) -- 'included' for variables required to be in the Markov blanket, 'excluded' for variables required to be out of it.