# Algorithm 2: Causal Structure Learning with Knowledge Constraints

The causal structure learning module discovers causal relationships between state variables, actions, and rewards while incorporating domain knowledge as constraints. This algorithm combines constraint-based causal discovery with knowledge-guided refinement.

## Variable Selection and Preprocessing

Before applying causal discovery algorithms, the system must select relevant variables and preprocess the data to ensure the assumptions of causal discovery methods are satisfied.

```
Algorithm 2.1: Causal Variable Selection and Preprocessing
Input: State trajectories D = {(s_t, a_t, r_t, s_{t+1})}, knowledge graph G
Output: Processed dataset D_causal, variable set V

1: // Extract relevant variables
2: V_state ← select_state_variables(D, G)
3: V_action ← extract_action_variables(D)
4: V_reward ← extract_reward_components(D)
5: V ← V_state ∪ V_action ∪ V_reward

6: // Preprocess data
7: D_processed ← ∅
8: for each trajectory τ in D do
9:    τ_processed ← handle_missing_values(τ)
10:   τ_processed ← normalize_variables(τ_processed)
11:   τ_processed ← discretize_continuous_variables(τ_processed, V)
12:   D_processed ← D_processed ∪ {τ_processed}
13: end for

14: // Apply temporal constraints
15: temporal_constraints ← extract_temporal_ordering(V)
16: D_causal ← apply_temporal_constraints(D_processed, temporal_constraints)

17: return D_causal, V
```

The variable selection process in lines 2-5 uses the knowledge graph to identify state variables that are likely to participate in causal relationships, then combines them with the action and reward variables. The discretization step converts continuous variables to discrete ones using adaptive binning, which aims to preserve the dependence structure among variables while allowing discrete causal discovery algorithms to be applied.
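As an illustration of the discretization step, the sketch below uses equal-frequency (quantile) binning, which is one common form of adaptive binning. The source does not say which binning method is used, so the choice of method, the function names `quantile_bin_edges` and `discretize`, and the sample values are all assumptions made for this example.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_bin_edges(values, n_bins=4):
    """Return the n_bins - 1 interior cut points for equal-frequency bins."""
    # method="inclusive" treats the data as a full population,
    # so cut points stay within the observed range.
    return quantiles(values, n=n_bins, method="inclusive")


def discretize(values, edges):
    """Map each value to the index of the bin it falls into (0 .. len(edges))."""
    return [bisect_right(edges, v) for v in values]


# Hypothetical readings of one continuous state variable with a skewed range.
values = [0.1, 0.2, 0.25, 0.3, 5.0, 5.5, 40.0, 41.0]
edges = quantile_bin_edges(values, n_bins=4)
codes = discretize(values, edges)
print(edges)
print(codes)
```

Because the cut points follow the empirical distribution, a skewed variable still fills every bin, which keeps the contingency tables used by discrete independence tests from becoming too sparse.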

## Knowledge-Constrained PC Algorithm

The core causal discovery process employs a modified PC algorithm that incorporates knowledge constraints to guide the search process and improve accuracy.

```
Algorithm 2.2: Knowledge-Constrained PC Algorithm
Input: Dataset D_causal, variable set V, knowledge graph G, significance level α, knowledge weight λ_kc
Output: Causal graph C = (V, E_causal)

1: // Initialize complete graph
2: C ← complete_graph(V)
3: knowledge_constraints ← extract_causal_constraints(G, V)

4: // Phase 1: Edge removal based on conditional independence
5: for l = 0 to |V| - 2 do
6:    for each edge (X, Y) in C do
7:       for each subset S ⊆ adjacent(X, C) \ {Y} with |S| = l do
8:          // Test conditional independence
9:          p_value ← conditional_independence_test(X, Y, S, D_causal)
10:         
11:         // Apply knowledge constraints
12:         knowledge_score ← evaluate_knowledge_consistency(X, Y, S, knowledge_constraints)
13:         adjusted_p_value ← p_value / (1 + λ_kc * knowledge_score)  // knowledge_score ≥ 0; larger when G supports an X-Y dependence
14:         
15:         if adjusted_p_value > α then
16:            remove_edge(C, X, Y)
17:            record_separation_set(X, Y, S)
18:            break
19:         end if
20:      end for
21:   end for
22: end for

23: // Phase 2: Edge orientation
24: C ← orient_edges_with_knowledge(C, knowledge_constraints)
25: C ← apply_orientation_rules(C)

26: return C
```

The knowledge consistency evaluation in line 12 scores how well a dependence between X and Y agrees with the domain knowledge encoded in the knowledge graph. Line 13 uses this score to rescale the p-value of the conditional independence test, which is equivalent to adjusting the significance threshold α for that pair of variables. The intent is to make the algorithm more conservative about removing edges that are supported by domain knowledge.
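The edge-removal phase can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `pc_skeleton`, `ci_test`, and `knowledge_score` are hypothetical names, the score is assumed to be nonnegative and larger for pairs the knowledge graph supports, and the p-value is divided by `1 + lambda_kc * score` as one way of obtaining the conservative behavior described above. The independence test is passed in as a callable, so any discrete or continuous test could be substituted.

```python
from itertools import combinations


def pc_skeleton(variables, ci_test, knowledge_score, alpha=0.05, lambda_kc=1.0):
    """Phase 1 of a knowledge-constrained PC search (skeleton only).

    ci_test(x, y, cond) returns the p-value for "x independent of y given cond".
    knowledge_score(x, y) returns a value >= 0 (assumed sign convention).
    """
    adj = {v: set(variables) - {v} for v in variables}
    sepsets = {}
    level = 0
    # Continue while some node has enough other neighbors for a set of this size.
    while any(len(adj[x]) - 1 >= level for x in variables):
        for x in variables:
            for y in sorted(adj[x]):
                for cond in combinations(sorted(adj[x] - {y}), level):
                    p = ci_test(x, y, cond)
                    # Knowledge support shrinks the p-value, making removal harder.
                    adjusted = p / (1.0 + lambda_kc * knowledge_score(x, y))
                    if adjusted > alpha:
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepsets[frozenset((x, y))] = set(cond)
                        break
        level += 1
    edges = {frozenset((x, y)) for x in variables for y in adj[x]}
    return edges, sepsets


def oracle(x, y, cond):
    # Toy oracle for the chain A -> B -> C: A and C are independent only given B.
    return 1.0 if {x, y} == {"A", "C"} and "B" in cond else 0.0


# Without prior knowledge the A-C edge is removed, with separating set {B}.
edges, sepsets = pc_skeleton(["A", "B", "C"], oracle, lambda x, y: 0.0)

# A strong (hypothetical) prior on A-C keeps that edge despite the test result.
prior = lambda x, y: 100.0 if {x, y} == {"A", "C"} else 0.0
edges_with_prior, _ = pc_skeleton(["A", "B", "C"], oracle, prior)
```

The second call shows the trade-off that λ_kc controls: a large enough knowledge score can override a clear independence result, so the weight should be chosen with the reliability of the knowledge graph in mind.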

## Causal Graph Refinement and Validation

After the initial causal discovery, the algorithm refines the learned structure using additional knowledge constraints and validates the results through cross-validation and expert review.

```
Algorithm 2.3: Causal Graph Refinement
Input: Initial causal graph C_init, knowledge graph G, dataset D_causal, thresholds θ_confidence and θ_contradiction
Output: Refined causal graph C_refined

1: C_refined ← C_init
2: knowledge_edges ← extract_known_causal_edges(G)
3: discovered_edges ← get_edges(C_init)

4: // Add missing knowledge-supported edges
5: for each edge (X, Y) in knowledge_edges do
6:    if (X, Y) not in discovered_edges then
7:       confidence ← estimate_edge_confidence(X, Y, D_causal)
8:       if confidence > θ_confidence then
9:          add_edge(C_refined, X, Y)
10:         set_edge_weight(C_refined, X, Y, confidence)
11:      end if
12:   end if
13: end for

14: // Remove contradictory edges
15: for each edge (X, Y) in discovered_edges do
16:    if contradicts_knowledge(X, Y, G) then
17:       contradiction_score ← compute_contradiction_score(X, Y, G)
18:       if contradiction_score > θ_contradiction then
19:          remove_edge(C_refined, X, Y)
20:       end if
21:    end if
22: end for

23: // Validate graph properties
24: C_refined ← ensure_acyclicity(C_refined)
25: C_refined ← validate_temporal_ordering(C_refined)

26: return C_refined
```
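The acyclicity requirement in line 24 can be checked with Python's standard `graphlib` module, as in the sketch below. It only detects whether a cycle exists; how `ensure_acyclicity` repairs a cyclic graph is not specified in the source, and the function name `is_acyclic` and the example edges are assumptions.

```python
from graphlib import CycleError, TopologicalSorter


def is_acyclic(edges):
    """Return True if the directed edges (cause, effect) contain no cycle."""
    sorter = TopologicalSorter()
    for cause, effect in edges:
        # Register the cause as a predecessor of the effect.
        sorter.add(effect, cause)
    try:
        sorter.prepare()  # raises CycleError if the graph has a cycle
    except CycleError:
        return False
    return True


# Hypothetical edges: a chain, then the same chain with a back edge.
chain = [("action", "state"), ("state", "reward")]
with_back_edge = chain + [("reward", "action")]
print(is_acyclic(chain), is_acyclic(with_back_edge))
```

A check like this is most useful after the knowledge-supported edges have been added, since those additions are the step most likely to introduce a cycle.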

The edge confidence estimation in line 7 uses bootstrap sampling and cross-validation to assess how reliably the data support a knowledge-suggested edge that the discovery phase did not find. The contradiction detection mechanism flags discovered edges that conflict with established domain knowledge. An edge is removed only when its contradiction score exceeds θ_contradiction, so weak conflicts do not override the data.
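The bootstrap part of the confidence estimate can be sketched as follows. The source does not define the underlying statistic, so this example assumes a simple one: the fraction of bootstrap resamples in which the absolute Pearson correlation between the two variables reaches a threshold. The names `bootstrap_edge_confidence` and `min_abs_corr` and the synthetic data are hypothetical, and the cross-validation component is omitted.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either sample is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def bootstrap_edge_confidence(xs, ys, n_boot=200, min_abs_corr=0.3, seed=0):
    """Fraction of resamples in which |corr(x, y)| >= min_abs_corr."""
    rng = random.Random(seed)
    n = len(xs)
    hits = 0
    for _ in range(n_boot):
        # Resample paired observations with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson([xs[i] for i in idx], [ys[i] for i in idx])
        hits += abs(r) >= min_abs_corr
    return hits / n_boot


# Synthetic data: y depends strongly on x, z is generated independently of x.
data_rng = random.Random(1)
xs = [float(i) for i in range(50)]
ys = [2.0 * x + data_rng.gauss(0.0, 1.0) for x in xs]
zs = [data_rng.gauss(0.0, 1.0) for _ in xs]

conf_dependent = bootstrap_edge_confidence(xs, ys)
conf_independent = bootstrap_edge_confidence(xs, zs)
```

The resulting value lies in [0, 1] and can be compared directly with θ_confidence in line 8 of Algorithm 2.3.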

