# DANCE-ST Knowledge Graph Structure

The knowledge graph in DANCE-ST serves as the foundation for the relevance-driven subgraph extraction phase (Phase 1). This document explains the structure and importance of the knowledge graph based on the [ANONYMIZED] LP dataset.

## Overview

The knowledge graph represents relationships between various entities in the jet engine degradation domain, including:
- Engine blades
- Material properties
- Environmental conditions
- Inspection results
- Degradation mechanisms

## Structure of the [ANONYMIZED] LP Dataset

The knowledge graph in the [ANONYMIZED] LP dataset is stored in the following files:
- `[ANONYMIZED]_kg.graphml` - Complete graph in GraphML format (3.0MB)
- `[ANONYMIZED]_lp_vertices.json` - All vertices with their properties (1.2MB)
- `[ANONYMIZED]_lp_edges.json` - All edges with their types and properties (1.8MB)
- `[ANONYMIZED]_lp_kg_config.json` - Configuration for spatial and temporal domains

### Spatial and Temporal Domain Configuration

```json
{
  "name": "[ANONYMIZED]LP",
  "vertices_file": "[ANONYMIZED]_lp_vertices.json",
  "edges_file": "[ANONYMIZED]_lp_edges.json",
  "spatial_domain": {
    "type": "2D_grid",
    "file": "[ANONYMIZED]_lp_spatial_grid.csv",
    "x_col": "x_coord",
    "y_col": "y_coord"
  },
  "temporal_domain": {
    "type": "discrete",
    "min": 0,
    "max": 9,
    "step": 1
  }
}
```
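The configuration can be expanded into the discrete time axis it describes. Below is a minimal sketch that parses the JSON shown above inline; reading `[ANONYMIZED]_lp_kg_config.json` with `json.load` would work the same way. The helper name `time_points` is hypothetical, and the sketch assumes `max` is inclusive, which matches the 0-9 `time_point` range used elsewhere in this document.

```python
import json

# The configuration shown above, inlined so the example is self-contained.
CONFIG_TEXT = """
{
  "name": "[ANONYMIZED]LP",
  "vertices_file": "[ANONYMIZED]_lp_vertices.json",
  "edges_file": "[ANONYMIZED]_lp_edges.json",
  "spatial_domain": {
    "type": "2D_grid",
    "file": "[ANONYMIZED]_lp_spatial_grid.csv",
    "x_col": "x_coord",
    "y_col": "y_coord"
  },
  "temporal_domain": {"type": "discrete", "min": 0, "max": 9, "step": 1}
}
"""

def time_points(config):
    """Expand a discrete temporal domain into its list of time indices (max inclusive)."""
    td = config["temporal_domain"]
    if td["type"] != "discrete":
        raise ValueError("only discrete temporal domains are handled in this sketch")
    return list(range(td["min"], td["max"] + 1, td["step"]))

config = json.loads(CONFIG_TEXT)
steps = time_points(config)
spatial = config["spatial_domain"]
print(len(steps), steps[0], steps[-1])    # 10 0 9
print(spatial["x_col"], spatial["y_col"])  # x_coord y_coord
```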

## Node Types

The knowledge graph consists of the following primary node types:

| Node Type    | Description                  | Property               | Examples / Values                                        |
|--------------|------------------------------|------------------------|----------------------------------------------------------|
| `blade`      | Turbine blade entities       | `alloy_type`           | "Rene-77", "GTD-111", "Inconel-718", "Waspaloy"          |
|              |                              | `heat_treatment`       | "Standard", "Modified", "Experimental"                   |
|              |                              | `surface_coating`      | "Type-A", "Type-B", "Type-C", "None"                     |
|              |                              | `manufacturing_batch`  | 1-20                                                     |
|              |                              | `initial_thickness_mm` | Typical range: 3.0-4.0 mm                                |
|              |                              | `chromium_content_pct` | Typical range: 14.7-22.0%                                |
| `inspection` | Corrosion inspection results | `date`                 | "2023-01-01" to "2023-09-28"                             |
|              |                              | `max_depth_mm`         | Maximum measured corrosion depth (mm)                    |
|              |                              | `significant_points`   | Count of grid points with significant corrosion (0-2500) |
|              |                              | `passed`               | Boolean inspection result (true/false)                   |

## Edge Types

The knowledge graph includes the following key relationship types:

| Edge Type          | Description                                   | Properties from [ANONYMIZED] LP                        |
|--------------------|-----------------------------------------------|--------------------------------------------------------|
| `has_inspection`   | Links blades to their inspection results      | `time_point` (0-9, representing inspection dates)      |
| `next_inspection`  | Links sequential inspections                  | `time_delta` (typically 1, representing one time step) |
| `similar_material` | Links blades with similar material properties | `similarity` (0.0-1.0, measures material similarity)   |
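The node and edge schema above can be illustrated with plain Python dictionaries. This is a hypothetical in-memory layout, not the on-disk format of `[ANONYMIZED]_lp_vertices.json` or `[ANONYMIZED]_lp_edges.json`: the vertex IDs (`blade_0`, `insp_blade_0_3`, ...) and the helper `build_blade_history` are invented for illustration, and the property values are synthetic.

```python
def build_blade_history(blade_id, blade_props, inspections):
    """Return (vertices, edges) for one blade and its chronologically ordered inspections."""
    vertices = {blade_id: {"type": "blade", **blade_props}}
    edges = []
    prev = None
    for t, insp in enumerate(inspections):
        insp_id = f"insp_{blade_id}_{t}"  # hypothetical ID scheme
        vertices[insp_id] = {"type": "inspection", **insp}
        # has_inspection: blade -> inspection, tagged with its time index
        edges.append((blade_id, insp_id, {"type": "has_inspection", "time_point": t}))
        if prev is not None:
            # next_inspection: chains consecutive inspections one time step apart
            edges.append((prev, insp_id, {"type": "next_inspection", "time_delta": 1}))
        prev = insp_id
    return vertices, edges

# Synthetic example values, not taken from the dataset
blade = {"alloy_type": "Rene-77", "initial_thickness_mm": 3.5}
history = [{"max_depth_mm": 0.02 * t, "passed": True} for t in range(10)]
vertices, edges = build_blade_history("blade_0", blade, history)

counts = {}
for _, _, attrs in edges:
    counts[attrs["type"]] = counts.get(attrs["type"], 0) + 1
print(len(vertices), counts)  # 11 {'has_inspection': 10, 'next_inspection': 9}
```

One blade with ten inspections yields ten `has_inspection` edges and nine `next_inspection` edges; `similar_material` edges would be added separately between blade vertices.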

## Material Properties

The [ANONYMIZED] LP dataset includes material properties in `material_properties.json`:

```json
{
  "Rene-77": {
    "thermal_expansion": 1.2e-05,
    "thermal_conductivity": 11.0,
    "youngs_modulus": 200.0,
    "poissons_ratio": 0.3,
    "avg_chromium_content": 18.35,
    "avg_thickness": 3.52,
    "thermal_conductivity_vs_temp": {
      "20": 11.0,
      "500": 18.0,
      "800": 24.0
    },
    "youngs_modulus_vs_temp": {
      "20": 210.0,
      "500": 175.0,
      "800": 155.0
    },
    "corrosion_model": "parabolic"
  }
}
```
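The `*_vs_temp` tables give property values at only a few temperatures. Below is a minimal sketch of evaluating them in between, assuming the keys are temperatures in °C and that piecewise-linear interpolation (clamped at the table ends) is acceptable. The document does not say which interpolation DANCE-ST uses, and `interp_property` is a hypothetical helper.

```python
from bisect import bisect_right

def interp_property(table, temp):
    """Piecewise-linear interpolation over a {"temperature": value} table with string keys."""
    # Convert keys to float before sorting: as strings, "20" would sort after "800".
    points = sorted((float(k), v) for k, v in table.items())
    temps = [t for t, _ in points]
    if temp <= temps[0]:
        return points[0][1]   # clamp below the table
    if temp >= temps[-1]:
        return points[-1][1]  # clamp above the table
    i = bisect_right(temps, temp)  # temps[i-1] <= temp < temps[i]
    (t0, v0), (t1, v1) = points[i - 1], points[i]
    return v0 + (v1 - v0) * (temp - t0) / (t1 - t0)

rene77_conductivity = {"20": 11.0, "500": 18.0, "800": 24.0}
rene77_modulus = {"20": 210.0, "500": 175.0, "800": 155.0}
print(interp_property(rene77_conductivity, 650.0))  # 21.0
print(interp_property(rene77_modulus, 650.0))       # 165.0
```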

## Environmental Conditions

Environmental conditions are stored in `environment_params.json`, including:

- Temperature profiles (750-840°C)
- Contaminant profiles (0.0-0.01)
- Oxygen partial pressure (0.21)
- Gas flow rate (15.0)
- Pressure (1.2)
- Humidity profiles (0.4-0.6)

## Corrosion Rate Data

Material-specific corrosion rates are stored in `corrosion_rates.json`:

```json
{
  "Rene-77": {
    "base_rate": 0.12,
    "uncertainty": 0.025,
    "activation_energy": 0.53
  },
  "GTD-111": {
    "base_rate": 0.1,
    "uncertainty": 0.022,
    "activation_energy": 0.46
  }
}
```
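The document does not state how `base_rate`, `uncertainty`, and `activation_energy` are combined. One common reading, offered here only as an illustrative assumption, applies an Arrhenius temperature correction to the base rate and then the `"parabolic"` growth law named in `material_properties.json` (depth² = rate × elapsed time). The sketch further assumes `activation_energy` is in eV, temperatures are in kelvin, and `base_rate` is quoted at a hypothetical reference temperature `T_REF_K`; none of these are confirmed by the dataset description.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K
T_REF_K = 1073.15        # hypothetical reference temperature (800 °C) for base_rate

def arrhenius_rate(base_rate, activation_energy_ev, temp_k, t_ref_k=T_REF_K):
    """Scale base_rate from t_ref_k to temp_k with an Arrhenius factor (assumed model)."""
    return base_rate * math.exp(-activation_energy_ev / K_B_EV * (1.0 / temp_k - 1.0 / t_ref_k))

def parabolic_depth(rate, elapsed):
    """Parabolic growth law: depth**2 = rate * elapsed, so depth = sqrt(rate * elapsed)."""
    return math.sqrt(rate * elapsed)

rates = {"Rene-77": {"base_rate": 0.12, "uncertainty": 0.025, "activation_energy": 0.53}}
p = rates["Rene-77"]
k_hot = arrhenius_rate(p["base_rate"], p["activation_energy"], temp_k=1113.15)  # 840 °C
print(k_hot > p["base_rate"])            # True: hotter than T_REF_K, so the rate increases
print(parabolic_depth(0.12, elapsed=9))  # sqrt(1.08), about 1.039
```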

## Relevance Calculation

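During Phase 1 of DANCE-ST, the knowledge graph is used to calculate a relevance score Λ(v,s,t) for each vertex v with respect to a query location s and time point t. The score is a weighted combination of three components:

1. **Causal Relevance** `R_causal(v)`, weighted by α: importance of the vertex to the degradation process
2. **Spatial Relevance** `R_spatial(v,s)`, weighted by β: proximity of the vertex to the spatial location s
3. **Temporal Relevance** `R_temporal(v,t)`, weighted by γ: relevance of the vertex to the time point t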

The formula used is:
```
Λ(v,s,t) = α·R_causal(v) + β·R_spatial(v,s) + γ·R_temporal(v,t)
```
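A worked instance of the formula, as a small sketch. The default weights α = 0.5, β = 0.3, γ = 0.2 are the ones used in the extraction example later in this document; the component scores are made-up values for a single vertex. When the weights sum to 1 and each component lies in [0, 1], Λ also lies in [0, 1].

```python
def relevance(r_causal, r_spatial, r_temporal, alpha=0.5, beta=0.3, gamma=0.2):
    """Λ(v, s, t) = α·R_causal(v) + β·R_spatial(v, s) + γ·R_temporal(v, t)."""
    return alpha * r_causal + beta * r_spatial + gamma * r_temporal

# Hypothetical component scores for one vertex at a given (s, t)
score = relevance(r_causal=0.8, r_spatial=0.4, r_temporal=1.0)
print(round(score, 2))  # 0.72  (0.5*0.8 + 0.3*0.4 + 0.2*1.0 = 0.40 + 0.12 + 0.20)
```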

## Dataset Statistics

The [ANONYMIZED] LP dataset contains:
- Approximately 500 blade entities
- 5,000 inspection records (500 blades × 10 time points)
- ~90,000 edges including:
  - `has_inspection` edges connecting blades to inspections
  - `next_inspection` edges connecting sequential inspections
  - `similar_material` edges connecting blades with similar material properties
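The edge total can be broken down roughly from the other counts. The sketch below assumes exactly 500 blades, one `has_inspection` edge per inspection, one `next_inspection` chain of 10 inspections per blade, and takes the "~90,000" total as exact; the resulting `similar_material` figure is a derived estimate, not a reported statistic.

```python
NUM_BLADES = 500
TIME_POINTS = 10      # time_point 0-9
TOTAL_EDGES = 90_000  # the "~90,000" quoted above, taken as exact for this estimate

inspections = NUM_BLADES * TIME_POINTS            # one inspection per blade per time point
has_inspection = inspections                      # each inspection links to exactly one blade
next_inspection = NUM_BLADES * (TIME_POINTS - 1)  # 9 links per 10-inspection chain
similar_material = TOTAL_EDGES - has_inspection - next_inspection

print(inspections, has_inspection, next_inspection, similar_material)
# 5000 5000 4500 80500
```

Under these assumptions, `similar_material` edges would make up the large majority of the graph's edges.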

## Remaining Useful Life (RUL) Data

RUL prediction is a critical output of the DANCE-ST system. The knowledge graph incorporates RUL data through:

1. **RUL CSV File**: The `[ANONYMIZED]_lp_rul.csv` file contains ground truth RUL values for each blade at different time points.

2. **Implicit RUL Representation**: While not directly stored as a vertex property, RUL is implicitly represented through:
   - The progression of `max_depth_mm` values in sequential inspections
   - The rate of change in `significant_points` count over time
   - The relationship between material properties and corrosion rates

3. **RUL Prediction Integration**: The subgraph extraction phase incorporates RUL-relevant vertices by:
   - Prioritizing inspection sequences that show accelerating degradation
   - Including material properties that influence degradation rates
   - Weighting temporal relevance based on proximity to failure points

4. **RUL Calibration**: The knowledge graph contains data structures that allow for calibration of RUL predictions against known failure criteria, such as:
   - Critical thickness thresholds for different alloy types
   - Historical RUL statistics for similar blades under similar conditions
   - Material-specific failure mechanisms

This RUL data is essential for Phase 2 (Neural and Symbolic Learning) to generate accurate predictions of remaining component life.
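To illustrate how RUL is implicit in the `max_depth_mm` progression, the sketch below fits a least-squares line to one blade's depth history and extrapolates it to a critical depth. A critical thickness threshold could be converted to a depth as `initial_thickness_mm` minus that threshold. The linear trend, the `critical_depth_mm` value, and the helper name `estimate_rul` are assumptions for illustration. This is not the DANCE-ST predictor, and ground-truth values remain in `[ANONYMIZED]_lp_rul.csv`.

```python
def estimate_rul(depths, critical_depth_mm):
    """Extrapolate a linear fit of max_depth_mm (one value per time step) to a threshold.

    Returns the number of time steps after the last inspection until the fitted
    line reaches critical_depth_mm, 0.0 if it is already reached, or None if the
    fitted trend is not increasing.
    """
    n = len(depths)
    if n < 2:
        raise ValueError("need at least two inspections")
    ts = range(n)
    t_mean = (n - 1) / 2
    d_mean = sum(depths) / n
    slope = sum((t - t_mean) * (d - d_mean) for t, d in zip(ts, depths)) / sum(
        (t - t_mean) ** 2 for t in ts
    )
    if slope <= 0:
        return None  # no degradation trend to extrapolate
    intercept = d_mean - slope * t_mean
    t_fail = (critical_depth_mm - intercept) / slope  # time index where the line hits the threshold
    return max(0.0, t_fail - (n - 1))

# Synthetic depth history (mm) growing by 0.05 mm per time step
print(estimate_rul([0.05 * t for t in range(10)], critical_depth_mm=0.8))  # about 7.0
```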

## Spatial Grid Representation

The knowledge graph incorporates spatial information through the `[ANONYMIZED]_lp_spatial_grid.csv` file, which defines a 2D grid for representing corrosion patterns:

- The grid contains 2,500 points (50×50) covering the blade surface
- Each point has (x, y) coordinates in a normalized space, stored in the `x_coord` and `y_coord` columns
- Corrosion depth measurements are taken at each grid point
- The `significant_points` property in inspection nodes counts grid points exceeding a threshold degradation level
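A minimal sketch of the grid and of how `significant_points` could be derived from one inspection's depth field. The coordinates are generated here as a uniform 50×50 lattice over [0, 1]², which is an assumption (the real coordinates come from `[ANONYMIZED]_lp_spatial_grid.csv`). The `threshold_mm` value and the depth field are hypothetical, since the document does not state the significance threshold.

```python
def make_grid(n=50):
    """Uniform n x n lattice of (x, y) points in the normalized square [0, 1]^2."""
    step = 1.0 / (n - 1)
    return [(i * step, j * step) for i in range(n) for j in range(n)]

def count_significant(depths_mm, threshold_mm):
    """Number of grid points whose corrosion depth exceeds the threshold."""
    return sum(1 for d in depths_mm if d > threshold_mm)

grid = make_grid()
# Synthetic depth field, deeper toward x = 1 (illustrative only)
depths = [0.2 * x for x, _ in grid]
print(len(grid), count_significant(depths, threshold_mm=0.1))  # 2500 1250
```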

Spatial relevance in the subgraph extraction uses this grid to:
1. Prioritize nodes related to regions showing accelerated degradation
2. Identify spatial patterns that correlate with specific failure modes
3. Weight relevance based on proximity to critical regions (e.g., blade edges)
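Item 3 can be sketched as a weight that decays with distance from the nearest blade edge. This sketch assumes the normalized [0, 1]² coordinates, treats the sides of the unit square as the "blade edges", and uses a Gaussian kernel with a hypothetical bandwidth `sigma`; the actual `R_spatial` used by DANCE-ST is not specified here.

```python
import math

def distance_to_edge(x, y):
    """Distance from (x, y) to the nearest side of the unit square (assumed blade outline)."""
    return min(x, 1.0 - x, y, 1.0 - y)

def edge_proximity_weight(x, y, sigma=0.1):
    """Gaussian weight: 1.0 on an edge, decaying toward the interior."""
    d = distance_to_edge(x, y)
    return math.exp(-(d * d) / (2 * sigma * sigma))

print(edge_proximity_weight(0.0, 0.5))            # 1.0 (on an edge)
print(round(edge_proximity_weight(0.1, 0.5), 4))  # 0.6065 (one sigma inside)
```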

## Constraints Mechanism

The [ANONYMIZED] LP dataset includes constraint definitions in the `constraints/` directory that the knowledge graph must satisfy:

1. **Physical Constraints**: Material-specific limits on degradation rates and patterns
2. **Temporal Constraints**: Monotonicity requirements for degradation progression
3. **Causal Constraints**: Relationships between environmental conditions and degradation mechanisms

These constraints are used both during subgraph extraction and in the Phase 3 projection stage to keep predictions physically plausible.
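For the temporal monotonicity constraint, a check and a Euclidean projection onto non-decreasing sequences can be sketched with the pool-adjacent-violators algorithm (isotonic regression). This is a standard technique swapped in for illustration: the document does not specify how the Phase 3 projection is implemented, and the format of the files in `constraints/` is not shown here.

```python
def is_monotone(depths):
    """True if the degradation sequence never decreases."""
    return all(a <= b for a, b in zip(depths, depths[1:]))

def project_monotone(depths):
    """L2 projection onto non-decreasing sequences (pool-adjacent-violators)."""
    blocks = []  # each block is [sum of values, count]
    for d in depths:
        blocks.append([d, 1])
        # Merge backwards while the previous block's mean exceeds the last block's mean
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    projected = []
    for total, count in blocks:
        projected.extend([total / count] * count)
    return projected

raw = [0.10, 0.30, 0.20, 0.40]  # synthetic max_depth_mm series with one violation
print(is_monotone(raw), project_monotone(raw))  # False [0.1, 0.25, 0.25, 0.4]
```

Pooling violating neighbours to their mean gives the closest non-decreasing sequence in the least-squares sense, so a physically implausible dip is corrected with minimal change to the other values.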

## Visualization Capabilities

The knowledge graph structure supports visualization through:

1. The `visualizations/` directory in the [ANONYMIZED] LP dataset containing templates for:
   - Heat maps of corrosion depth across the blade surface
   - Time series of degradation progression
   - Causal influence networks showing relationships between factors

2. The graph structure enables queries for:
   - Temporal comparisons between different blades
   - Spatial pattern visualization across the 2D grid
   - Similarity-based grouping of blades with related degradation patterns
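Similarity-based grouping can be sketched as connected components over `similar_material` edges whose `similarity` meets a cutoff, computed with union-find. The cutoff value, the `(blade, blade, similarity)` edge-tuple layout, and the blade IDs are assumptions for illustration.

```python
def group_similar_blades(blades, similar_edges, min_similarity=0.8):
    """Connected components of blades linked by similar_material edges above a cutoff."""
    parent = {b: b for b in blades}

    def find(b):
        while parent[b] != b:
            parent[b] = parent[parent[b]]  # path halving keeps trees shallow
            b = parent[b]
        return b

    for u, v, similarity in similar_edges:
        if similarity >= min_similarity:
            parent[find(u)] = find(v)  # union the two components

    groups = {}
    for b in blades:
        groups.setdefault(find(b), []).append(b)
    return sorted(sorted(g) for g in groups.values())

edges = [("b1", "b2", 0.9), ("b2", "b3", 0.85), ("b4", "b5", 0.5)]  # synthetic
print(group_similar_blades(["b1", "b2", "b3", "b4", "b5"], edges))
# [['b1', 'b2', 'b3'], ['b4'], ['b5']]
```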

## Connection to Neurosymbolic Models

The knowledge graph serves as the foundation for both neural and symbolic components:

1. **Neural Model Input**: 
   - The extracted subgraph provides feature vectors for neural training
   - Node and edge embeddings capture material properties, inspection history, and environmental data
   - Spatial and temporal features are encoded as matrices for the neural transformer

2. **Symbolic Model Constraints**:
   - Physical equations from the knowledge graph define the symbolic model's structure
   - Material-specific degradation mechanisms are parameterized based on graph properties
   - Causal relationships in the graph guide the structure of the symbolic model

3. **Integration Mechanism**:
   - The relevance scores from subgraph extraction guide attention mechanisms in the neural component
   - Knowledge graph constraints inform the calibration of both models
   - The graph structure enables explanation of predictions by tracing causal paths

This bidirectional connection between the knowledge graph and the models is essential for the neurosymbolic approach in DANCE-ST.
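One hypothetical way relevance scores could guide attention is to add log-relevance as a bias to the attention logits before the softmax, so vertices with lower Λ receive proportionally less attention. This is an illustrative wiring, not the documented DANCE-ST architecture, and `relevance_biased_attention` is an invented name.

```python
import math

def relevance_biased_attention(logits, relevance, eps=1e-12):
    """Softmax over logits + log(relevance); eps guards against log(0)."""
    biased = [logit + math.log(r + eps) for logit, r in zip(logits, relevance)]
    m = max(biased)  # subtract the max for numerical stability
    exps = [math.exp(b - m) for b in biased]
    total = sum(exps)
    return [e / total for e in exps]

# With equal logits, attention becomes proportional to relevance
weights = relevance_biased_attention([0.0, 0.0, 0.0], [0.6, 0.3, 0.1])
print([round(w, 3) for w in weights])  # [0.6, 0.3, 0.1]
```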

## Example Relevance-Driven Extraction

```python
import heapq

# Extract the top-k most relevant vertices for a specific (s, t) query point.
# Assumes `graph` exposes a networkx-style API (`nodes()`, `subgraph()`).
# The three scoring functions are supplied by the caller, since their
# implementations are not defined in this document.
def extract_subgraph(graph, spatial_point, time_point,
                     causal_fn, spatial_fn, temporal_fn,
                     k=128, alpha=0.5, beta=0.3, gamma=0.2):
    # Calculate Λ(v, s, t) for every vertex
    relevance_scores = {}
    for v in graph.nodes():
        relevance_scores[v] = (
            alpha * causal_fn(v)                       # α = 0.5 by default
            + beta * spatial_fn(v, spatial_point)      # β = 0.3 by default
            + gamma * temporal_fn(v, time_point)       # γ = 0.2 by default
        )

    # Select the k highest-scoring vertices without sorting the whole graph
    top_vertices = heapq.nlargest(k, relevance_scores, key=relevance_scores.get)

    # Return the subgraph induced by the selected vertices
    # (networkx returns a read-only view; call .copy() if it must be modified)
    return graph.subgraph(top_vertices)
```

## Integration with DANCE-ST

The knowledge graph is crucial for the three-phase DANCE-ST approach:

1. **Phase 1**: The graph is used to extract relevant subgraphs based on relevance scores
2. **Phase 2**: The extracted subgraph vertices are passed to neural and symbolic models
3. **Phase 3**: The projection phase ensures consistency with knowledge graph constraints

By focusing computation on only the most relevant parts of the knowledge graph, DANCE-ST achieves both computational efficiency and improved prediction accuracy.