# Profiling Data Fields Reference

This document describes the profiling data fields collected by `op_eval_framework` and which profiler pass contributes each field.

## Profiler Passes

The profiler runs **3 passes**, each with a different `AiCMetrics` mode:

| Pass | AiCMetrics Mode | Description |
|------|-----------------|-------------|
| **PipeUtilization** | `AiCMetrics.PipeUtilization` | Pipeline utilization and timing metrics |
| **Memory** | `AiCMetrics.Memory` | Memory bandwidth metrics (GB/s) |
| **ResourceConflictRatio** | `AiCMetrics.ResourceConflictRatio` | Resource contention and conflict metrics |

---

## Field Sources: Shared vs Pass-Exclusive

Each pass produces its own `kernel_details.csv`, and the metrics it contains depend on the AiCMetrics mode. Some fields appear in **all passes** (shared), while others are **exclusive** to a specific pass.

### Shared Fields (All 3 Passes) - **AVERAGED**

These fields appear in every pass's `kernel_details.csv` and are **averaged** across all 3 passes:

| Field | Description |
|-------|-------------|
| `Duration(us)` → `Duration(ms)` | Total kernel duration |
| `Block Dim` | Block dimension configuration |
| `aicore_time(us)` → `aicore_time(ms)` | AI Core execution time |
| `aiv_time(us)` → `aiv_time(ms)` | AI Vector execution time |
| `aic_total_cycles` | AI Core total cycles |
| `aiv_total_cycles` | AI Vector total cycles |

> **Note**: Each pass runs the kernel once, so every shared field is measured three times. Averaging the three measurements reduces noise in the timing and cycle-count fields.

### PipeUtilization Pass - Exclusive Fields

These fields **only** appear in the PipeUtilization pass:

| Field | Description |
|-------|-------------|
| `aic_mac_time(ms)`, `aic_mac_ratio` | MAC unit timing and utilization |
| `aic_scalar_time(ms)`, `aic_scalar_ratio` | AI Core scalar unit |
| `aic_mte1_time(ms)`, `aic_mte1_ratio` | AI Core MTE1 transfer |
| `aic_mte2_time(ms)`, `aic_mte2_ratio` | AI Core MTE2 transfer |
| `aic_fixpipe_time(ms)`, `aic_fixpipe_ratio` | AI Core FixPipe unit |
| `aiv_vec_time(ms)`, `aiv_vec_ratio` | Vector pipeline |
| `aiv_scalar_time(ms)`, `aiv_scalar_ratio` | AI Vector scalar |
| `aiv_mte2_time(ms)`, `aiv_mte2_ratio` | AI Vector MTE2 |
| `aiv_mte3_time(ms)`, `aiv_mte3_ratio` | AI Vector MTE3 |
| `cube_utilization(%)` | Cube unit utilization |
| `aic_icache_miss_rate`, `aiv_icache_miss_rate` | Instruction cache miss rates |

### Memory Pass - Exclusive Fields

These fields **only** appear in the Memory pass:

| Field | Description |
|-------|-------------|
| `aic_l1_read_bw(GB/s)`, `aic_l1_write_bw(GB/s)` | AI Core L1 bandwidth |
| `aic_l2_read_bw(GB/s)`, `aic_l2_write_bw(GB/s)` | AI Core L2 bandwidth |
| `aic_main_mem_read_bw(GB/s)`, `aic_main_mem_write_bw(GB/s)` | AI Core HBM bandwidth |
| `aiv_ub_read_bw(GB/s)`, `aiv_ub_write_bw(GB/s)` | AI Vector UB bandwidth |
| `aiv_l2_read_bw(GB/s)`, `aiv_l2_write_bw(GB/s)` | AI Vector L2 bandwidth |
| `aiv_main_mem_read_bw(GB/s)`, `aiv_main_mem_write_bw(GB/s)` | AI Vector HBM bandwidth |

### ResourceConflictRatio Pass - Exclusive Fields

These fields **only** appear in the ResourceConflictRatio pass:

| Field | Description |
|-------|-------------|
| `aiv_vec_bankgroup_cflt_ratio` | Vector bank group conflict |
| `aiv_vec_bank_cflt_ratio` | Vector bank conflict |
| `aiv_vec_resc_cflt_ratio` | Vector resource conflict |
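
The merge rule described in this section (average a field that shows up in several passes, keep a pass-exclusive field as-is) can be sketched as follows. This is a minimal illustration, not the framework's actual code: `merge_passes` and the sample rows are hypothetical, and each row is assumed to be already parsed into a dict of numeric values with time fields converted to ms.

```python
from statistics import mean


def merge_passes(rows_by_pass):
    """Merge one kernel's rows from several profiler passes.

    rows_by_pass maps a pass name to a dict of {field: numeric value}.
    A field present in several passes is averaged; a field present in
    only one pass keeps its single value (the mean of one number).
    """
    collected = {}
    for row in rows_by_pass.values():
        for field, value in row.items():
            collected.setdefault(field, []).append(float(value))
    return {field: mean(values) for field, values in collected.items()}


# Illustrative values only, not real measurements.
rows = {
    "PipeUtilization": {"Duration(ms)": 1.0, "aiv_vec_ratio": 0.5},
    "Memory": {"Duration(ms)": 2.0, "aiv_ub_read_bw(GB/s)": 100.0},
    "ResourceConflictRatio": {"Duration(ms)": 3.0, "aiv_vec_bank_cflt_ratio": 0.25},
}
merged = merge_passes(rows)
print(merged["Duration(ms)"])  # mean of 1.0, 2.0 and 3.0
```

Collecting values per field before averaging means the function does not need a hard-coded list of shared fields; any column that happens to appear in more than one pass is averaged automatically.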

---

## Output Structure

```jsonc
{
  "performance": {
    "mean": 1.234,    // Execution time in ms (average)
    "std": 0.056,     // Standard deviation in ms
    "min": 1.100,     // Minimum time in ms
    "max": 1.400      // Maximum time in ms
  },
  "profiling": {
    "KernelNameCustom": {
      // ~42 fields from kernel_details.csv (merged across the 3 passes)
      // Time fields are converted from µs to ms
    }
  }
}
```
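
A result in this shape can be consumed with the standard `json` module. The sketch below assumes the framework writes plain JSON without the explanatory comments shown above; the inline sample text and the variable names are illustrative, and a real result holds far more profiling fields per kernel.

```python
import json

# Illustrative result text, not real measurements.
result_text = """
{
  "performance": {"mean": 1.234, "std": 0.056, "min": 1.1, "max": 1.4},
  "profiling": {
    "KernelNameCustom": {
      "Duration(ms)": 1.2,
      "aicore_time(ms)": 0.0,
      "aiv_time(ms)": 1.1
    }
  }
}
"""

result = json.loads(result_text)

perf = result["performance"]
print(f"mean {perf['mean']} ms (std {perf['std']} ms)")

# Keep only the fields that are non-zero for each kernel, which hides
# metrics belonging to hardware units the kernel never used.
active_fields = {}
for kernel_name, fields in result["profiling"].items():
    active_fields[kernel_name] = {k: v for k, v in fields.items() if v != 0.0}
print(active_fields)
```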

---

## Profiling Fields (~42 total)

### Timing Fields (12 fields)

| Field | Unit | Description |
|-------|------|-------------|
| `Duration(ms)` | ms | Total kernel duration |
| `aicore_time(ms)` | ms | AI Core (matrix) execution time |
| `aiv_time(ms)` | ms | AI Vector core execution time |
| `aic_mac_time(ms)` | ms | Matrix multiply-accumulate time |
| `aic_scalar_time(ms)` | ms | AI Core scalar operations time |
| `aic_mte1_time(ms)` | ms | AI Core MTE1 (memory transfer) time |
| `aic_mte2_time(ms)` | ms | AI Core MTE2 time |
| `aic_fixpipe_time(ms)` | ms | AI Core FixPipe time |
| `aiv_vec_time(ms)` | ms | AI Vector pipeline time |
| `aiv_scalar_time(ms)` | ms | AI Vector scalar time |
| `aiv_mte2_time(ms)` | ms | AI Vector MTE2 time |
| `aiv_mte3_time(ms)` | ms | AI Vector MTE3 time |

### Cycle Counts (2 fields)

| Field | Unit | Description |
|-------|------|-------------|
| `aic_total_cycles` | cycles | AI Core total cycles |
| `aiv_total_cycles` | cycles | AI Vector total cycles |

### Pipeline Utilization Ratios (12 fields)

| Field | Unit | Description |
|-------|------|-------------|
| `aic_mac_ratio` | ratio | AI Core MAC utilization |
| `aic_scalar_ratio` | ratio | AI Core scalar utilization |
| `aic_mte1_ratio` | ratio | AI Core MTE1 utilization |
| `aic_mte2_ratio` | ratio | AI Core MTE2 utilization |
| `aic_fixpipe_ratio` | ratio | AI Core FixPipe utilization |
| `aiv_vec_ratio` | ratio | AI Vector pipeline utilization |
| `aiv_scalar_ratio` | ratio | AI Vector scalar utilization |
| `aiv_mte2_ratio` | ratio | AI Vector MTE2 utilization |
| `aiv_mte3_ratio` | ratio | AI Vector MTE3 utilization |
| `cube_utilization(%)` | % | Cube unit utilization |
| `aic_icache_miss_rate` | rate | AI Core instruction cache miss rate |
| `aiv_icache_miss_rate` | rate | AI Vector instruction cache miss rate |

### Bandwidth Metrics (12 fields) - From Memory Pass

| Field | Unit | Description |
|-------|------|-------------|
| `aic_l1_read_bw(GB/s)` | GB/s | AI Core L1 read bandwidth |
| `aic_l1_write_bw(GB/s)` | GB/s | AI Core L1 write bandwidth |
| `aic_l2_read_bw(GB/s)` | GB/s | AI Core L2 read bandwidth |
| `aic_l2_write_bw(GB/s)` | GB/s | AI Core L2 write bandwidth |
| `aic_main_mem_read_bw(GB/s)` | GB/s | AI Core HBM read bandwidth |
| `aic_main_mem_write_bw(GB/s)` | GB/s | AI Core HBM write bandwidth |
| `aiv_ub_read_bw(GB/s)` | GB/s | AI Vector UB read bandwidth |
| `aiv_ub_write_bw(GB/s)` | GB/s | AI Vector UB write bandwidth |
| `aiv_l2_read_bw(GB/s)` | GB/s | AI Vector L2 read bandwidth |
| `aiv_l2_write_bw(GB/s)` | GB/s | AI Vector L2 write bandwidth |
| `aiv_main_mem_read_bw(GB/s)` | GB/s | AI Vector HBM read bandwidth |
| `aiv_main_mem_write_bw(GB/s)` | GB/s | AI Vector HBM write bandwidth |

### Conflict Ratios (3 fields) - From ResourceConflictRatio Pass

| Field | Unit | Description |
|-------|------|-------------|
| `aiv_vec_bankgroup_cflt_ratio` | ratio | Vector bank group conflict ratio |
| `aiv_vec_bank_cflt_ratio` | ratio | Vector bank conflict ratio |
| `aiv_vec_resc_cflt_ratio` | ratio | Vector resource conflict ratio |

### Configuration (1 field)

| Field | Unit | Description |
|-------|------|-------------|
| `Block Dim` | count | Block dimension configuration |

---

## Interpreting Results

### Vector-Only Kernels (e.g., Mul, Add, Tanh)
- `aicore_time(ms)` = 0 (no matrix ops)
- `aiv_time(ms)` > 0 (vector operations)
- A high `aiv_mte2_ratio` or `aiv_mte3_ratio` indicates a memory-bound kernel

### Matrix Kernels (e.g., MatMul, Conv)
- `aicore_time(ms)` > 0 (matrix ops active)
- `aic_mac_ratio` indicates compute utilization
- A low `cube_utilization(%)` indicates an optimization opportunity
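
These rules of thumb can be turned into a small helper. The sketch below is one possible reading, not part of `op_eval_framework`: `describe_kernel` is a hypothetical name, and the thresholds `high_ratio` and `low_cube_pct` are assumptions chosen for illustration, since this document does not define what counts as "high" or "low".

```python
def describe_kernel(fields, high_ratio=0.5, low_cube_pct=30.0):
    """Return rough hints for one kernel's merged profiling fields.

    high_ratio and low_cube_pct are assumed cut-offs, not documented values.
    """
    hints = []
    aicore_ms = fields.get("aicore_time(ms)", 0.0)
    aiv_ms = fields.get("aiv_time(ms)", 0.0)

    if aicore_ms == 0 and aiv_ms > 0:
        hints.append("vector-only")
        # MTE2/MTE3 are the AI Vector transfer pipelines; a dominant
        # transfer ratio suggests time is spent moving data.
        transfer = max(fields.get("aiv_mte2_ratio", 0.0),
                       fields.get("aiv_mte3_ratio", 0.0))
        if transfer >= high_ratio:
            hints.append("likely memory-bound")
    elif aicore_ms > 0:
        hints.append("matrix")
        cube = fields.get("cube_utilization(%)")
        if cube is not None and cube < low_cube_pct:
            hints.append("low cube utilization")
    return hints


# Illustrative values only.
vector_kernel = {"aicore_time(ms)": 0.0, "aiv_time(ms)": 1.1,
                 "aiv_mte2_ratio": 0.7, "aiv_mte3_ratio": 0.2}
matrix_kernel = {"aicore_time(ms)": 2.0, "aiv_time(ms)": 0.3,
                 "aic_mac_ratio": 0.8, "cube_utilization(%)": 12.0}

vector_hints = describe_kernel(vector_kernel)
matrix_hints = describe_kernel(matrix_kernel)
print(vector_hints)
print(matrix_hints)
```

The hints are a starting point for investigation only; the thresholds should be tuned per device and workload.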

---

## Notes

1. **Time unit conversion**: All time fields marked `(us)` in the raw CSV are converted to `(ms)` in the output.
2. **Field merging**: If the same field appears in multiple passes, its values are averaged.
3. **Zero values**: Many fields show 0.0 for kernels that don't use the corresponding units (e.g., AI Core fields for vector-only ops).
4. **Custom kernel**: In `op_eval` mode, only the kernel whose name ends in `Custom` is extracted.
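
Notes 1 and 4 can be illustrated together. The sketch below assumes the raw CSV has already been parsed into a dict keyed by kernel name; `convert_time_fields`, `select_custom_kernels`, and the sample kernel names are hypothetical and only demonstrate the renaming, scaling, and filtering rules.

```python
def convert_time_fields(row):
    """Rename `(us)` fields to `(ms)` and scale their values by 1/1000."""
    converted = {}
    for name, value in row.items():
        if name.endswith("(us)"):
            converted[name[: -len("(us)")] + "(ms)"] = float(value) / 1000.0
        else:
            converted[name] = value
    return converted


def select_custom_kernels(rows_by_kernel):
    """Keep only kernels whose name ends in `Custom`."""
    return {name: row for name, row in rows_by_kernel.items()
            if name.endswith("Custom")}


# Illustrative raw rows, not real measurements.
raw = {
    "AddCustom": {"Duration(us)": 1500.0, "aiv_time(us)": 1250.0, "Block Dim": 8},
    "MemSet": {"Duration(us)": 20.0, "Block Dim": 1},
}

output = {name: convert_time_fields(row)
          for name, row in select_custom_kernels(raw).items()}
print(output)
```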

---

## References

- [CANN 8.5 Operator Summary](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/850alpha002/devaids/Profiling/atlasprofiling_16_0067.html)
- [CANN 8.5 Profiling Overview](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/850alpha002/devaids/Profiling/atlasprofiling_16_0057.html)

