# M74 - Protocol for Structured Transformation of Verbatim Logs

**Version:** 1.0
**Status:** Draft

## 1.0 Objective

This document defines the Standard Operating Procedure (SOP) for transforming a raw, unstructured verbatim log file from the Gemini CLI into a structured, analyzable JSON format. The final JSON structure must accurately represent the complete conversational flow, including user inputs, AI outputs, tool calls, and system-level messages.

## 2.0 Target JSON Structure (v3)

The target output is a JSON object with the following structure. A detailed example was given in the conversation that produced this protocol and is not reproduced here.

```json
{
  "log_filename": "string",
  "session_id": "string",
  "prologue": "string | null",
  "epilogue": "string | null",
  "conversation": [
    // Turn objects
  ]
}
```
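
For orientation, the top-level structure could be modelled in Python roughly as follows. This is only a sketch: the class name `TranscriptLog` and the sample values are hypothetical, and the turn-level keys `type` and `content` are assumptions, since the layout of a turn object is not fixed in this section.

```python
import json
from typing import Any, Dict, List, Optional, TypedDict


class TranscriptLog(TypedDict):
    """Top-level keys exactly as listed in the target structure."""
    log_filename: str
    session_id: str
    prologue: Optional[str]   # "string | null"
    epilogue: Optional[str]   # "string | null"
    conversation: List[Dict[str, Any]]  # turn objects; fields not fixed here


# Hypothetical sample values, for illustration only.
example: TranscriptLog = {
    "log_filename": "example.log",
    "session_id": "example-session",
    "prologue": None,
    "epilogue": None,
    "conversation": [
        {"type": "user", "content": "hello"},  # assumed turn layout
    ],
}

# Python's None is written out as JSON null.
serialized = json.dumps(example, ensure_ascii=False, indent=2)
```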

## 3.0 Parsing Algorithm (v5)

The transformation MUST be performed by following a two-phase process: **Phase 1: Turn Segmentation** and **Phase 2: Turn Parsing**.

### 3.1 Phase 1: Turn Segmentation

The goal of this phase is to split the raw log file content into an ordered list of "Turn Blocks".

1.  **Define Turn Headers:** A "Turn Header" is a line that unambiguously marks the beginning of a new turn. The following four patterns are the only valid Turn Headers:
    *   **User Turn:** A line starting with `╭─`.
    *   **AI Turn:** A line starting with `✦ `.
    *   **System Error Turn:** A line starting with `✕ `.
    *   **System Info Turn:** A line starting with `ℹ `.

2.  **Scan for Headers:** Read the entire log file content and record the starting character index of every Turn Header. Because a header is defined as a line prefix, a pattern counts only when it occurs at the start of a line. The result is an ordered list of indices.

3.  **Slice into Turn Blocks:** Using the list of indices, slice the log content into string blocks. The content of a turn block is the substring from its header's starting index to the character just before the next header's starting index. The last turn block's content extends to the end of the file.
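
The three steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `TURN_HEADER_RE` and `segment_turns` are hypothetical, and the sketch assumes the log has already been read into a single string.

```python
import re
from typing import List, Tuple

# The four Turn Header patterns from step 1, anchored to the start of a line.
TURN_HEADER_RE = re.compile(r"^(?:╭─|✦ |✕ |ℹ )", re.MULTILINE)


def segment_turns(log_text: str) -> Tuple[str, List[str]]:
    """Return (text before the first header, ordered list of Turn Blocks)."""
    # Step 2: starting index of every Turn Header.
    starts = [m.start() for m in TURN_HEADER_RE.finditer(log_text)]
    if not starts:
        return log_text, []
    # Step 3: each block runs up to the next header; the last runs to the end.
    ends = starts[1:] + [len(log_text)]
    blocks = [log_text[s:e] for s, e in zip(starts, ends)]
    return log_text[:starts[0]], blocks
```

Returning the leading text alongside the blocks is a design choice of this sketch; it keeps the content that precedes the first header available for Phase 2.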

### 3.2 Phase 2: Turn Parsing

The goal of this phase is to parse each "Turn Block" string into a structured JSON object.

1.  **Identify Prologue/Epilogue:**
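    *   Any content before the first Turn Block is the `prologue`. If there is none, `prologue` is `null`.
    *   Search the final Turn Block for the session summary box (`╭───... Agent powering down ...───╯`). If found, this box is the `epilogue` and must be removed from the final turn's content; otherwise `epilogue` is `null`. Because the box's top border also matches the User Turn header pattern (`╭─`), Phase 1 may place the box in a Turn Block of its own; in that case the whole block is the `epilogue` and is dropped from the turn list.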

2.  **Iterate through Turn Blocks:** For each block:
    *   **Determine Turn Type:** Check the first non-whitespace characters of the block to classify it as `user`, `ai`, `system_error`, or `system_info`.

    *   **Parse User Turn:**
        *   The content is the text inside the `│  > ...  │` part of the box.

    *   **Parse System Error/Info Turn:**
        *   The content is the entire line, stripped of the leading symbol.

    *   **Parse AI Turn:**
        *   This is a multi-step process.
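        *   **a. Create Content Parts:** Use a regular expression to split the entire AI Turn Block by the delimiters `(\[tool_code\]|\[tool_output\])`. In engines such as Python's `re.split`, the capturing group keeps each delimiter in the result as its own list element, followed by the text it introduces. Rejoin each delimiter with the string that follows it, so that every part in the resulting list either begins with a delimiter or is plain text.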
        *   **b. Classify Parts:** Iterate through the list of strings:
            *   If a part begins with `[tool_code]`, create a `tool_call` object.
            *   If a part begins with `[tool_output]`, create a `tool_output` object. Analyze its content to determine the `status` (`success`, `failure`, `cancelled`).
            *   If a part does not begin with a delimiter, it is a `text` object.
        *   **c. Assemble:** The final AI turn object will contain a `content_parts` array holding these classified objects.

3.  **Assemble Final JSON:** Combine the `prologue`, `epilogue`, and the list of parsed turn objects into the final JSON structure.
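
Phase 2 can be sketched in Python as follows, starting from a list of Turn Blocks produced by Phase 1. Everything not named in this protocol is an assumption made for illustration: the function names, the turn-level keys `type` and `content`, and the choice to have the caller supply `session_id`. The `status` field of `tool_output` parts is left out because its criteria are not specified here, and the leading `✦` is kept in AI text because the protocol does not say to strip it.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

# Header prefix -> turn type, as listed in Section 3.1.
HEADER_TYPES = {"╭─": "user", "✦ ": "ai", "✕ ": "system_error", "ℹ ": "system_info"}
DELIMITER_RE = re.compile(r"(\[tool_code\]|\[tool_output\])")
USER_LINE_RE = re.compile(r"^│\s*>\s*(.*?)\s*│\s*$", re.MULTILINE)
EPILOGUE_RE = re.compile(r"╭─[^╭╯]*Agent powering down[^╯]*╯")


def classify_turn(block: str) -> str:
    """Classify a Turn Block by its first non-whitespace characters."""
    stripped = block.lstrip()
    for prefix, turn_type in HEADER_TYPES.items():
        if stripped.startswith(prefix):
            return turn_type
    raise ValueError("block does not start with a Turn Header")


def split_ai_parts(block: str) -> List[Dict[str, str]]:
    """Steps a-c: split on the delimiters and classify each part."""
    # With a capturing group, re.split returns
    # [text, delimiter, body, delimiter, body, ...].
    pieces = DELIMITER_RE.split(block)
    parts: List[Dict[str, str]] = []
    if pieces[0].strip():
        parts.append({"type": "text", "content": pieces[0].strip()})
    for delimiter, body in zip(pieces[1::2], pieces[2::2]):
        kind = "tool_call" if delimiter == "[tool_code]" else "tool_output"
        # `status` is omitted here: its criteria are not specified.
        parts.append({"type": kind, "content": body.strip()})
    return parts


def parse_turn(block: str) -> Dict[str, Any]:
    """Parse one Turn Block into a turn object (assumed key names)."""
    turn_type = classify_turn(block)
    if turn_type == "ai":
        return {"type": "ai", "content_parts": split_ai_parts(block)}
    if turn_type == "user":
        match = USER_LINE_RE.search(block)
        return {"type": "user", "content": match.group(1) if match else ""}
    # system_error / system_info: the header line minus its leading symbol.
    header_line = block.lstrip().splitlines()[0]
    return {"type": turn_type, "content": header_line[1:].strip()}


def split_epilogue(blocks: List[str]) -> Tuple[List[str], Optional[str]]:
    """Remove the session summary box from the final block, if present."""
    if not blocks:
        return blocks, None
    match = EPILOGUE_RE.search(blocks[-1])
    if match is None:
        return blocks, None
    remainder = (blocks[-1][:match.start()] + blocks[-1][match.end():]).strip()
    kept = blocks[:-1] + ([remainder] if remainder else [])
    return kept, match.group(0)


def build_document(log_filename: str, session_id: str,
                   prologue: str, blocks: List[str]) -> Dict[str, Any]:
    """Step 3: combine prologue, epilogue, and parsed turns."""
    turn_blocks, epilogue = split_epilogue(blocks)
    return {
        "log_filename": log_filename,
        "session_id": session_id,
        "prologue": prologue or None,
        "epilogue": epilogue,
        "conversation": [parse_turn(b) for b in turn_blocks],
    }
```

The returned dictionary can be written out with `json.dumps(doc, ensure_ascii=False, indent=2)`. Note that under the splitting rule as written, only the text before the first delimiter can become a `text` part; any prose that follows a `[tool_output]` stays inside that part's content.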

