import os
from src.utils import underscore_to_pascalcase
from typing import List, Dict, Optional
from src.memorykit.note import ExperienceSource


template_statement="""You write custom {} kernels to replace the pytorch operators in the given architecture to get speedups. \n
    You have complete freedom to choose the set of operators you want to replace. You may make the decision to replace some operators with custom {} kernels and leave others unchanged. You may replace multiple operators with custom implementations, consider operator fusion opportunities (combining multiple operators into a single kernel, for example, combining matmul+relu), or algorithmic changes (such as online softmax). You are only limited by your imagination.\n
"""
template_instruction="""
Optimize the architecture named Model with custom {} operators! Name your optimized output architecture ModelNew. Output the new code in codeblocks. Please generate real code, NOT pseudocode, make sure the code compiles and is fully functional. Just output the new model code, no other text, and NO testing code! \n
"""

template_example_intro='''
Here's an example to show you the syntax of inline embedding custom {} operators in torch: The example given architecture is:
'''

template_new_arch_intro='''
The example new arch with custom {} kernels looks like this:
'''

ASCENDC_PROBLEM_STATEMENT = 'You are an expert in writing custom AscendC kernels to optimize PyTorch architectures by replacing specific operators for performance gains.\n'
ASCENDC_PROBLEM_INSTRUCTION='''
Your task: Replace relevant PyTorch operators in the architecture named Model with custom AscendC kernels.
Your response should be a brief outline/natural-language description of how you implemented the it in this round followed by a single markdown code block (**wrapped in ```python ```**) which include six embedded Python strings that address the `Problems` with the methods in `Solutions`: `project_json_src`, `host_tiling_src`, `host_operator_src`, `kernel_src`, `python_bind_src`, and `model_src`. 
There should be no additional headings or text in your response. Just natural language text followed by a newline and then the markdown code block.
- The kernel function name in `kernel_src` matches the provided kernel name exactly.
- The operator definitions in `project_json_src` and `host_operator_src` correspond to the kernel name, following PascalCase naming conventions.
- The `model_src` defines the new model(i.e., `ModelNew`) architecture with the custom kernel implemented in `python_bind_src`.
- If you are given success/failure experiences or failed attempts, extract transferable patterns from successes and proactively avoid repeating the same failure modes.
- The generated code is complete, functional, and ready for compilation. No other text, and NO testing code!
'''

debug_compile_template = """
You are an expert in writing custom AscendC kernels to optimize PyTorch architectures.Your previous solution had some compilation errors.
So based on the information below, you should revise it in order to fix these bugs.

In your previous attempt, you are trying to implement a custom AscendC kernel for the following architecture. And the kernel name is `{op}` (PascalCase: `{pascal_op}`):

```python
{arc_src}
```

Additionally, here is the previous (buggy) implementation to implement the custom AscendC kernel:
```python
{last_attempt}
```

The execution output for the previous implementation is as follows:
```
{compile_error_msg}
```
"""
debug_mem_template="""
The list below shows previous problems and how they were addressed. 
It is worth noting that if the same problem below occurs again, treat the recorded solution as incorrect and use a different approach:
{memory}
"""
debug_api_template="""
Below are some apis that you might use:
{apis}
"""
debug_response_template="""
Your response should be a brief outline of what problem(s) were addressed in this debug round and how they were resolved in following format:
`Problems: <a natural-language description of the problematic code that should be addressed>. Solutions: <a natural-language description of the fixes that were applied>`
followed by a single markdown code block (**wrapped in ```python ```**) which include six embedded Python strings that address the `Problems` with the methods in `Solutions`: `project_json_src`, `host_tiling_src`, `host_operator_src`, `kernel_src`, `python_bind_src`, and `model_src`. 
There should be no additional headings or text in your response. Just natural language text followed by a newline and then the markdown code block.
- The kernel function name in `kernel_src` matches the provided kernel name exactly.
- The operator definitions in `project_json_src` and `host_operator_src` correspond to the kernel name, following PascalCase naming conventions.
- The generated code is complete, functional, and ready for compilation. No other text, and NO testing code!
"""

debug_correct_template = """
You are an expert in writing custom AscendC kernels to optimize PyTorch architectures.Your previous solution had some functional correctness issues.
So based on the information below, you should revise it in order to fix these issues.

In your previous attempt, you are trying to implement a custom AscendC kernel for the following architecture. And the kernel name is `{op}` (PascalCase: `{pascal_op}`):
```python
{arc_src}
```

Additionally, here is the code from the last attempt:
```
{last_attempt}
```
The `python_bind_src` and `model_src` define how the kernel is implemented and called, respectively. The inputs are generated by the previously defined `get_inputs()` function.

The last attempt had no syntax errors but it encountered issues when testing its functional correctness, which are listed below:
```
{correct_error_msg}
```

The list below shows previous problems and how they were addressed. 
It is worth noting that if the same problem below occurs again, treat the recorded solution as incorrect and use a different approach:
{memory}

Your response should be a brief outline of what problem(s) were addressed in this debug round and how you resolve them in following format:
`Problems: <a natural-language description of the problematic code that should be addressed>. Solutions: <a natural-language description of the fixes that were applied>`
followed by a single markdown code block (**wrapped in ```python ```**) which include six embedded Python strings that address the `Problems` with the methods in `Solutions`: `project_json_src`, `host_tiling_src`, `host_operator_src`, `kernel_src`, `python_bind_src`, and `model_src`. 
There should be no additional headings or text in your response. Just natural language text followed by a newline and then the markdown code block.
- The kernel function name in `kernel_src` matches the provided kernel name exactly.
- The operator definitions in `project_json_src` and `host_operator_src` correspond to the kernel name, following PascalCase naming conventions.
- The generated code is complete, functional, and ready for compilation. No other text, and NO testing code!
"""

optimize_template = """
You are an expert in writing custom AscendC kernels to optimize PyTorch architectures.
You are provided with a previously developed solution.
Your task is to incrementally improve the existing implementation to further increase performance (e.g., lower latency, fewer memory copies, better resource utilization).

In your previous attempt, you are trying to implement a custom AscendC kernel for the following architecture. And the kernel name is `{op}` (PascalCase: `{pascal_op}`):
```python
{arc_src}
```

Below is the code from the last attempt, which represents the CURRENT baseline:
```
{last_attempt}
```
The `python_bind_src` and `model_src` define how the kernel is implemented and called, respectively. The inputs are generated by the previously defined `get_inputs()` function.

"""

optimize_response_format = """
Your response should clearly describe the optimization techniques you applied in this round which might cover: the optimization category/type (e.g., Data Movement Optimization, Memory Optimization, Pipeline Optimization), the specific optimization technique or approach used, which part of the code was modified (e.g., kernel implementation, tiling strategy, memory layout), and the expected performance improvement mechanism (e.g., reduced memory copies, better cache utilization, improved parallelism). Write this as flowing natural language sentences, not as a bulleted list. Then followed by a single markdown code block (**wrapped in ```python ```**) which include six embedded Python strings: `project_json_src`, `host_tiling_src`, `host_operator_src`, `kernel_src`, `python_bind_src`, and `model_src`. 
There should be no additional headings or text in your response. Just natural language text followed by a newline and then the markdown code block.
- The kernel function name in `kernel_src` matches the provided kernel name exactly.
- The operator definitions in `project_json_src` and `host_operator_src` correspond to the kernel name, following PascalCase naming conventions.
- The `model_src` defines the new model(i.e., `ModelNew`) architecture with the custom kernel implemented in `python_bind_src`.
- The generated code is complete, functional, and ready for compilation. No other text, and NO testing code!
"""

def generate_template(arc_src, example_arc_src, example_new_arc_src, language, retrieved_contents=[]):
    prompt = template_statement.format(language, language)

    if example_arc_src != "" and example_new_arc_src != "":
        prompt += f"""
        {template_example_intro.format(language)} \n
        ``` \n
        {example_arc_src}
        ``` \n
        {template_new_arch_intro.format(language)} 
        ```
        {example_new_arc_src}
        ``` \n
        """
    ## TODO: Will be modified to be more general later
    if retrieved_contents:
        prompt += "Here are some relevant retrieved experiences and information that may help you:\n"
        for idx, content in enumerate(retrieved_contents):
            prompt += f"Experience {idx+1}:\n```\n{content}\n```\n"
    ## End TODO
    prompt += f"""
    You are given the following architecture: \n
    ```
    {arc_src}
    ```
    """
    prompt += template_instruction.format(language)
    return prompt    

def ascendc_template(
    arc_src: str,
    example_arc_src: str,
    example_new_arc_src: str,
    op: str,
    example_op: str,
    example_performance: Optional[Dict] = None,
    error_attempts: Optional[List[Dict]] = None,
    experience_contents: Optional[List[Dict]] = None,
    apis_contents: Optional[List[str]] = None,
) -> str:
    if error_attempts is None:
        error_attempts = []
    if experience_contents is None:
        experience_contents = []
    if apis_contents is None:
        apis_contents = []
    op = op + '_custom'
    example_op = example_op + '_custom'
    prompt = ASCENDC_PROBLEM_STATEMENT

    def _format_performance(perf: Optional[Dict]) -> str:
        if isinstance(perf, dict) and perf:
            return (
                f"- mean={perf.get('mean')}, std={perf.get('std')}, "
                f"min={perf.get('min')}, max={perf.get('max')}, "
                f"num_trials={perf.get('num_trials', 5)}\n"
            )
        return "- No performance metrics available for this run.\n"

    if example_arc_src != "" and example_new_arc_src != "":
        if op == example_op:
            prompt += f"""
You have successfully implemented this operator before (same kernel name: `{op}`). Below is a **previously verified correct reference implementation** and its observed performance.

Non-negotiable constraints:
- Your final answer must still include six embedded Python strings: `project_json_src`, `host_tiling_src`, `host_operator_src`, `kernel_src`, `python_bind_src`, `model_src`.
- The kernel function name inside `kernel_src` must **exactly** match `{op}` (case-sensitive).
- The operator names in `project_json_src` and `host_operator_src` must correspond to `{op}`, and use PascalCase: `{underscore_to_pascalcase(op)}`.

Historical input architecture (kernel name `{example_op}`):
```python
{example_arc_src}
```

Historical transformed version (correct and reusable):
```python
{example_new_arc_src}
```

Observed performance in that successful run:
""" + _format_performance(example_performance)

        else:
            prompt += f"""
Here is a **successful example** that illustrates the expected transformation using custom AscendC operators.
Use it as a template for structure, file layout, naming conventions, and integration patterns. Adapt the kernel logic to the new operator `{op}`.

Original architecture (kernel name `{example_op}`):
```python
{example_arc_src}
```

Transformed version using custom AscendC kernels (for reference):
- It includes six embedded Python strings: `project_json_src`, `host_tiling_src`, `host_operator_src`, `kernel_src`, `python_bind_src`, `model_src`.
- The kernel function name in `kernel_src` must exactly match the provided kernel name.
- The operator definition in `project_json_src` and `host_operator_src` should correspond to the kernel name, but follow PascalCase naming.

```python
{example_new_arc_src}
```

Observed performance for the successful example run (for context):
""" + _format_performance(example_performance) + """
"""
    if error_attempts:
        prompt += (
            "Below are previous failed attempts and their reflections.\n"
            "Treat them as **negative examples**: you do NOT need to fix every bug shown there, but you should actively avoid repeating the same mistakes described in the reflections.\n"
            "Do not copy the failed code verbatim; use it only to identify pitfalls and add guardrails/checks in your implementation.\n"
        )
        for idx, error_attempt in enumerate(error_attempts):
            err_code = error_attempt.get("code", "")
            exp_contents = error_attempt.get("exp_contents", "")
            if err_code:
                prompt += f"\nPrevious Failed Attempt {idx+1}:\n```python\n{err_code}\n```\n"
            if exp_contents:
                prompt += f"Reflection about above failed attempt {idx+1}:\n"
                for content in exp_contents:
                    prompt += f"- {content}\n"
            prompt += "\n"

    if experience_contents:
        failed_exp_contents = [exp['content'] for exp in experience_contents if exp['source'] in [ExperienceSource.BASIC_FAILURE.value, ExperienceSource.OPTIMIZE_FAILURE.value]]
        success_exp_contents = [exp['content'] for exp in experience_contents if exp['source'] in [ExperienceSource.DRAFT_SUCCESS.value, ExperienceSource.DEBUG_SUCCESS.value]]
        prompt += (
            "Here are some relevant retrieved experiences from previous attempts.\n"
            f"Your job is to **transfer** success patterns that apply to `{op}` "
            "and to **avoid** repeating the failure patterns.\n"
        )
        for idx, content in enumerate(success_exp_contents):
            prompt += f"Success Experience {idx+1}: {content}\n"
        prompt += "\n"
        for idx, content in enumerate(failed_exp_contents):
            prompt += f"Failed Experience {idx+1}: {content}\n"
        prompt += "\n"

    if apis_contents:
        prompt += "Here are some relevant retrieved APIs and information that may help you:\n"
        for idx, content in enumerate(apis_contents):
            prompt += f"{idx+1}: {content}\n"
    prompt += f"""
Now you are given the following architecture. Target kernel name: `{op}` (PascalCase: `{underscore_to_pascalcase(op)}`):
```python
{arc_src}
```
    """
    prompt += ASCENDC_PROBLEM_INSTRUCTION
    return prompt


def ascendc_debug_compile_template(arc_src: str, op: str, last_attempt: str, compile_error_msg: str, experience_contents: list, apis_contents: list) -> str:
    op = op + '_custom'

    if "@file_path:" not in last_attempt:
        idx1 = last_attempt.find('host_tiling_src')+len("host_tiling_src")+5
        
        if idx1 != -1:
            last_attempt = last_attempt[:idx1] + f"// @file_path: op_host/{op}_tiling.h\n" + last_attempt[idx1:]
        idx2 = last_attempt.find('host_operator_src')+len("host_operator_src")+5
        if idx2 != -1:
            last_attempt = last_attempt[:idx2] + f"// @file_path: op_host/{op}.cpp\n" + last_attempt[idx2:]
        idx3 = last_attempt.find('kernel_src')+len("kernel_src")+5
        if idx3 != -1:
            last_attempt = last_attempt[:idx3] + f"// @file_path: op_kernel/{op}.cpp\n" + last_attempt[idx3:]

    pascal_op = underscore_to_pascalcase(op)
    prompt = debug_compile_template.format(
        op=op,
        pascal_op=pascal_op,
        arc_src=arc_src,
        last_attempt=last_attempt,
        compile_error_msg=compile_error_msg
    )
    
    if experience_contents:
            prompt += "Here are some relevant retrieved experiences and information that may help you:\n"
            for idx, content in enumerate(experience_contents):
                prompt += f"{content}\n"
    if apis_contents:
        prompt += "Here are some relevant retrieved apis and information that may help you:\n"
        for idx, content in enumerate(apis_contents):
            prompt += f"{idx+1}: {content}\n"
    
    return prompt+debug_response_template

def ascendc_debug_correct_template(arc_src: str, op: str, last_attempt: str, correct_error_msg: str, mem: list) -> str:
    op = op + '_custom'
    if "@file_path:" not in last_attempt:
        idx1 = last_attempt.find('host_tiling_src')+len("host_tiling_src")+5
        if idx1 != -1:
            last_attempt = last_attempt[:idx1] + f"// @file_path: op_host/{op}_tiling.h\n" + last_attempt[idx1:]
        idx2 = last_attempt.find('host_operator_src')+len("host_operator_src")+5
        if idx2 != -1:
            last_attempt = last_attempt[:idx2] + f"// @file_path: op_host/{op}.cpp\n" + last_attempt[idx2:]
        idx3 = last_attempt.find('kernel_src')+len("kernel_src")+5
        if idx3 != -1:
            last_attempt = last_attempt[:idx3] + f"// @file_path: op_kernel/{op}.cpp\n" + last_attempt[idx3:]

    pascal_op = underscore_to_pascalcase(op)
    prompt = debug_correct_template.format(
        op=op,
        pascal_op=pascal_op,
        arc_src=arc_src,
        memory="\n".join(f"{i}. {item}" for i, item in enumerate(mem, 1)) if mem else "Nothing",
        last_attempt=last_attempt,
        correct_error_msg=correct_error_msg
    )

    return prompt

def ascend_optimize_template(
    arc_src: str, 
    op: str, 
    last_attempt: str, 
    failed_attempt: Dict, 
    succ_attempt_plans: List[Dict],
    optimized_attempt: Dict, 
    best_practice: str, 
    experience_contents: List[str]
) -> str:
    op = op + '_custom'
    pascal_op = underscore_to_pascalcase(op)
    prompt = optimize_template.format(
        op=op,
        pascal_op=pascal_op,
        arc_src=arc_src,
        last_attempt=last_attempt,
    )

    if failed_attempt:
        prompt += (
            "Based on the current baseline, the following optimization attempt was made but FAILED.\n"
            "This failure indicates that the corresponding optimization approach is NOT FEASIBLE or introduces unacceptable issues under the current constraints.\n"
        )
        failed_code = failed_attempt.get("code", "")
        failed_plan = failed_attempt.get("plan", "")
        failed_experiences = failed_attempt.get("failed_experiences", [])
        if failed_code:
            prompt += f"\nCode of this failed attempt:\n```python\n{failed_code}\n```\n"
        if failed_plan:
            prompt += f"\nPlan of this failed attempt:\n{failed_plan}\n"
        prompt += "\n"
        if failed_experiences:
            prompt += "Experiences of this failed attempt:\n"
            for content in failed_experiences:
                prompt += f"- {content}\n"
        prompt += "\n"


    if experience_contents:
        prompt += (
            "Here are some relevant retrieved failed experiences from previous attempts.\n"
            f"Your job is to **avoid** repeating the failure patterns.\n"
        )
        for idx, content in enumerate(experience_contents):
            prompt += f"Failed Experience {idx+1}: {content}\n"

    if optimized_attempt:
        example_op = optimized_attempt.get("op_name", "")
        example_arc_src = optimized_attempt.get("arc_src", "")
        example_optimized_degree_to_root = optimized_attempt.get("optimized_degree_to_root")
        example_optimized_degree_to_parent = optimized_attempt.get("optimized_degree_to_parent")
        example_optimized_degree_to_best = optimized_attempt.get("optimized_degree_to_best")
        example_plan = optimized_attempt.get("plan", "")
        example_code = optimized_attempt.get("code", "")
        prompt += f"""The following is a successful optimization example from a DIFFERENT task. It is provided as a reference for optimization strategies and implementation patterns only. Adapt the underlying ideas cautiously to the current task if applicable."""
        prompt += f"""This successful optimization attempt for the task {example_op} resulted in a performance improvement of {example_optimized_degree_to_root} which means it run faster than the previous attempt for about {example_optimized_degree_to_root}. 
The pytorch reference architecture is:
```python
{example_arc_src}
```
The optimization plan is:
```python
{example_plan}
```
The implementation code is:
```python
{example_code}
```
"""
    if best_practice:
        prompt += f"""The example below is a best practice for performance optimization, for your reference.
```
{best_practice}
```
"""
    if succ_attempt_plans:
        prompt += """\nThe following optimization plans have been successfully attempted for the current task. Note that some plans achieved significant performance improvements while others had smaller gains. 
Performance metrics:
- Step improvement: The performance improvement achieved by this specific optimization step (relative to the previous version). A value of 1.0 means no improvement; values > 1.0 indicate speedup (e.g., 1.5 means 1.5x faster).
- Cumulative improvement: The cumulative performance improvement from the baseline to this point. A value of 1.0 means no improvement; values > 1.0 indicate total speedup achieved so far.

**Focus on learning from plans with large improvements** (high optimized_degree values) - these indicate effective optimization strategies. You can either:
1. Combine or integrate the most effective optimization techniques from high-performing plans
2. Propose a new optimization approach inspired by the best practice and the combined successful optimization plans
3. Do not simply copy the previous successful optimization plans, but propose a new optimization approach

Historical successful optimization plans and their performance improvements:
"""
        for idx, attempt in enumerate(succ_attempt_plans, 1):
            plan = attempt.get('plan', 'N/A')
            optimized_degree_to_parent = attempt.get('optimized_degree_to_parent', 'N/A')
            optimized_degree_to_root = attempt.get('optimized_degree_to_root', 'N/A')
            prompt += f"{idx}. Plan: {plan}\n   - Step improvement: {optimized_degree_to_parent}x\n   - Cumulative improvement: {optimized_degree_to_root}x\n\n"
    
    prompt += optimize_response_format
    return prompt


optimize_guidelines = '''
You are an AI performance optimization expert acting as an incremental optimization decision-maker.

Your task is NOT to summarize or explain optimization techniques.
Your task is to SELECT exactly ONE concrete and implementable optimization action that is most suitable for the CURRENT state of the code.

The selected optimization MUST:
- Be directly applicable to the given code and kernel architecture
- Be feasible to implement in the next iteration without additional assumptions
- Not repeat or overlap with completed optimizations
- Not rely on failed or invalidated optimization paths

### Performance Optimization Guidelines (Decision Space)
Choose from the following categories ONLY if they are applicable to the current code.

1. **Data Movement Optimization**: Maximize data transfer efficiency by controlling data block size and GM address configuration. 
    Optimization points: 
    - Transfer larger data blocks
    - Align GM addresses to 512B boundaries
    - Use data movement APIs efficiently
    - Avoid repeated access to same addresses
    - Configure L2 CacheMode settings
2. **Memory Optimization**: Reduce memory usage and improve computational efficiency through buffer sharing/reuse, data compression, dedicated storage allocation, and memory access scheduling.
    Optimization points: 
    - Share temporary buffers between operators
    - Limit TilingData structure size
    - Optimize stack space by reducing Tensor ShapeInfo dimensions
    - Enable Unified Buffer fusion
    - Use BT Buffer for bias computation
    - Store quantization parameters in FP Buffer
    - Use L0C Buffer for temporary storage
    - Keep smaller matrices in L1 Buffer
    - Avoid bank conflicts in Unified Buffer
3. **Header Overhead Optimization**: Reduce operator header overhead (latency before computation execution) by using appropriate core counts and kernel types.
    Optimization points: 
    - Configure appropriate core counts and kernel types
4. **API Usage Optimization**: Provide API usage techniques to help select appropriate APIs and reduce redundant operations.
    Optimization points: 
    - Reuse VECIN and VECOUT pure data movement operators 
    - Avoid creating and initializing TPipe within objects
    - Enable AtomicAdd for Matmul operations
    - Flexibly use Counter mode in Vector operators
    - Select low-latency instructions to optimize reduction operation performance
5. **Pipeline Optimization**: Improve hardware resource utilization and achieve higher throughput through task parallelization and asynchronous scheduling. 
    Optimization points: 
    - Enable double buffering
    - Use Iterate or IterateAll asynchronous interfaces to avoid AIC/AIV synchronization dependencies
6. **Tiling Optimization**: Provide tiling-related optimization suggestions to help select appropriate tiling segmentation strategies.
    Optimization points: 
    - Implement L2 Cache segmentation
    - Ensure load balancing between cores

You are replacing relevant PyTorch operators in the following architecture `Model`
with custom AscendC kernels.

Kernel name:
- snake_case: `{op}`
- PascalCase: `{pascal_op}`

Architecture definition:
```python
{arc_src}
```

Code from the last attempt:
```
{code}
```

The `python_bind_src` and `model_src` define how the kernel is implemented and invoked.
Inputs are generated by the `get_inputs()` function.
'''

optimize_guidelines_output = '''
### Instructions:
Analyze the current task and code above, taking into account the applied optimization history and failed attempts.
Autonomously select ONE NEW optimization category and point from the guidelines that is most applicable and feasible for this iteration.

Rules:
- The category and action MUST exactly match the wording in the guidelines
- Do NOT invent new categories or actions
- Do NOT provide explanations or additional text

### Output Format(Json Only):
{{
    "category": "Select one optimization category from the guidelines above, e.g., Data Movement Optimization/Memory Optimization/..",
    "action": "Select one specific optimization point from the chosen category, e.g., Transfer larger data blocks/Share temporary buffers between operators/..",
}}
'''


def generate_optimize_guidelines_template(arc_src: str, op: str, code: str, optimize_history: List[Dict]=[], failed_attempts: List[Dict]=[],succ_attempts: List[Dict]=[]) -> str:
    op = op + '_custom'
    prompt = optimize_guidelines.format(op = op, pascal_op=underscore_to_pascalcase(op) ,arc_src = arc_src, code = code)
    if optimize_history:
        prompt += """\nTo reach the current version of this code, the following optimization plans have already been applied.

These optimization histories are considered COMPLETED.
Do NOT repeat the same optimization histories or select actions that are equivalent in effect.

Applied optimization histories (from most recent to oldest):
Performance metrics:
- Step improvement: The performance improvement achieved by this specific optimization step (relative to the previous version). A value of 1.0 means no improvement; values > 1.0 indicate speedup (e.g., 1.5 means 1.5x faster).
- Cumulative improvement: The cumulative performance improvement from the baseline to this point. A value of 1.0 means no improvement; values > 1.0 indicate total speedup achieved so far.

"""
        for idx, history in enumerate(optimize_history, 1):
            plan = history.get('plan', 'N/A')
            optimized_degree_to_parent = history.get('optimized_degree_to_parent', 'N/A')
            optimized_degree_to_root = history.get('optimized_degree_to_root', 'N/A')
            prompt += f"{idx}. Plan: {plan}\n   - Step improvement: {optimized_degree_to_parent}x\n   - Cumulative improvement: {optimized_degree_to_root}x\n\n"
        prompt += "\n"
    if succ_attempts:
        prompt += """\nBased on the current version, the following optimization plan was attempted and improved the performance more or less.
If in some optimization categories, the performance has been improved significantly, do NOT select the same category again.
"""
        for idx, attempt in enumerate(succ_attempts, 1):
            plan = attempt.get('plan', 'N/A')
            optimized_degree_to_parent = attempt.get('optimized_degree_to_parent', 'N/A')
            optimized_degree_to_root = attempt.get('optimized_degree_to_root', 'N/A')
            prompt += f"{idx}. Plan: {plan}\n   - Step improvement: {optimized_degree_to_parent}x\n   - Cumulative improvement: {optimized_degree_to_root}x\n\n"
        prompt += "\n"
    if failed_attempts:
        prompt += (
            """\nBased on the current version, the following optimization plan was attempted but FAILED.
The failure indicates that this optimization path is NOT FEASIBLE in the current context.
Do NOT select this plan again, and do NOT select optimization actions that are conceptually similar or rely on the same assumptions.
"""
        )
        failed_attempt = failed_attempts[0]
        failed_plan = failed_attempt.get("plan", "")

        failed_experiences = failed_attempt.get("failed_experiences", [])

        if failed_plan:
            prompt += f"""Failed optimization plan:
{failed_plan}

"""

        if failed_experiences:
            joined_experiences = "\n".join(failed_experiences)
            prompt += f"""Failed attempt experiences:
{joined_experiences}
"""
    prompt+= """You are now at the NEXT optimization iteration.
Your task is to select the most promising NEW optimization action under the current constraints.
"""

    prompt += optimize_guidelines_output
    return prompt